Unverified commit 74be6033, authored by neonhuang, committed by GitHub

add multi lang ocr (#1702)

Parent d53f6412
......@@ -231,3 +231,4 @@ We welcome you to contribute code to PaddleHub, and thank you for your feedback.
* Many thanks to [zl1271](https://github.com/zl1271) for fixing serving docs typo
* Many thanks to [AK391](https://github.com/AK391) for adding the webdemo of UGATIT and deoldify models in Hugging Face spaces
* Many thanks to [itegel](https://github.com/itegel) for fixing quick start docs typo
* Many thanks to [AK391](https://github.com/AK391) for adding the webdemo of Photo2Cartoon model in Hugging Face spaces
......@@ -247,3 +247,4 @@ print(results)
* Many thanks to [zl1271](https://github.com/zl1271) for fixing the serving docs typo
* Many thanks to [AK391](https://github.com/AK391) for adding the web demo of the UGATIT and deoldify models in Hugging Face spaces
* Many thanks to [itegel](https://github.com/itegel) for fixing the quick start docs typo
* Many thanks to [AK391](https://github.com/AK391) for adding the web demo of the Photo2Cartoon model in Hugging Face spaces
......@@ -50,6 +50,8 @@
**UGATIT Selfie2anime Huggingface Web Demo**: Integrated into [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/U-GAT-IT-selfie2anime)
**Photo2Cartoon Huggingface Web Demo**: Integrated into [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/photo2cartoon)
### Object Detection
- Pedestrian detection, vehicle detection, and more industrial-grade ultra-large-scale pretrained models are provided.
......
# arabic_ocr_db_crnn_mobile
|Model Name|arabic_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|icdar2015|
|Fine-tuning supported|No|
|Last update|2021-12-02|
|Data metrics|-|
## I. Basic Information
- ### Module Introduction
- The arabic_ocr_db_crnn_mobile Module recognizes Arabic-script text in images, covering Arabic, Persian, and Uyghur. Building on the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the Arabic text inside each box. The final text recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for recognizing sequence-like objects in images. Used together with CTC loss, it can learn directly from word- or line-level annotations, with no need for detailed character-level labels. This Module is a lightweight OCR model for Arabic script and supports direct prediction.
- For more details, see:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This Module depends on the third-party libraries shapely and pyclipper. Please install shapely and pyclipper before using this Module.**
- ### 2. Installation
- ```shell
$ hub install arabic_ocr_db_crnn_mobile
```
- If you encounter problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
$ hub run arabic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run arabic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line; for more options, see [PaddleHub Command Line Instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="arabic_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is only effective on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- Constructs an ArabicOCRDBCRNNMobile object.
- **Parameters**
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable orientation classification, i.e. use an orientation classifier to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use the GPU. **If you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first.**
- box\_thresh (float): confidence threshold for detected text boxes;
- angle_classification_thresh(float): confidence threshold for text orientation classification
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- Prediction API that detects the positions of all text in the input images and recognizes the text content.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data with ndarray.shape \[H, W, C\], in BGR format;
- output\_dir (str): directory where result images are saved; defaults to ocr\_result;
- visualization (bool): whether to save the recognition results as image files; only effective when detection is enabled; defaults to False;
- **Returns**
- res (list\[dict\]): list of recognition results, where each element is a dict with the fields:
- data (list\[dict\]): recognized text results, where each element is a dict with the fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text result
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix whose rows are the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when orientation classification alone is enabled
- score(float): the classification score, output only when orientation classification alone is enabled
- save_path (str, optional): save path of the result image; '' if no image is saved
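- As a quick illustration, here is a minimal, hedged sketch of walking through the returned structure (field names as documented above; the image path is a placeholder):
- ```python
import cv2
import paddlehub as hub
ocr = hub.Module(name="arabic_ocr_db_crnn_mobile")
res = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
for item in res:               # one dict per input image
    for line in item['data']:  # one dict per recognized text line
        print(line['text'], line['confidence'], line['text_box_position'])
```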
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start the PaddleHub Serving
- Run the start command:
- ```shell
$ hub serving start -m arabic_ocr_db_crnn_mobile
```
- This deploys the text recognition API service; the default port is 8866.
- **NOTE:** If you want to use GPU for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
- ### Step 2: Send a prediction request
- With the server up, the few lines of code below send a prediction request and fetch the result:
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# Send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/arabic_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction results
print(r.json()["results"])
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="arabic_ocr_db_crnn_mobile",
version="1.1.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class ArabicOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="arabic",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list[numpy.ndarray]): image data, each with shape [H, W, C] in BGR order; pass either images or paths
            paths (list[str]): paths of the images; pass either paths or images
            output_dir (str): The directory to store output images.
            visualization (bool): Whether to save image or not.
        Returns:
            res (list): the text detection/recognition results and the save paths of the output images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{input_name: input_shape}``, e.g. ``{'x': [-1, 3, -1, -1]}``
            opset_version(int): ONNX operator set version
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# chinese_cht_ocr_db_crnn_mobile
|Model Name|chinese_cht_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|icdar2015|
|Fine-tuning supported|No|
|Last update|2021-12-02|
|Data metrics|-|
## I. Basic Information
- ### Module Introduction
- The chinese_cht_ocr_db_crnn_mobile Module recognizes Traditional Chinese text in images. Building on the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the Traditional Chinese text inside each box. The final text recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for recognizing sequence-like objects in images. Used together with CTC loss, it can learn directly from word- or line-level annotations, with no need for detailed character-level labels. This Module is a lightweight OCR model for Traditional Chinese and supports direct prediction.
- For more details, see:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This Module depends on the third-party libraries shapely and pyclipper. Please install shapely and pyclipper before using this Module.**
- ### 2. Installation
- ```shell
$ hub install chinese_cht_ocr_db_crnn_mobile
```
- If you encounter problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
$ hub run chinese_cht_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run chinese_cht_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line; for more options, see [PaddleHub Command Line Instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="chinese_cht_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is only effective on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- Constructs a ChineseChtOCRDBCRNNMobile object.
- **Parameters**
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable orientation classification, i.e. use an orientation classifier to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use the GPU. **If you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first.**
- box\_thresh (float): confidence threshold for detected text boxes;
- angle_classification_thresh(float): confidence threshold for text orientation classification
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- Prediction API that detects the positions of all text in the input images and recognizes the text content.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data with ndarray.shape \[H, W, C\], in BGR format;
- output\_dir (str): directory where result images are saved; defaults to ocr\_result;
- visualization (bool): whether to save the recognition results as image files; only effective when detection is enabled; defaults to False;
- **Returns**
- res (list\[dict\]): list of recognition results, where each element is a dict with the fields:
- data (list\[dict\]): recognized text results, where each element is a dict with the fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text result
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix whose rows are the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when orientation classification alone is enabled
- score(float): the classification score, output only when orientation classification alone is enabled
- save_path (str, optional): save path of the result image; '' if no image is saved
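- As an illustration of the orientation fields above, a hedged sketch that enables only the orientation classifier (detection and recognition off):
- ```python
import cv2
import paddlehub as hub
# With det and rec disabled, each result dict carries 'orientation' and
# 'score' (per the documented return fields) instead of recognized text.
ocr = hub.Module(name="chinese_cht_ocr_db_crnn_mobile", det=False, rec=False, use_angle_cls=True)
res = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
for item in res:
    for line in item['data']:
        print(line['orientation'], line['score'])
```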
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start the PaddleHub Serving
- Run the start command:
- ```shell
$ hub serving start -m chinese_cht_ocr_db_crnn_mobile
```
- This deploys the text recognition API service; the default port is 8866.
- **NOTE:** If you want to use GPU for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
- ### Step 2: Send a prediction request
- With the server up, the few lines of code below send a prediction request and fetch the result:
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# Send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/chinese_cht_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction results
print(r.json()["results"])
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="chinese_cht_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class ChineseChtOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="chinese_cht",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list[numpy.ndarray]): image data, each with shape [H, W, C] in BGR order; pass either images or paths
            paths (list[str]): paths of the images; pass either paths or images
            output_dir (str): The directory to store output images.
            visualization (bool): Whether to save image or not.
        Returns:
            res (list): the text detection/recognition results and the save paths of the output images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{input_name: input_shape}``, e.g. ``{'x': [-1, 3, -1, -1]}``
            opset_version(int): ONNX operator set version
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
# cyrillic_ocr_db_crnn_mobile
|Model Name|cyrillic_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|icdar2015|
|Fine-tuning supported|No|
|Last update|2021-12-02|
|Data metrics|-|
## I. Basic Information
- ### Module Introduction
- The cyrillic_ocr_db_crnn_mobile Module recognizes Cyrillic-script text in images, covering Russian, Serbian, Belarusian, Bulgarian, Ukrainian, Mongolian, Adyghe, Avar, Dargwa, Ingush, Lak, Lezghian, and Tabassaran. Building on the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the Cyrillic text inside each box. The final text recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for recognizing sequence-like objects in images. Used together with CTC loss, it can learn directly from word- or line-level annotations, with no need for detailed character-level labels. This Module is a lightweight OCR model for Cyrillic script and supports direct prediction.
- For more details, see:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This Module depends on the third-party libraries shapely and pyclipper. Please install shapely and pyclipper before using this Module.**
- ### 2. Installation
- ```shell
$ hub install cyrillic_ocr_db_crnn_mobile
```
- If you encounter problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
$ hub run cyrillic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run cyrillic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line; for more options, see [PaddleHub Command Line Instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="cyrillic_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is only effective on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- Constructs a CyrillicOCRDBCRNNMobile object.
- **Parameters**
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable orientation classification, i.e. use an orientation classifier to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use the GPU. **If you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first.**
- box\_thresh (float): confidence threshold for detected text boxes;
- angle_classification_thresh(float): confidence threshold for text orientation classification
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- Prediction API that detects the positions of all text in the input images and recognizes the text content.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data with ndarray.shape \[H, W, C\], in BGR format;
- output\_dir (str): directory where result images are saved; defaults to ocr\_result;
- visualization (bool): whether to save the recognition results as image files; only effective when detection is enabled; defaults to False;
- **Returns**
- res (list\[dict\]): list of recognition results, where each element is a dict with the fields:
- data (list\[dict\]): recognized text results, where each element is a dict with the fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text result
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix whose rows are the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when orientation classification alone is enabled
- score(float): the classification score, output only when orientation classification alone is enabled
- save_path (str, optional): save path of the result image; '' if no image is saved
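- For example, a hedged sketch that saves the visualized results and prints where they were written (visualization only takes effect while detection is enabled, as noted above):
- ```python
import cv2
import paddlehub as hub
ocr = hub.Module(name="cyrillic_ocr_db_crnn_mobile")
res = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')],
                         output_dir='ocr_result', visualization=True)
for item in res:
    print(item['save_path'])  # '' if no image was saved
```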
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start the PaddleHub Serving
- Run the start command:
- ```shell
$ hub serving start -m cyrillic_ocr_db_crnn_mobile
```
- This deploys the text recognition API service; the default port is 8866.
- **NOTE:** If you want to use GPU for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
- ### Step 2: Send a prediction request
- With the server up, the few lines of code below send a prediction request and fetch the result:
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# Send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/cyrillic_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction results
print(r.json()["results"])
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="cyrillic_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class CyrillicOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="cyrillic",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list[numpy.ndarray]): image data, each with shape [H, W, C] in BGR order; pass either images or paths
            paths (list[str]): paths of the images; pass either paths or images
            output_dir (str): The directory to store output images.
            visualization (bool): Whether to save image or not.
        Returns:
            res (list): the text detection/recognition results and the save paths of the output images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{input_name: input_shape}``, e.g. ``{'x': [-1, 3, -1, -1]}``
            opset_version(int): ONNX operator set version
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# devanagari_ocr_db_crnn_mobile
|Model Name|devanagari_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|icdar2015|
|Fine-tuning supported|No|
|Last update|2021-12-02|
|Data metrics|-|
## I. Basic Information
- ### Module Introduction
- The devanagari_ocr_db_crnn_mobile Module recognizes Devanagari-script text in images, covering Hindi, Marathi, Nepali, Bihari, Maithili, Angika, Bengali, Magahi, Nagpuri, and Newari. Building on the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the Devanagari text inside each box. The final text recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for recognizing sequence-like objects in images. Used together with CTC loss, it can learn directly from word- or line-level annotations, with no need for detailed character-level labels. This Module is a lightweight OCR model for Devanagari script and supports direct prediction.
- For more details, see:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This Module depends on the third-party libraries shapely and pyclipper. Please install shapely and pyclipper before using this Module.**
- ### 2. Installation
- ```shell
$ hub install devanagari_ocr_db_crnn_mobile
```
- If you encounter problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
$ hub run devanagari_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run devanagari_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line; for more options, see [PaddleHub Command Line Instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="devanagari_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is only effective on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- Constructs a DevanagariOCRDBCRNNMobile object.
- **Parameters**
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable orientation classification, i.e. use an orientation classifier to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use the GPU. **If you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first.**
- box\_thresh (float): confidence threshold for detected text boxes;
- angle_classification_thresh(float): confidence threshold for text orientation classification
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- Prediction API that detects the positions of all text in the input images and recognizes the text content.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data with ndarray.shape \[H, W, C\], in BGR format;
- output\_dir (str): directory where result images are saved; defaults to ocr\_result;
- visualization (bool): whether to save the recognition results as image files; only effective when detection is enabled; defaults to False;
- **Returns**
- res (list\[dict\]): list of recognition results, where each element is a dict with the fields:
- data (list\[dict\]): recognized text results, where each element is a dict with the fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text result
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix whose rows are the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when orientation classification alone is enabled
- score(float): the classification score, output only when orientation classification alone is enabled
- save_path (str, optional): save path of the result image; '' if no image is saved
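- Beyond the API listed above, the module source in this PR also defines an export_onnx_model(dirname, input_shape_dict, opset_version) method; a hedged usage sketch (the output directory is illustrative, and the input name 'x' follows the docstring example):
- ```python
import paddlehub as hub
ocr = hub.Module(name="devanagari_ocr_db_crnn_mobile")
# Export to ONNX with a dynamic NCHW image input; './onnx_model' is a placeholder.
ocr.export_onnx_model(dirname='./onnx_model',
                      input_shape_dict={'x': [-1, 3, -1, -1]},
                      opset_version=10)
```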
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start the PaddleHub Serving
- Run the start command:
- ```shell
$ hub serving start -m devanagari_ocr_db_crnn_mobile
```
- This deploys the text recognition API service; the default port is 8866.
- **NOTE:** If you want to use GPU for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
- ### Step 2: Send a prediction request
- With the server up, the few lines of code below send a prediction request and fetch the result:
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# Send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/devanagari_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction results
print(r.json()["results"])
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="devanagari_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class DevanagariOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="devanagari",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list[numpy.ndarray]): image data, each with shape [H, W, C] in BGR order; pass either images or paths
            paths (list[str]): paths of the images; pass either paths or images
            output_dir (str): The directory to store output images.
            visualization (bool): Whether to save image or not.
        Returns:
            res (list): the text detection/recognition results and the save paths of the output images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{input_name: input_shape}``, e.g. ``{'x': [-1, 3, -1, -1]}``
            opset_version(int): ONNX operator set version
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# french_ocr_db_crnn_mobile
|Model Name|french_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|icdar2015|
|Fine-tuning supported|No|
|Last update|2021-12-02|
|Data metrics|-|
## I. Basic Information
- ### Module Introduction
- The french_ocr_db_crnn_mobile Module recognizes French text in images. Building on the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the French text inside each box. The final text recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for recognizing sequence-like objects in images. Used together with CTC loss, it can learn directly from word- or line-level annotations, with no need for detailed character-level labels. This Module is a lightweight OCR model for French and supports direct prediction.
- For more details, see:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This Module depends on the third-party libraries shapely and pyclipper. Please install shapely and pyclipper before using this Module.**
- ### 2. Installation
- ```shell
$ hub install french_ocr_db_crnn_mobile
```
- If you encounter problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
$ hub run french_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run french_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line; for more options, see [PaddleHub Command Line Instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="french_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is only effective on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- Constructs a FrenchOCRDBCRNNMobile object.
- **Parameters**
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable orientation classification, i.e. use an orientation classifier to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use the GPU. **If you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first.**
- box\_thresh (float): confidence threshold for detected text boxes;
- angle_classification_thresh(float): confidence threshold for text orientation classification
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- Prediction API that detects the positions of all text in the input images and recognizes the text content.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data with ndarray.shape \[H, W, C\], in BGR format;
- output\_dir (str): directory where result images are saved; defaults to ocr\_result;
- visualization (bool): whether to save the recognition results as image files; only effective when detection is enabled; defaults to False;
- **Returns**
- res (list\[dict\]): list of recognition results, where each element is a dict with the fields:
- data (list\[dict\]): recognized text results, where each element is a dict with the fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text result
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix whose rows are the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when orientation classification alone is enabled
- score(float): the classification score, output only when orientation classification alone is enabled
- save_path (str, optional): save path of the result image; '' if no image is saved
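- A small hedged sketch of batch prediction over several image paths (all paths are placeholders; results come back in input order):
- ```python
import paddlehub as hub
ocr = hub.Module(name="french_ocr_db_crnn_mobile")
res = ocr.recognize_text(paths=['/PATH/TO/IMAGE_1', '/PATH/TO/IMAGE_2'])
for item in res:  # one result dict per input path
    print(' '.join(line['text'] for line in item['data']))
```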
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start the PaddleHub Serving
- Run the start command:
- ```shell
$ hub serving start -m french_ocr_db_crnn_mobile
```
- This deploys the text recognition API service; the default port is 8866.
- **NOTE:** If you want to use GPU for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
- ### Step 2: Send a prediction request
- With the server up, the few lines of code below send a prediction request and fetch the result:
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# Send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/french_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction results
print(r.json()["results"])
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="french_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class FrenchOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="fr",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list[numpy.ndarray]): image data, each with shape [H, W, C] in BGR order; pass either images or paths
            paths (list[str]): paths of the images; pass either paths or images
            output_dir (str): The directory to store output images.
            visualization (bool): Whether to save image or not.
        Returns:
            res (list): the text detection/recognition results and the save paths of the output images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{input_name: input_shape}``, e.g. ``{'x': [-1, 3, -1, -1]}``
            opset_version(int): ONNX operator set version
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
!
"
$
%
&
'
(
)
+
,
-
.
/
0
1
2
3
4
5
6
7
8
9
:
;
>
?
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
[
]
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
£
§
­
²
´
µ
·
º
¼
½
¿
À
Á
Ä
Å
Ç
É
Í
Ï
Ô
Ö
Ø
Ù
Ü
ß
à
á
â
ã
ä
å
æ
ç
è
é
ê
ë
í
ï
ñ
ò
ó
ô
ö
ø
ù
ú
û
ü
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import string
class CharacterOps(object):
""" Convert between text-label and text-index """
def __init__(self, config):
self.character_type = config['character_type']
self.loss_type = config['loss_type']
self.max_text_len = config['max_text_length']
if self.character_type == "en":
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
elif self.character_type in [
"ch", 'japan', 'korean', 'french', 'german'
]:
character_dict_path = config['character_dict_path']
add_space = False
if 'use_space_char' in config:
add_space = config['use_space_char']
self.character_str = ""
with open(character_dict_path, "rb") as fin:
lines = fin.readlines()
for line in lines:
line = line.decode('utf-8').strip("\n").strip("\r\n")
self.character_str += line
if add_space:
self.character_str += " "
dict_character = list(self.character_str)
elif self.character_type == "en_sensitive":
# same with ASTER setting (use 94 char).
self.character_str = string.printable[:-6]
dict_character = list(self.character_str)
else:
self.character_str = None
        assert self.character_str is not None, \
            "Unsupported character type: {}".format(self.character_type)
self.beg_str = "sos"
self.end_str = "eos"
if self.loss_type == "attention":
dict_character = [self.beg_str, self.end_str] + dict_character
elif self.loss_type == "srn":
dict_character = dict_character + [self.beg_str, self.end_str]
self.dict = {}
for i, char in enumerate(dict_character):
self.dict[char] = i
self.character = dict_character
def encode(self, text):
"""convert text-label into text-index.
input:
text: text labels of each image. [batch_size]
output:
text: concatenated text index for CTCLoss.
                [sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
"""
if self.character_type == "en":
text = text.lower()
text_list = []
for char in text:
if char not in self.dict:
continue
text_list.append(self.dict[char])
text = np.array(text_list)
return text
def decode(self, text_index, is_remove_duplicate=False):
""" convert text-index into text-label. """
char_list = []
char_num = self.get_char_num()
if self.loss_type == "attention":
beg_idx = self.get_beg_end_flag_idx("beg")
end_idx = self.get_beg_end_flag_idx("end")
ignored_tokens = [beg_idx, end_idx]
else:
ignored_tokens = [char_num]
for idx in range(len(text_index)):
if text_index[idx] in ignored_tokens:
continue
if is_remove_duplicate:
if idx > 0 and text_index[idx - 1] == text_index[idx]:
continue
char_list.append(self.character[int(text_index[idx])])
text = ''.join(char_list)
return text
def get_char_num(self):
return len(self.character)
def get_beg_end_flag_idx(self, beg_or_end):
if self.loss_type == "attention":
if beg_or_end == "beg":
idx = np.array(self.dict[self.beg_str])
elif beg_or_end == "end":
idx = np.array(self.dict[self.end_str])
else:
assert False, "Unsupport type %s in get_beg_end_flag_idx"\
% beg_or_end
return idx
else:
err = "error in get_beg_end_flag_idx when using the loss %s"\
% (self.loss_type)
assert False, err
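# Editor's illustrative sketch (hedged, not part of the original module): a minimal
# round trip through CharacterOps. The config dict below is hypothetical and only
# exercises the built-in "en" branch; it runs only when this file is executed directly.
if __name__ == '__main__':
    _ops = CharacterOps({'character_type': 'en', 'loss_type': 'ctc', 'max_text_length': 25})
    _ids = _ops.encode('ab1')  # text -> index array into _ops.character
    print(_ops.decode(_ids, is_remove_duplicate=True))  # indices -> 'ab1'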
def cal_predicts_accuracy(char_ops,
preds,
preds_lod,
labels,
labels_lod,
is_remove_duplicate=False):
acc_num = 0
img_num = 0
for ino in range(len(labels_lod) - 1):
beg_no = preds_lod[ino]
end_no = preds_lod[ino + 1]
preds_text = preds[beg_no:end_no].reshape(-1)
preds_text = char_ops.decode(preds_text, is_remove_duplicate)
beg_no = labels_lod[ino]
end_no = labels_lod[ino + 1]
labels_text = labels[beg_no:end_no].reshape(-1)
labels_text = char_ops.decode(labels_text, is_remove_duplicate)
img_num += 1
if preds_text == labels_text:
acc_num += 1
acc = acc_num * 1.0 / img_num
return acc, acc_num, img_num
def cal_predicts_accuracy_srn(char_ops,
preds,
labels,
max_text_len,
is_debug=False):
acc_num = 0
img_num = 0
char_num = char_ops.get_char_num()
total_len = preds.shape[0]
img_num = int(total_len / max_text_len)
for i in range(img_num):
cur_label = []
cur_pred = []
for j in range(max_text_len):
if labels[j + i * max_text_len] != int(char_num - 1): #0
cur_label.append(labels[j + i * max_text_len][0])
else:
break
for j in range(max_text_len + 1):
if j < len(cur_label) and preds[j + i * max_text_len][
0] != cur_label[j]:
break
elif j == len(cur_label) and j == max_text_len:
acc_num += 1
break
elif j == len(cur_label) and preds[j + i * max_text_len][0] == int(
char_num - 1):
acc_num += 1
break
acc = acc_num * 1.0 / img_num
return acc, acc_num, img_num
def convert_rec_attention_infer_res(preds):
img_num = preds.shape[0]
target_lod = [0]
convert_ids = []
for ino in range(img_num):
end_pos = np.where(preds[ino, :] == 1)[0]
if len(end_pos) <= 1:
text_list = preds[ino, 1:]
else:
text_list = preds[ino, 1:end_pos[1]]
target_lod.append(target_lod[ino] + len(text_list))
convert_ids = convert_ids + list(text_list)
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
def convert_rec_label_to_lod(ori_labels):
img_num = len(ori_labels)
target_lod = [0]
convert_ids = []
for ino in range(img_num):
target_lod.append(target_lod[ino] + len(ori_labels[ino]))
convert_ids = convert_ids + list(ori_labels[ino])
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from PIL import Image, ImageDraw, ImageFont
import base64
import cv2
import numpy as np
def draw_ocr(image,
boxes,
txts,
scores,
font_file,
draw_txt=True,
drop_score=0.5):
"""
Visualize the results of OCR detection and recognition
args:
image(Image|array): RGB image
boxes(list): boxes with shape(N, 4, 2)
txts(list): the texts
        scores(list): scores corresponding to txts
        draw_txt(bool): whether to draw the texts or not
        drop_score(float): only boxes with scores greater than drop_score will be visualized
return(array):
the visualized img
"""
if scores is None:
scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores):
if score < drop_score or math.isnan(score):
continue
box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
if draw_txt:
img = np.array(resize_img(image, input_size=600))
txt_img = text_visual(
txts,
scores,
font_file,
img_h=img.shape[0],
img_w=600,
threshold=drop_score)
img = np.concatenate([np.array(img), np.array(txt_img)], axis=1)
return img
return image
def text_visual(texts, scores, font_file, img_h=400, img_w=600, threshold=0.):
"""
create new blank img and draw txt on it
args:
        texts(list): the texts to be drawn
scores(list|None): corresponding score of each txt
img_h(int): the height of blank img
img_w(int): the width of blank img
    return(array):
        the image with the texts drawn on it
"""
if scores is not None:
assert len(texts) == len(
scores), "The number of txts and corresponding scores must match"
def create_blank_img():
        blank_img = np.ones(shape=[img_h, img_w], dtype=np.uint8) * 255
blank_img[:, img_w - 1:] = 0
blank_img = Image.fromarray(blank_img).convert("RGB")
draw_txt = ImageDraw.Draw(blank_img)
return blank_img, draw_txt
blank_img, draw_txt = create_blank_img()
font_size = 20
txt_color = (0, 0, 0)
font = ImageFont.truetype(font_file, font_size, encoding="utf-8")
gap = font_size + 5
txt_img_list = []
count, index = 1, 0
for idx, txt in enumerate(texts):
index += 1
if scores[idx] < threshold or math.isnan(scores[idx]):
index -= 1
continue
first_line = True
while str_count(txt) >= img_w // font_size - 4:
tmp = txt
txt = tmp[:img_w // font_size - 4]
if first_line:
new_txt = str(index) + ': ' + txt
first_line = False
else:
new_txt = ' ' + txt
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
txt = tmp[img_w // font_size - 4:]
if count >= img_h // gap - 1:
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
if first_line:
new_txt = str(index) + ': ' + txt + ' ' + '%.3f' % (scores[idx])
else:
new_txt = " " + txt + " " + '%.3f' % (scores[idx])
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
# whether add new blank img or not
if count >= img_h // gap - 1 and idx + 1 < len(texts):
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
txt_img_list.append(np.array(blank_img))
if len(txt_img_list) == 1:
blank_img = np.array(txt_img_list[0])
else:
blank_img = np.concatenate(txt_img_list, axis=1)
return np.array(blank_img)
def str_count(s):
"""
    Count the display width of a string: each Chinese character counts as one
    unit, and every two English letters, digits, or spaces count as one unit.
    args:
        s(string): the input string
    return(int):
        the display width of the string
"""
import string
count_zh = count_pu = 0
s_len = len(s)
en_dg_count = 0
for c in s:
if c in string.ascii_letters or c.isdigit() or c.isspace():
en_dg_count += 1
elif c.isalpha():
count_zh += 1
else:
count_pu += 1
return s_len - math.ceil(en_dg_count / 2)
def resize_img(img, input_size=600):
img = np.array(img)
im_shape = img.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
im_scale = float(input_size) / float(im_size_max)
im = cv2.resize(img, None, None, fx=im_scale, fy=im_scale)
return im
def get_image_ext(image):
if image.shape[2] == 4:
return ".png"
return ".jpg"
def sorted_boxes(dt_boxes):
"""
Sort text boxes in order from top to bottom, left to right
args:
dt_boxes(array):detected text boxes with shape [4, 2]
return:
sorted boxes(array) with shape [4, 2]
"""
num_boxes = dt_boxes.shape[0]
sorted_boxes = sorted(dt_boxes, key=lambda x: (x[0][1], x[0][0]))
_boxes = list(sorted_boxes)
for i in range(num_boxes - 1):
if abs(_boxes[i + 1][0][1] - _boxes[i][0][1]) < 10 and \
(_boxes[i + 1][0][0] < _boxes[i][0][0]):
tmp = _boxes[i]
_boxes[i] = _boxes[i + 1]
_boxes[i + 1] = tmp
return _boxes
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import string
class CharacterOps(object):
""" Convert between text-label and text-index """
def __init__(self, config):
self.character_type = config['character_type']
self.loss_type = config['loss_type']
self.max_text_len = config['max_text_length']
if self.character_type == "en":
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
elif self.character_type in [
"ch", 'japan', 'korean', 'french', 'german'
]:
character_dict_path = config['character_dict_path']
add_space = False
if 'use_space_char' in config:
add_space = config['use_space_char']
self.character_str = ""
with open(character_dict_path, "rb") as fin:
lines = fin.readlines()
for line in lines:
line = line.decode('utf-8').strip("\n").strip("\r\n")
self.character_str += line
if add_space:
self.character_str += " "
dict_character = list(self.character_str)
elif self.character_type == "en_sensitive":
# same with ASTER setting (use 94 char).
self.character_str = string.printable[:-6]
dict_character = list(self.character_str)
else:
self.character_str = None
        assert self.character_str is not None, \
            "Unsupported character type: {}".format(self.character_type)
self.beg_str = "sos"
self.end_str = "eos"
if self.loss_type == "attention":
dict_character = [self.beg_str, self.end_str] + dict_character
elif self.loss_type == "srn":
dict_character = dict_character + [self.beg_str, self.end_str]
self.dict = {}
for i, char in enumerate(dict_character):
self.dict[char] = i
self.character = dict_character
def encode(self, text):
"""convert text-label into text-index.
input:
text: text labels of each image. [batch_size]
output:
text: concatenated text index for CTCLoss.
                [sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
"""
if self.character_type == "en":
text = text.lower()
text_list = []
for char in text:
if char not in self.dict:
continue
text_list.append(self.dict[char])
text = np.array(text_list)
return text
def decode(self, text_index, is_remove_duplicate=False):
""" convert text-index into text-label. """
char_list = []
char_num = self.get_char_num()
if self.loss_type == "attention":
beg_idx = self.get_beg_end_flag_idx("beg")
end_idx = self.get_beg_end_flag_idx("end")
ignored_tokens = [beg_idx, end_idx]
else:
ignored_tokens = [char_num]
for idx in range(len(text_index)):
if text_index[idx] in ignored_tokens:
continue
if is_remove_duplicate:
if idx > 0 and text_index[idx - 1] == text_index[idx]:
continue
char_list.append(self.character[int(text_index[idx])])
text = ''.join(char_list)
return text
def get_char_num(self):
return len(self.character)
def get_beg_end_flag_idx(self, beg_or_end):
if self.loss_type == "attention":
if beg_or_end == "beg":
idx = np.array(self.dict[self.beg_str])
elif beg_or_end == "end":
idx = np.array(self.dict[self.end_str])
else:
assert False, "Unsupport type %s in get_beg_end_flag_idx"\
% beg_or_end
return idx
else:
err = "error in get_beg_end_flag_idx when using the loss %s"\
% (self.loss_type)
assert False, err
def cal_predicts_accuracy(char_ops,
preds,
preds_lod,
labels,
labels_lod,
is_remove_duplicate=False):
acc_num = 0
img_num = 0
for ino in range(len(labels_lod) - 1):
beg_no = preds_lod[ino]
end_no = preds_lod[ino + 1]
preds_text = preds[beg_no:end_no].reshape(-1)
preds_text = char_ops.decode(preds_text, is_remove_duplicate)
beg_no = labels_lod[ino]
end_no = labels_lod[ino + 1]
labels_text = labels[beg_no:end_no].reshape(-1)
labels_text = char_ops.decode(labels_text, is_remove_duplicate)
img_num += 1
if preds_text == labels_text:
acc_num += 1
acc = acc_num * 1.0 / img_num
return acc, acc_num, img_num
def cal_predicts_accuracy_srn(char_ops,
preds,
labels,
max_text_len,
is_debug=False):
acc_num = 0
img_num = 0
char_num = char_ops.get_char_num()
total_len = preds.shape[0]
img_num = int(total_len / max_text_len)
for i in range(img_num):
cur_label = []
cur_pred = []
for j in range(max_text_len):
if labels[j + i * max_text_len] != int(char_num - 1): #0
cur_label.append(labels[j + i * max_text_len][0])
else:
break
for j in range(max_text_len + 1):
if j < len(cur_label) and preds[j + i * max_text_len][
0] != cur_label[j]:
break
elif j == len(cur_label) and j == max_text_len:
acc_num += 1
break
elif j == len(cur_label) and preds[j + i * max_text_len][0] == int(
char_num - 1):
acc_num += 1
break
acc = acc_num * 1.0 / img_num
return acc, acc_num, img_num
def convert_rec_attention_infer_res(preds):
img_num = preds.shape[0]
target_lod = [0]
convert_ids = []
for ino in range(img_num):
end_pos = np.where(preds[ino, :] == 1)[0]
if len(end_pos) <= 1:
text_list = preds[ino, 1:]
else:
text_list = preds[ino, 1:end_pos[1]]
target_lod.append(target_lod[ino] + len(text_list))
convert_ids = convert_ids + list(text_list)
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
def convert_rec_label_to_lod(ori_labels):
img_num = len(ori_labels)
target_lod = [0]
convert_ids = []
for ino in range(img_num):
target_lod.append(target_lod[ino] + len(ori_labels[ino]))
convert_ids = convert_ids + list(ori_labels[ino])
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from PIL import Image, ImageDraw, ImageFont
import base64
import cv2
import numpy as np
def draw_ocr(image,
boxes,
txts,
scores,
font_file,
draw_txt=True,
drop_score=0.5):
"""
Visualize the results of OCR detection and recognition
args:
image(Image|array): RGB image
boxes(list): boxes with shape(N, 4, 2)
txts(list): the texts
        scores(list): scores corresponding to txts
        draw_txt(bool): whether to draw the texts or not
        drop_score(float): only boxes with scores greater than drop_score will be visualized
return(array):
the visualized img
"""
if scores is None:
scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores):
if score < drop_score or math.isnan(score):
continue
box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
if draw_txt:
img = np.array(resize_img(image, input_size=600))
txt_img = text_visual(
txts,
scores,
font_file,
img_h=img.shape[0],
img_w=600,
threshold=drop_score)
img = np.concatenate([np.array(img), np.array(txt_img)], axis=1)
return img
return image
def text_visual(texts, scores, font_file, img_h=400, img_w=600, threshold=0.):
"""
create new blank img and draw txt on it
args:
        texts(list): the texts to be drawn
scores(list|None): corresponding score of each txt
img_h(int): the height of blank img
img_w(int): the width of blank img
    return(array):
        the image with the texts drawn on it
"""
if scores is not None:
assert len(texts) == len(
scores), "The number of txts and corresponding scores must match"
def create_blank_img():
        blank_img = np.ones(shape=[img_h, img_w], dtype=np.uint8) * 255
blank_img[:, img_w - 1:] = 0
blank_img = Image.fromarray(blank_img).convert("RGB")
draw_txt = ImageDraw.Draw(blank_img)
return blank_img, draw_txt
blank_img, draw_txt = create_blank_img()
font_size = 20
txt_color = (0, 0, 0)
font = ImageFont.truetype(font_file, font_size, encoding="utf-8")
gap = font_size + 5
txt_img_list = []
count, index = 1, 0
for idx, txt in enumerate(texts):
index += 1
if scores[idx] < threshold or math.isnan(scores[idx]):
index -= 1
continue
first_line = True
while str_count(txt) >= img_w // font_size - 4:
tmp = txt
txt = tmp[:img_w // font_size - 4]
if first_line:
new_txt = str(index) + ': ' + txt
first_line = False
else:
new_txt = ' ' + txt
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
txt = tmp[img_w // font_size - 4:]
if count >= img_h // gap - 1:
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
if first_line:
new_txt = str(index) + ': ' + txt + ' ' + '%.3f' % (scores[idx])
else:
new_txt = " " + txt + " " + '%.3f' % (scores[idx])
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
# whether add new blank img or not
if count >= img_h // gap - 1 and idx + 1 < len(texts):
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
txt_img_list.append(np.array(blank_img))
if len(txt_img_list) == 1:
blank_img = np.array(txt_img_list[0])
else:
blank_img = np.concatenate(txt_img_list, axis=1)
return np.array(blank_img)
def str_count(s):
"""
Count the number of Chinese characters,
a single English character and a single number
equal to half the length of Chinese characters.
args:
s(string): the input of string
return(int):
the number of Chinese characters
"""
import string
count_zh = count_pu = 0
s_len = len(s)
en_dg_count = 0
for c in s:
if c in string.ascii_letters or c.isdigit() or c.isspace():
en_dg_count += 1
elif c.isalpha():
count_zh += 1
else:
count_pu += 1
return s_len - math.ceil(en_dg_count / 2)
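# Example (a quick check, not from the source): str_count("Hello, 世界") == 6;
# the 5 letters plus 1 space contribute ceil(6 / 2) = 3, which is subtracted
# from the raw length of 9.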
def resize_img(img, input_size=600):
img = np.array(img)
im_shape = img.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
im_scale = float(input_size) / float(im_size_max)
im = cv2.resize(img, None, None, fx=im_scale, fy=im_scale)
return im
def get_image_ext(image):
if image.shape[2] == 4:
return ".png"
return ".jpg"
def sorted_boxes(dt_boxes):
"""
Sort text boxes in order from top to bottom, left to right
args:
dt_boxes(array):detected text boxes with shape [4, 2]
return:
sorted boxes(array) with shape [4, 2]
"""
num_boxes = dt_boxes.shape[0]
sorted_boxes = sorted(dt_boxes, key=lambda x: (x[0][1], x[0][0]))
_boxes = list(sorted_boxes)
for i in range(num_boxes - 1):
if abs(_boxes[i + 1][0][1] - _boxes[i][0][1]) < 10 and \
(_boxes[i + 1][0][0] < _boxes[i][0][0]):
tmp = _boxes[i]
_boxes[i] = _boxes[i + 1]
_boxes[i + 1] = tmp
return _boxes
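# Example (toy coordinates): for boxes whose top-left corners are (200, 102),
# (10, 100) and (20, 200), sorted_boxes yields the order (10, 100), (200, 102),
# (20, 200): the first two corners differ by less than 10 px vertically, so
# they are treated as one line and reordered left to right.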
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
    data = np.frombuffer(data, np.uint8)  # np.fromstring is deprecated
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
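# Roundtrip sketch (hypothetical usage): a client encodes an image with
# base64.b64encode(cv2.imencode('.jpg', img)[1].tobytes()).decode('utf8'),
# and base64_to_cv2 restores it to a BGR ndarray on the server side.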
# kannada_ocr_db_crnn_mobile
|Module Name|kannada_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|icdar2015|
|Fine-tuning supported|No|
|Latest update|2021-12-2|
|Data metrics|-|
## I. Basic Information
- ### Module Introduction
- The kannada_ocr_db_crnn_mobile Module is used to recognize Kannada text in images. Starting from the text boxes detected by multi_languages_ocr_db_crnn, it goes on to recognize the Kannada characters inside those boxes. The final recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN tailored to sequence-like objects in images. Used together with the CTC loss, it can learn directly from word- or line-level annotations without detailed character-level labels. This Module is a lightweight OCR model for Kannada and supports direct prediction.
- For more information, please refer to:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environmental Dependence
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This Module depends on the third-party libraries shapely and pyclipper. Please install shapely and pyclipper before using this Module.**
- ### 2. Installation
- ```shell
$ hub install kannada_ocr_db_crnn_mobile
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command line Prediction
- ```shell
$ hub run kannada_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run kannada_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line; for more usage, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="kannada_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is only effective on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- Constructs a KannadaOCRDBCRNNMobile object.
- **Parameters**
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable orientation classification, i.e. use an orientation classifier to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use GPU; **if GPU is used, set the CUDA_VISIBLE_DEVICES environment variable first**
- box\_thresh (float): confidence threshold of the detected text boxes;
- angle_classification_thresh(float): confidence threshold of text orientation classification
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- Prediction API, which detects the position of every text in the input image and recognizes the text.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data, ndarray.shape \[H, W, C\], in BGR format;
- output\_dir (str): directory to save the images, ocr\_result by default;
- visualization (bool): whether to save the recognition results as image files; effective only when detection is enabled; default is False;
- **Returns**
- res (list\[dict\]): list of recognition results, where each element is a dict with the following fields (a usage sketch follows this list):
- data (list\[dict\]): recognition results, where each element is a dict with the following fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix listing the bottom-left, bottom-right, top-right and top-left vertices in turn; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when orientation classification alone is enabled
- score(float): the classification score, output only when orientation classification alone is enabled
- save_path (str, optional): save path of the recognition result; save_path is '' if no image is saved
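- The returned structure can be consumed as follows (a minimal sketch; the image path is a placeholder):
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="kannada_ocr_db_crnn_mobile")
res = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
for item in res:              # one dict per input image
    for rec in item['data']:  # one dict per detected text box
        print(rec['text'], rec['confidence'], rec['text_box_position'])
```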
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of text recognition.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m kannada_ocr_db_crnn_mobile
```
- This completes the deployment of a text recognition service API, with the default port number 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send a prediction request and obtain the results:
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')  # tostring() is deprecated
# send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/kannada_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# print the prediction results
print(r.json()["results"])
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="kannada_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class KannadaOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="ka",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. Use either `images` or `paths`.
            paths (list[str]): the paths of images. Use either `paths` or `images`.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{ input_name: input_shape }``, eg. {'x': [-1, 3, -1, -1]}
            opset_version(int): ONNX operator set version
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# korean_ocr_db_crnn_mobile
|Module Name|korean_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|icdar2015|
|Fine-tuning supported|No|
|Latest update|2021-12-2|
|Data metrics|-|
## I. Basic Information
- ### Module Introduction
- The korean_ocr_db_crnn_mobile Module is used to recognize Korean text in images. Starting from the text boxes detected by multi_languages_ocr_db_crnn, it goes on to recognize the Korean characters inside those boxes. The final recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN tailored to sequence-like objects in images. Used together with the CTC loss, it can learn directly from word- or line-level annotations without detailed character-level labels. This Module is a lightweight OCR model for Korean and supports direct prediction.
- For more information, please refer to:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environmental Dependence
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This Module depends on the third-party libraries shapely and pyclipper. Please install shapely and pyclipper before using this Module.**
- ### 2. Installation
- ```shell
$ hub install korean_ocr_db_crnn_mobile
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command line Prediction
- ```shell
$ hub run korean_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run korean_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line; for more usage, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="korean_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is only effective on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- Constructs a KoreanOCRDBCRNNMobile object.
- **Parameters**
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable orientation classification, i.e. use an orientation classifier to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use GPU; **if GPU is used, set the CUDA_VISIBLE_DEVICES environment variable first**
- box\_thresh (float): confidence threshold of the detected text boxes;
- angle_classification_thresh(float): confidence threshold of text orientation classification
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- Prediction API, which detects the position of every text in the input image and recognizes the text.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data, ndarray.shape \[H, W, C\], in BGR format;
- output\_dir (str): directory to save the images, ocr\_result by default;
- visualization (bool): whether to save the recognition results as image files; effective only when detection is enabled; default is False;
- **Returns**
- res (list\[dict\]): list of recognition results, where each element is a dict with the following fields:
- data (list\[dict\]): recognition results, where each element is a dict with the following fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix listing the bottom-left, bottom-right, top-right and top-left vertices in turn; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when orientation classification alone is enabled
- score(float): the classification score, output only when orientation classification alone is enabled
- save_path (str, optional): save path of the recognition result; save_path is '' if no image is saved
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of text recognition.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m korean_ocr_db_crnn_mobile
```
- This completes the deployment of a text recognition service API, with the default port number 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send a prediction request and obtain the results (a sketch of unpacking the response follows the snippet):
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')  # tostring() is deprecated
# send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/korean_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# print the prediction results
print(r.json()["results"])
```
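- The `results` field mirrors the structure returned by `recognize_text`; continuing the snippet above (a minimal sketch, assuming the request succeeded):
- ```python
# unpack the JSON payload returned by the service
for item in r.json()["results"]:
    for rec in item['data']:
        print(rec['text'], rec['confidence'])
```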
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="korean_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class KoreanOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="korean",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. Use either `images` or `paths`.
            paths (list[str]): the paths of images. Use either `paths` or `images`.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{ input_name: input_shape }``, eg. {'x': [-1, 3, -1, -1]}
            opset_version(int): ONNX operator set version
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# latin_ocr_db_crnn_mobile
|Module Name|latin_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|icdar2015|
|Fine-tuning supported|No|
|Latest update|2021-12-2|
|Data metrics|-|
## I. Basic Information
- ### Module Introduction
- The latin_ocr_db_crnn_mobile Module is used to recognize Latin-script text in images. Starting from the text boxes detected by multi_languages_ocr_db_crnn, it goes on to recognize the Latin characters inside those boxes. The final recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN tailored to sequence-like objects in images. Used together with the CTC loss, it can learn directly from word- or line-level annotations without detailed character-level labels. This Module is a lightweight OCR model for Latin script and supports direct prediction.
- For more information, please refer to:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environmental Dependence
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This Module depends on the third-party libraries shapely and pyclipper. Please install shapely and pyclipper before using this Module.**
- ### 2. Installation
- ```shell
$ hub install latin_ocr_db_crnn_mobile
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command line Prediction
- ```shell
$ hub run latin_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run latin_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line; for more usage, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="latin_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is only effective on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- Constructs a LatinOCRDBCRNNMobile object.
- **Parameters**
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable orientation classification, i.e. use an orientation classifier to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use GPU; **if GPU is used, set the CUDA_VISIBLE_DEVICES environment variable first**
- box\_thresh (float): confidence threshold of the detected text boxes;
- angle_classification_thresh(float): confidence threshold of text orientation classification
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- Prediction API, which detects the position of every text in the input image and recognizes the text.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data, ndarray.shape \[H, W, C\], in BGR format;
- output\_dir (str): directory to save the images, ocr\_result by default;
- visualization (bool): whether to save the recognition results as image files; effective only when detection is enabled; default is False;
- **Returns**
- res (list\[dict\]): list of recognition results, where each element is a dict with the following fields (a drawing sketch follows this list):
- data (list\[dict\]): recognition results, where each element is a dict with the following fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix listing the bottom-left, bottom-right, top-right and top-left vertices in turn; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when orientation classification alone is enabled
- score(float): the classification score, output only when orientation classification alone is enabled
- save_path (str, optional): save path of the recognition result; save_path is '' if no image is saved
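- For example, `text_box_position` can be drawn back onto the image with OpenCV (a minimal sketch; the paths are placeholders):
- ```python
import cv2
import numpy as np
import paddlehub as hub
ocr = hub.Module(name="latin_ocr_db_crnn_mobile")
img = cv2.imread('/PATH/TO/IMAGE')
res = ocr.recognize_text(images=[img])
for rec in res[0]['data']:
    pts = np.array(rec['text_box_position'], dtype=np.int32).reshape(-1, 1, 2)
    cv2.polylines(img, [pts], isClosed=True, color=(0, 0, 255), thickness=2)
cv2.imwrite('boxes.jpg', img)
```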
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of text recognition.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m latin_ocr_db_crnn_mobile
```
- This completes the deployment of a text recognition service API, with the default port number 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send a prediction request and obtain the results:
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')  # tostring() is deprecated
# send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/latin_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# print the prediction results
print(r.json()["results"])
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="latin_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class LatinOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="latin",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. Use either `images` or `paths`.
            paths (list[str]): the paths of images. Use either `paths` or `images`.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{ input_name: input_shape }``, eg. {'x': [-1, 3, -1, -1]}
            opset_version(int): ONNX operator set version
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# multi_languages_ocr_db_crnn
|Module Name|multi_languages_ocr_db_crnn|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|icdar2015|
|Fine-tuning supported|No|
|Latest update|2021-11-24|
|Data metrics|-|
## I. Basic Information
- ### Application Effect Display
- Sample result examples:
<p align="center">
<img src="https://user-images.githubusercontent.com/76040149/133097562-d8c9abd1-6c70-4d93-809f-fa4735764836.png" width = "600" hspace='10'/> <br />
</p>
- ### Module Introduction
- The multi_languages_ocr_db_crnn Module is used to recognize text in images. Built on the PaddleOCR module, it detects text boxes, recognizes the text inside them, and then classifies the angle of the detected boxes. The detection algorithm is DB (Differentiable Binarization), and the recognition algorithm is CRNN (Convolutional Recurrent Neural Network).
Besides general-purpose Chinese-English models, this Module also provides lightweight models for [80 languages](#语种缩写).
<p align="center">
<img src="https://user-images.githubusercontent.com/76040149/133098254-7c642826-d6d7-4dd0-986e-371622337867.png" width = "300" height = "450" hspace='10'/> <br />
</p>
- For more information, please refer to:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environmental Dependence
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This Module depends on the third-party libraries shapely and pyclipper. Please install shapely and pyclipper before using this Module.**
- ### 2. Installation
- ```shell
$ hub install multi_languages_ocr_db_crnn
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command line Prediction
- ```shell
$ hub run multi_languages_ocr_db_crnn --input_path "/PATH/TO/IMAGE"
$ hub run multi_languages_ocr_db_crnn --input_path "/PATH/TO/IMAGE" --lang "ch" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line; for more usage, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="multi_languages_ocr_db_crnn", lang='en', enable_mkldnn=True)       # MKL-DNN acceleration is only effective on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- multi_languages_ocr_db_crnn currently supports 80 languages, switched by modifying the lang parameter; for the English model, set lang=en. The supported [languages](#语种缩写) are listed in the table in section V, and a switching sketch follows below.
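- For example, switching recognizers is only a matter of the lang argument (a minimal sketch; the abbreviations come from the table in section V and the image path is a placeholder):
- ```python
import paddlehub as hub
import cv2
img = cv2.imread('/PATH/TO/IMAGE')
for lang in ['en', 'korean', 'fr', 'ar']:
    # each call loads a separate recognizer for the given language
    ocr = hub.Module(name="multi_languages_ocr_db_crnn", lang=lang)
    print(lang, ocr.recognize_text(images=[img])[0]['data'])
```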
- ### 3. API
- ```python
def __init__(self,
lang="ch",
det=True, rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- Constructs a MultiLangOCR object.
- **Parameters**
- lang(str): selection of the multilingual model. The default is the Chinese model, i.e. lang="ch".
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable orientation classification, i.e. use an orientation classifier to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use GPU; **if GPU is used, set the CUDA_VISIBLE_DEVICES environment variable first**
- box\_thresh (float): confidence threshold of the detected text boxes;
- angle_classification_thresh(float): confidence threshold of text orientation classification
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- Prediction API, which detects the position of every text in the input image and recognizes the text.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data, ndarray.shape \[H, W, C\], in BGR format;
- output\_dir (str): directory to save the images, ocr\_result by default;
- visualization (bool): whether to save the recognition results as image files; effective only when detection is enabled; default is False;
- **Returns**
- res (list\[dict\]): list of recognition results, where each element is a dict with the following fields:
- data (list\[dict\]): recognition results, where each element is a dict with the following fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix listing the bottom-left, bottom-right, top-right and top-left vertices in turn; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when orientation classification alone is enabled
- score(float): the classification score, output only when orientation classification alone is enabled
- save_path (str, optional): save path of the recognition result; save_path is '' if no image is saved
## IV. Server Deployment
- PaddleHub Serving can deploy an online service of text recognition.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m multi_languages_ocr_db_crnn
```
- This completes the deployment of a text recognition service API, with the default port number 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send a prediction request and obtain the results:
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')  # tostring() is deprecated
# send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/multi_languages_ocr_db_crnn"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# print the prediction results
print(r.json()["results"])
```
<a name="语种缩写"></a>
## V. Supported Languages and Abbreviations
| Language | Description | Abbreviation | | Language | Description | Abbreviation |
| --- | --- | --- | ---|--- | --- | --- |
|Chinese|Chinese and English|ch| |Bulgarian|Bulgarian|bg|
|English|English|en| |Ukrainian|Ukrainian|uk|
|French|French|fr| |Belarusian|Belarusian|be|
|German|German|german| |Telugu|Telugu|te|
|Japanese|Japanese|japan| |Abaza|Abaza|abq|
|Korean|Korean|korean| |Tamil|Tamil|ta|
|Traditional Chinese|Chinese Traditional|chinese_cht| |Afrikaans|Afrikaans|af|
|Italian|Italian|it| |Azerbaijani|Azerbaijani|az|
|Spanish|Spanish|es| |Bosnian|Bosnian|bs|
|Portuguese|Portuguese|pt| |Czech|Czech|cs|
|Russian|Russian|ru| |Welsh|Welsh|cy|
|Arabic|Arabic|ar| |Danish|Danish|da|
|Hindi|Hindi|hi| |Estonian|Estonian|et|
|Uyghur|Uyghur|ug| |Irish|Irish|ga|
|Persian|Persian|fa| |Croatian|Croatian|hr|
|Urdu|Urdu|ur| |Hungarian|Hungarian|hu|
|Serbian (latin)|Serbian(latin)|rs_latin| |Indonesian|Indonesian|id|
|Occitan|Occitan|oc| |Icelandic|Icelandic|is|
|Marathi|Marathi|mr| |Kurdish|Kurdish|ku|
|Nepali|Nepali|ne| |Lithuanian|Lithuanian|lt|
|Serbian (cyrillic)|Serbian(cyrillic)|rs_cyrillic| |Latvian|Latvian|lv|
|Maori|Maori|mi| |Dargwa|Dargwa|dar|
|Malay|Malay|ms| |Ingush|Ingush|inh|
|Maltese|Maltese|mt| |Lak|Lak|lbe|
|Dutch|Dutch|nl| |Lezghian|Lezghian|lez|
|Norwegian|Norwegian|no| |Tabassaran|Tabassaran|tab|
|Polish|Polish|pl| |Bihari|Bihari|bh|
|Romanian|Romanian|ro| |Maithili|Maithili|mai|
|Slovak|Slovak|sk| |Angika|Angika|ang|
|Slovenian|Slovenian|sl| |Bhojpuri|Bhojpuri|bho|
|Albanian|Albanian|sq| |Magahi|Magahi|mah|
|Swedish|Swedish|sv| |Nagpuri|Nagpur|sck|
|Swahili|Swahili|sw| |Newari|Newari|new|
|Tagalog|Tagalog|tl| |Goan Konkani|Goan Konkani|gom|
|Turkish|Turkish|tr| |Saudi Arabian|Saudi Arabia|sa|
|Uzbek|Uzbek|uz| |Avar|Avar|ava|
|Vietnamese|Vietnamese|vi| |Avar|Avar|ava|
|Mongolian|Mongolian|mn| |Adyghe|Adyghe|ady|
import argparse
import sys
import os
import ast
import paddle
import paddle2onnx as p2o
import paddle.fluid as fluid
from paddleocr import PaddleOCR
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
from .utils import read_images, save_result_image, mkdir
@moduleinfo(
name="multi_languages_ocr_db_crnn",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class MultiLangOCR:
def __init__(self,
lang="ch",
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
lang(str): the selection of languages
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.lang = lang
self.logger = get_logger()
argc = len(sys.argv)
        if argc == 1 or (argc > 1 and sys.argv[1] == 'serving'):
self.det = det
self.rec = rec
self.use_angle_cls = use_angle_cls
self.engine = PaddleOCR(
lang=lang,
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
det_db_box_thresh=box_thresh,
cls_thresh=angle_classification_thresh)
self.det_model_dir = self.engine.text_detector.args.det_model_dir
self.rec_model_dir = self.engine.text_detector.args.rec_model_dir
self.cls_model_dir = self.engine.text_detector.args.cls_model_dir
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. Use either `images` or `paths`.
            paths (list[str]): the paths of images. Use either `paths` or `images`.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
if images != [] and isinstance(images, list) and paths == []:
predicted_data = images
elif images == [] and isinstance(paths, list) and paths != []:
predicted_data = read_images(paths)
else:
raise TypeError("The input data is inconsistent with expectations.")
assert predicted_data != [], "There is not any image to be predicted. Please check the input data."
all_results = []
for img in predicted_data:
result = {'save_path': ''}
if img is None:
result['data'] = []
all_results.append(result)
continue
original_image = img.copy()
rec_results = self.engine.ocr(img, det=self.det, rec=self.rec, cls=self.use_angle_cls)
rec_res_final = []
for line in rec_results:
if self.det and self.rec:
boxes = line[0]
text, score = line[1]
rec_res_final.append({'text': text, 'confidence': float(score), 'text_box_position': boxes})
elif self.det and not self.rec:
boxes = line
rec_res_final.append({'text_box_position': boxes})
else:
if self.use_angle_cls and not self.rec:
orientation, score = line
rec_res_final.append({'orientation': orientation, 'score': float(score)})
else:
text, score = line
rec_res_final.append({'text': text, 'confidence': float(score)})
result['data'] = rec_res_final
if visualization and result['data']:
result['save_path'] = save_result_image(original_image, rec_results, output_dir, self.directory,
self.lang, self.det, self.rec, self.logger)
all_results.append(result)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
parser = self.arg_parser()
args = parser.parse_args(argvs)
if args.lang is not None:
self.lang = args.lang
self.det = args.det
self.rec = args.rec
self.use_angle_cls = args.use_angle_cls
self.engine = PaddleOCR(
lang=self.lang,
det=args.det,
rec=args.rec,
use_angle_cls=args.use_angle_cls,
enable_mkldnn=args.enable_mkldnn,
use_gpu=args.use_gpu,
det_db_box_thresh=args.box_thresh,
cls_thresh=args.angle_classification_thresh)
results = self.recognize_text(
paths=[args.input_path], output_dir=args.output_dir, visualization=args.visualization)
return results
def arg_parser(self):
parser = argparse.ArgumentParser(
description="Run the %s module." % self.name,
prog='hub run %s' % self.name,
usage='%(prog)s',
add_help=True)
        parser.add_argument('--input_path', type=str, default=None, help="path to the input image. Required.", required=True)
parser.add_argument('--use_gpu', type=ast.literal_eval, default=False, help="whether use GPU or not")
parser.add_argument('--output_dir', type=str, default='ocr_result', help="The directory to save output images.")
parser.add_argument(
'--visualization', type=ast.literal_eval, default=False, help="whether to save output as images.")
parser.add_argument('--lang', type=str, default=None, help="the selection of languages")
parser.add_argument('--det', type=ast.literal_eval, default=True, help="whether use text detector or not")
parser.add_argument('--rec', type=ast.literal_eval, default=True, help="whether use text recognizer or not")
        parser.add_argument(
            '--use_angle_cls', type=ast.literal_eval, default=False, help="whether to use the text orientation classifier or not")
parser.add_argument('--enable_mkldnn', type=ast.literal_eval, default=False, help="whether use mkldnn or not")
parser.add_argument(
"--box_thresh", type=float, default=0.6, help="set the threshold of the detected text box's confidence")
parser.add_argument(
"--angle_classification_thresh",
type=float,
default=0.9,
help="set the threshold of the angle classification confidence")
return parser
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{ input_name: input_shape }``, eg. {'x': [-1, 3, -1, -1]}
            opset_version(int): ONNX operator set version
'''
        v0, v1, v2 = p2o.__version__.split('.')
if int(v1) < 9:
raise ImportError("paddle2onnx>=0.9.0 is required")
if input_shape_dict is None:
input_shape_dict = {'x': [-1, 3, -1, -1]}
if input_shape_dict is not None and not isinstance(input_shape_dict, dict):
raise Exception("input_shape_dict should be dict, eg. {'x': [-1, 3, -1, -1]}.")
        if opset_version <= 9:
            raise Exception("opset_version <= 9 is not supported, please try a higher opset_version (>= 10).")
path_dict = {"det": self.det_model_dir, "rec": self.rec_model_dir, "cls": self.cls_model_dir}
for (key, path) in path_dict.items():
model_filename = 'inference.pdmodel'
params_filename = 'inference.pdiparams'
save_file = os.path.join(dirname, '{}_{}.onnx'.format(self.name, key))
            # convert models saved with 'paddle.fluid.io.save_inference_model'
if hasattr(paddle, 'enable_static'):
paddle.enable_static()
exe = fluid.Executor(fluid.CPUPlace())
if model_filename is None and params_filename is None:
[program, feed_var_names, fetch_vars] = fluid.io.load_inference_model(path, exe)
else:
[program, feed_var_names, fetch_vars] = fluid.io.load_inference_model(
path, exe, model_filename=model_filename, params_filename=params_filename)
onnx_proto = p2o.run_convert(program, input_shape_dict=input_shape_dict, opset_version=opset_version)
mkdir(save_file)
with open(save_file, "wb") as f:
f.write(onnx_proto.SerializeToString())
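# A minimal sketch of driving the export above (the output directory is a
# placeholder; assumes paddle2onnx >= 0.9.0 is installed):
#
#     import paddlehub as hub
#     ocr = hub.Module(name="multi_languages_ocr_db_crnn", lang="en")
#     # writes multi_languages_ocr_db_crnn_{det,rec,cls}.onnx under ./onnx_models
#     ocr.export_onnx_model(dirname='./onnx_models', opset_version=11)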
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
import os
import time
import cv2
import numpy as np
from PIL import Image, ImageDraw
from paddleocr import draw_ocr
def save_result_image(original_image,
rec_results,
output_dir='ocr_result',
directory=None,
lang='ch',
det=True,
rec=True,
logger=None):
image = Image.fromarray(cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB))
if det and rec:
boxes = [line[0] for line in rec_results]
txts = [line[1][0] for line in rec_results]
scores = [line[1][1] for line in rec_results]
fonts_lang = 'fonts/simfang.ttf'
lang_fonts = {
'korean': 'korean',
'fr': 'french',
'german': 'german',
'hi': 'hindi',
'ne': 'nepali',
'fa': 'persian',
'es': 'spanish',
'ta': 'tamil',
'te': 'telugu',
'ur': 'urdu',
'ug': 'uyghur',
}
if lang in lang_fonts.keys():
fonts_lang = 'fonts/' + lang_fonts[lang] + '.ttf'
font_file = os.path.join(directory, 'assets', fonts_lang)
im_show = draw_ocr(image, boxes, txts, scores, font_path=font_file)
elif det and not rec:
boxes = rec_results
im_show = draw_boxes(image, boxes)
im_show = np.array(im_show)
else:
        logger.warning("Visualization is not supported when only the classifier or only the recognizer is enabled.")
return ""
if not os.path.exists(output_dir):
os.makedirs(output_dir)
ext = get_image_ext(original_image)
saved_name = 'ndarray_{}{}'.format(time.time(), ext)
save_file_path = os.path.join(output_dir, saved_name)
im_show = Image.fromarray(im_show)
im_show.save(save_file_path)
return save_file_path
def read_images(paths=[]):
images = []
for img_path in paths:
assert os.path.isfile(img_path), "The {} isn't a valid file.".format(img_path)
img = cv2.imread(img_path)
if img is None:
continue
images.append(img)
return images
def draw_boxes(image, boxes, scores=None, drop_score=0.5):
img = image.copy()
draw = ImageDraw.Draw(img)
if scores is None:
scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores):
if score < drop_score:
continue
        # draw the four edges of the quadrilateral, then draw them again
        # shifted by one pixel so the border is effectively two pixels wide
        draw.line([(box[0][0], box[0][1]), (box[1][0], box[1][1])], fill='red')
        draw.line([(box[1][0], box[1][1]), (box[2][0], box[2][1])], fill='red')
        draw.line([(box[2][0], box[2][1]), (box[3][0], box[3][1])], fill='red')
        draw.line([(box[3][0], box[3][1]), (box[0][0], box[0][1])], fill='red')
        draw.line([(box[0][0] - 1, box[0][1] + 1), (box[1][0] - 1, box[1][1] + 1)], fill='red')
        draw.line([(box[1][0] - 1, box[1][1] + 1), (box[2][0] - 1, box[2][1] + 1)], fill='red')
        draw.line([(box[2][0] - 1, box[2][1] + 1), (box[3][0] - 1, box[3][1] + 1)], fill='red')
        draw.line([(box[3][0] - 1, box[3][1] + 1), (box[0][0] - 1, box[0][1] + 1)], fill='red')
return img
def get_image_ext(image):
if image.shape[2] == 4:
return ".png"
return ".jpg"
def mkdir(path):
sub_dir = os.path.dirname(path)
if not os.path.exists(sub_dir):
os.makedirs(sub_dir)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper