diff --git a/README.md b/README.md index b7e62e237f448824a23138f64384deaa496c4b5d..b62f41bde85e51bc490cc65e0dd9d9a4ae52e4fb 100644 --- a/README.md +++ b/README.md @@ -231,3 +231,4 @@ We welcome you to contribute code to PaddleHub, and thank you for your feedback. * Many thanks to [zl1271](https://github.com/zl1271) for fixing serving docs typo * Many thanks to [AK391](https://github.com/AK391) for adding the webdemo of UGATIT and deoldify models in Hugging Face spaces * Many thanks to [itegel](https://github.com/itegel) for fixing quick start docs typo +* Many thanks to [AK391](https://github.com/AK391) for adding the webdemo of Photo2Cartoon model in Hugging Face spaces diff --git a/README_ch.md b/README_ch.md index 4d4efd58b4304477fd8d3737a476810610e50e80..0214cc8fd204f1f8351039fd44d6802cfd9d3943 100644 --- a/README_ch.md +++ b/README_ch.md @@ -247,3 +247,4 @@ print(results) * 非常感谢[zl1271](https://github.com/zl1271)修复了serving文档中的错别字 * 非常感谢[AK391](https://github.com/AK391)在Hugging Face spaces中添加了UGATIT和deoldify模型的web demo * 非常感谢[itegel](https://github.com/itegel)修复了快速开始文档中的错别字 +* 非常感谢[AK391](https://github.com/AK391)在Hugging Face spaces中添加了Photo2Cartoon模型的web demo diff --git a/docs/docs_en/visualization.md b/docs/docs_en/visualization.md index 43dd60ea6a7ff52c3f912acad1bb4ce6149d8469..363ac7a95121c098c292de2004e636ec78c7646f 100644 --- a/docs/docs_en/visualization.md +++ b/docs/docs_en/visualization.md @@ -50,6 +50,8 @@ **UGATIT Selfie2anime Huggingface Web Demo**: Integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/U-GAT-IT-selfie2anime) +**Photo2Cartoon Huggingface Web Demo**: Integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/photo2cartoon) + ### Object Detection - Pedestrian detection, vehicle detection, and more industrial-grade ultra-large-scale pretrained models are provided. 
diff --git a/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/README.md new file mode 100644 index 0000000000000000000000000000000000000000..ebe8bf4bdceb8c03cb27de7c83c46ad696050387 --- /dev/null +++ b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/README.md @@ -0,0 +1,173 @@ +# arabic_ocr_db_crnn_mobile + +|模型名称|arabic_ocr_db_crnn_mobile| +| :--- | :---: | +|类别|图像-文字识别| +|网络|Differentiable Binarization+CRNN| +|数据集|icdar2015数据集| +|是否支持Fine-tuning|否| +|最新更新日期|2021-12-2| +|数据指标|-| + + +## 一、模型基本信息 + +- ### 模型介绍 + + - arabic_ocr_db_crnn_mobile Module用于识别图片当中的阿拉伯文字,包括阿拉伯文、波斯文、维吾尔文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的阿拉伯文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别阿拉伯文的轻量级OCR模型,支持直接预测。 + + - 更多详情参考: + - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf) + - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf) + + + +## 二、安装 + +- ### 1、环境依赖 + + - PaddlePaddle >= 2.0.2 + + - Python >= 3.6 + + - PaddleOCR >= 2.0.1 | [如何安装PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1) + + - PaddleHub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst) + + - Paddle2Onnx >= 0.9.0 | [如何安装paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md) + + - shapely + + - pyclipper + + - ```shell + $ pip3.6 install "paddleocr==2.3.0.2" + $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple + $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple + ``` + - **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。** + +- ### 2、安装 + + - ```shell + $ hub install arabic_ocr_db_crnn_mobile + ``` + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + + + +## 三、模型API预测 + +- ### 1、命令行预测 + + - ```shell + $ hub run arabic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" + $ hub run arabic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True + ``` + - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2、代码示例 + + - ```python + import paddlehub as hub + import cv2 + + ocr = hub.Module(name="arabic_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效 + result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')]) + + # or + # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3、API + + - ```python + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9) + ``` + + - 构造ArabicOCRDBCRNNMobile对象 + + - **参数** + - det(bool): 是否开启文字检测。默认为True。 + - rec(bool): 是否开启文字识别。默认为True。 + - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。 + - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。 + - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量** + - box\_thresh (float): 检测文本框置信度的阈值; + - angle_classification_thresh(float): 文本方向分类置信度的阈值 
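+
+  - 下面是覆盖上述构造参数默认值的一个最小示例(a minimal sketch;其中的阈值取值仅作演示,并非调优后的推荐值):
+
+  - ```python
+    import paddlehub as hub
+
+    # 开启180度方向分类器,并调整两个置信度阈值
+    ocr = hub.Module(
+        name="arabic_ocr_db_crnn_mobile",
+        use_angle_cls=True,
+        box_thresh=0.7,                    # 默认 0.6
+        angle_classification_thresh=0.8)   # 默认 0.9
+    ```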
+ + + - ```python + def recognize_text(images=[], + paths=[], + output_dir='ocr_result', + visualization=False) + ``` + + - 预测API,检测输入图片中的所有文本的位置和识别文本结果。 + + - **参数** + + - paths (list\[str\]): 图片的路径; + - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式; + - output\_dir (str): 图片的保存路径,默认设为 ocr\_result; + - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False; + + - **返回** + + - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为: + - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为: + - text(str): 识别得到的文本 + - confidence(float): 识别文本结果置信度 + - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\] + - orientation(str): 分类的方向,仅在只有方向分类开启时输出 + - score(float): 分类的得分,仅在只有方向分类开启时输出 + - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为'' + + +## 四、服务部署 + +- PaddleHub Serving 可以部署一个目标检测的在线服务。 + +- ### 第一步:启动PaddleHub Serving + + - 运行启动命令: + - ```shell + $ hub serving start -m arabic_ocr_db_crnn_mobile + ``` + + - 这样就完成了一个目标检测的服务化API的部署,默认端口号为8866。 + + - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。 + +- ### 第二步:发送预测请求 + + - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果 + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + # 发送HTTP请求 + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/arabic_ocr_db_crnn_mobile" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # 打印预测结果 + print(r.json()["results"]) + ``` diff --git a/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/module.py new file mode 100644 index 0000000000000000000000000000000000000000..e1d603f6eabdb622b5cf58b9a5b645e991d3889a --- /dev/null +++ b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/module.py @@ -0,0 +1,87 @@ +import paddlehub as hub +from paddleocr.ppocr.utils.logging import get_logger +from paddleocr.tools.infer.utility import base64_to_cv2 +from paddlehub.module.module import moduleinfo, runnable, serving + + +@moduleinfo( + name="arabic_ocr_db_crnn_mobile", + version="1.1.0", + summary="ocr service", + author="PaddlePaddle", + type="cv/text_recognition") +class ArabicOCRDBCRNNMobile: + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9): + """ + initialize with the necessary elements + Args: + det(bool): Whether to use text detector. + rec(bool): Whether to use text recognizer. + use_angle_cls(bool): Whether to use text orientation classifier. + enable_mkldnn(bool): Whether to enable mkldnn. + use_gpu (bool): Whether to use gpu. 
+ box_thresh(float): the threshold of the detected text box's confidence + angle_classification_thresh(float): the threshold of the angle classification confidence + """ + self.logger = get_logger() + self.model = hub.Module( + name="multi_languages_ocr_db_crnn", + lang="arabic", + det=det, + rec=rec, + use_angle_cls=use_angle_cls, + enable_mkldnn=enable_mkldnn, + use_gpu=use_gpu, + box_thresh=box_thresh, + angle_classification_thresh=angle_classification_thresh) + self.model.name = self.name + + def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False): + """ + Get the text in the predicted images. + Args: + images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths + paths (list[str]): The paths of images. If paths not images + output_dir (str): The directory to store output images. + visualization (bool): Whether to save image or not. + Returns: + res (list): The result of text detection box and save path of images. + """ + all_results = self.model.recognize_text( + images=images, paths=paths, output_dir=output_dir, visualization=visualization) + return all_results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.recognize_text(images_decode, **kwargs) + return results + + @runnable + def run_cmd(self, argvs): + """ + Run as a command + """ + results = self.model.run_cmd(argvs) + return results + + def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10): + ''' + Export the model to ONNX format. + + Args: + dirname(str): The directory to save the onnx model. + input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}`` + opset_version(int): operator set + ''' + self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version) diff --git a/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3 --- /dev/null +++ b/modules/image/text_recognition/arabic_ocr_db_crnn_mobile/requirements.txt @@ -0,0 +1,4 @@ +paddleocr>=2.3.0.2 +paddle2onnx>=0.9.0 +shapely +pyclipper diff --git a/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/README.md new file mode 100644 index 0000000000000000000000000000000000000000..7c0c37dd421e1d0c8d6ef1d6a000546f4f109e71 --- /dev/null +++ b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/README.md @@ -0,0 +1,173 @@ +# chinese_cht_ocr_db_crnn_mobile + +|模型名称|chinese_cht_ocr_db_crnn_mobile| +| :--- | :---: | +|类别|图像-文字识别| +|网络|Differentiable Binarization+CRNN| +|数据集|icdar2015数据集| +|是否支持Fine-tuning|否| +|最新更新日期|2021-12-2| +|数据指标|-| + + +## 一、模型基本信息 + +- ### 模型介绍 + + - chinese_cht_ocr_db_crnn_mobile Module用于识别图片当中的繁体中文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的繁体中文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别繁体中文的轻量级OCR模型,支持直接预测。 + + - 更多详情参考: + - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf) + - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text 
recognition](https://arxiv.org/pdf/1507.05717.pdf) + + + +## 二、安装 + +- ### 1、环境依赖 + + - PaddlePaddle >= 2.0.2 + + - Python >= 3.6 + + - PaddleOCR >= 2.0.1 | [如何安装PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1) + + - PaddleHub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst) + + - Paddle2Onnx >= 0.9.0 | [如何安装paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md) + + - shapely + + - pyclipper + + - ```shell + $ pip3.6 install "paddleocr==2.3.0.2" + $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple + $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple + ``` + - **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。** + +- ### 2、安装 + + - ```shell + $ hub install chinese_cht_ocr_db_crnn_mobile + ``` + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + + + +## 三、模型API预测 + +- ### 1、命令行预测 + + - ```shell + $ hub run chinese_cht_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" + $ hub run chinese_cht_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True + ``` + - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2、预测代码示例 + + - ```python + import paddlehub as hub + import cv2 + + ocr = hub.Module(name="chinese_cht_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效 + result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')]) + + # or + # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3、API + + - ```python + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9) + ``` + + - 构造ChineseChtOCRDBCRNNMobile对象 + + - **参数** + - det(bool): 是否开启文字检测。默认为True。 + - rec(bool): 是否开启文字识别。默认为True。 + - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。 + - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。 + - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量** + - box\_thresh (float): 检测文本框置信度的阈值; + - angle_classification_thresh(float): 文本方向分类置信度的阈值 + + + - ```python + def recognize_text(images=[], + paths=[], + output_dir='ocr_result', + visualization=False) + ``` + + - 预测API,检测输入图片中的所有文本的位置和识别文本结果。 + + - **参数** + + - paths (list\[str\]): 图片的路径; + - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式; + - output\_dir (str): 图片的保存路径,默认设为 ocr\_result; + - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False; + + - **返回** + + - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为: + - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为: + - text(str): 识别得到的文本 + - confidence(float): 识别文本结果置信度 + - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\] + - orientation(str): 分类的方向,仅在只有方向分类开启时输出 + - score(float): 分类的得分,仅在只有方向分类开启时输出 + - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为'' + + +## 四、服务部署 + +- PaddleHub Serving 可以部署一个目标检测的在线服务。 + +- ### 第一步:启动PaddleHub Serving + + - 运行启动命令: + - ```shell + $ hub serving start -m chinese_cht_ocr_db_crnn_mobile + ``` + + - 这样就完成了一个目标检测的服务化API的部署,默认端口号为8866。 + + - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。 + +- ### 第二步:发送预测请求 + + - 
配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果 + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + # 发送HTTP请求 + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/chinese_cht_ocr_db_crnn_mobile" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # 打印预测结果 + print(r.json()["results"]) + ``` diff --git a/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/module.py new file mode 100644 index 0000000000000000000000000000000000000000..b1c10a8feab26bb3a00e235c00de56d7476476bb --- /dev/null +++ b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/module.py @@ -0,0 +1,87 @@ +import paddlehub as hub +from paddleocr.ppocr.utils.logging import get_logger +from paddleocr.tools.infer.utility import base64_to_cv2 +from paddlehub.module.module import moduleinfo, runnable, serving + + +@moduleinfo( + name="chinese_cht_ocr_db_crnn_mobile", + version="1.0.0", + summary="ocr service", + author="PaddlePaddle", + type="cv/text_recognition") +class ChineseChtOCRDBCRNNMobile: + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9): + """ + initialize with the necessary elements + Args: + det(bool): Whether to use text detector. + rec(bool): Whether to use text recognizer. + use_angle_cls(bool): Whether to use text orientation classifier. + enable_mkldnn(bool): Whether to enable mkldnn. + use_gpu (bool): Whether to use gpu. + box_thresh(float): the threshold of the detected text box's confidence + angle_classification_thresh(float): the threshold of the angle classification confidence + """ + self.logger = get_logger() + self.model = hub.Module( + name="multi_languages_ocr_db_crnn", + lang="chinese_cht", + det=det, + rec=rec, + use_angle_cls=use_angle_cls, + enable_mkldnn=enable_mkldnn, + use_gpu=use_gpu, + box_thresh=box_thresh, + angle_classification_thresh=angle_classification_thresh) + self.model.name = self.name + + def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False): + """ + Get the text in the predicted images. + Args: + images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths + paths (list[str]): The paths of images. If paths not images + output_dir (str): The directory to store output images. + visualization (bool): Whether to save image or not. + Returns: + res (list): The result of text detection box and save path of images. + """ + all_results = self.model.recognize_text( + images=images, paths=paths, output_dir=output_dir, visualization=visualization) + return all_results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. 
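+        Args:
+            images (list): base64-encoded image strings (e.g. produced by the
+                cv2_to_base64 helper shown in the README); each one is decoded
+                with base64_to_cv2 before recognize_text is called.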
+ """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.recognize_text(images_decode, **kwargs) + return results + + @runnable + def run_cmd(self, argvs): + """ + Run as a command + """ + results = self.model.run_cmd(argvs) + return results + + def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10): + ''' + Export the model to ONNX format. + + Args: + dirname(str): The directory to save the onnx model. + input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}`` + opset_version(int): operator set + ''' + self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version) diff --git a/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3 --- /dev/null +++ b/modules/image/text_recognition/chinese_cht_ocr_db_crnn_mobile/requirements.txt @@ -0,0 +1,4 @@ +paddleocr>=2.3.0.2 +paddle2onnx>=0.9.0 +shapely +pyclipper diff --git a/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a429f3181baa7010af8c438b60eae877103189fa --- /dev/null +++ b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/README.md @@ -0,0 +1,173 @@ +# cyrillic_ocr_db_crnn_mobile + +|模型名称|cyrillic_ocr_db_crnn_mobile| +| :--- | :---: | +|类别|图像-文字识别| +|网络|Differentiable Binarization+CRNN| +|数据集|icdar2015数据集| +|是否支持Fine-tuning|否| +|最新更新日期|2021-12-2| +|数据指标|-| + + +## 一、模型基本信息 + +- ### 模型介绍 + + - cyrillic_ocr_db_crnn_mobile Module用于识别图片当中的斯拉夫文,包括俄罗斯文、塞尔维亚文、白俄罗斯文、保加利亚文、乌克兰文、蒙古文、阿迪赫文、阿瓦尔文、达尔瓦文、因古什文、拉克文、莱兹甘文、塔巴萨兰文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的斯拉夫文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别斯拉夫文的轻量级OCR模型,支持直接预测。 + + - 更多详情参考: + - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf) + - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf) + + + +## 二、安装 + +- ### 1、环境依赖 + + - PaddlePaddle >= 2.0.2 + + - Python >= 3.6 + + - PaddleOCR >= 2.0.1 | [如何安装PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1) + + - PaddleHub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst) + + - Paddle2Onnx >= 0.9.0 | [如何安装paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md) + + - shapely + + - pyclipper + + - ```shell + $ pip3.6 install "paddleocr==2.3.0.2" + $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple + $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple + ``` + - **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。** + +- ### 2、安装 + + - ```shell + $ hub install cyrillic_ocr_db_crnn_mobile + ``` + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + + + +## 三、模型API预测 + +- ### 1、命令行预测 + + - ```shell + $ hub run 
cyrillic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" + $ hub run cyrillic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True + ``` + - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2、代码示例 + + - ```python + import paddlehub as hub + import cv2 + + ocr = hub.Module(name="cyrillic_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效 + result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')]) + + # or + # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3、API + + - ```python + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9) + ``` + + - 构造CyrillicOCRDBCRNNMobile对象 + + - **参数** + - det(bool): 是否开启文字检测。默认为True。 + - rec(bool): 是否开启文字识别。默认为True。 + - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。 + - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。 + - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量** + - box\_thresh (float): 检测文本框置信度的阈值; + - angle_classification_thresh(float): 文本方向分类置信度的阈值 + + + - ```python + def recognize_text(images=[], + paths=[], + output_dir='ocr_result', + visualization=False) + ``` + + - 预测API,检测输入图片中的所有文本的位置和识别文本结果。 + + - **参数** + + - paths (list\[str\]): 图片的路径; + - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式; + - output\_dir (str): 图片的保存路径,默认设为 ocr\_result; + - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False; + + - **返回** + + - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为: + - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为: + - text(str): 识别得到的文本 + - confidence(float): 识别文本结果置信度 + - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\] + - orientation(str): 分类的方向,仅在只有方向分类开启时输出 + - score(float): 分类的得分,仅在只有方向分类开启时输出 + - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为'' + + +## 四、服务部署 + +- PaddleHub Serving 可以部署一个目标检测的在线服务。 + +- ### 第一步:启动PaddleHub Serving + + - 运行启动命令: + - ```shell + $ hub serving start -m cyrillic_ocr_db_crnn_mobile + ``` + + - 这样就完成了一个目标检测的服务化API的部署,默认端口号为8866。 + + - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。 + +- ### 第二步:发送预测请求 + + - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果 + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + # 发送HTTP请求 + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/cyrillic_ocr_db_crnn_mobile" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # 打印预测结果 + print(r.json()["results"]) + ``` diff --git a/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/module.py new file mode 100644 index 0000000000000000000000000000000000000000..bd182e6693ddb72059fbb3a5cc28a96e3f27c1e6 --- /dev/null +++ b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/module.py @@ -0,0 
+1,87 @@ +import paddlehub as hub +from paddleocr.ppocr.utils.logging import get_logger +from paddleocr.tools.infer.utility import base64_to_cv2 +from paddlehub.module.module import moduleinfo, runnable, serving + + +@moduleinfo( + name="cyrillic_ocr_db_crnn_mobile", + version="1.0.0", + summary="ocr service", + author="PaddlePaddle", + type="cv/text_recognition") +class CyrillicOCRDBCRNNMobile: + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9): + """ + initialize with the necessary elements + Args: + det(bool): Whether to use text detector. + rec(bool): Whether to use text recognizer. + use_angle_cls(bool): Whether to use text orientation classifier. + enable_mkldnn(bool): Whether to enable mkldnn. + use_gpu (bool): Whether to use gpu. + box_thresh(float): the threshold of the detected text box's confidence + angle_classification_thresh(float): the threshold of the angle classification confidence + """ + self.logger = get_logger() + self.model = hub.Module( + name="multi_languages_ocr_db_crnn", + lang="cyrillic", + det=det, + rec=rec, + use_angle_cls=use_angle_cls, + enable_mkldnn=enable_mkldnn, + use_gpu=use_gpu, + box_thresh=box_thresh, + angle_classification_thresh=angle_classification_thresh) + self.model.name = self.name + + def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False): + """ + Get the text in the predicted images. + Args: + images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths + paths (list[str]): The paths of images. If paths not images + output_dir (str): The directory to store output images. + visualization (bool): Whether to save image or not. + Returns: + res (list): The result of text detection box and save path of images. + """ + all_results = self.model.recognize_text( + images=images, paths=paths, output_dir=output_dir, visualization=visualization) + return all_results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.recognize_text(images_decode, **kwargs) + return results + + @runnable + def run_cmd(self, argvs): + """ + Run as a command + """ + results = self.model.run_cmd(argvs) + return results + + def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10): + ''' + Export the model to ONNX format. + + Args: + dirname(str): The directory to save the onnx model. + input_shape_dict: dictionary ``{ input_name: input_value }, eg. 
{'x': [-1, 3, -1, -1]}`` + opset_version(int): operator set + ''' + self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version) diff --git a/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3 --- /dev/null +++ b/modules/image/text_recognition/cyrillic_ocr_db_crnn_mobile/requirements.txt @@ -0,0 +1,4 @@ +paddleocr>=2.3.0.2 +paddle2onnx>=0.9.0 +shapely +pyclipper diff --git a/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/README.md new file mode 100644 index 0000000000000000000000000000000000000000..bfb97a1913053eafbc589ccbfa6b9059ff91c06b --- /dev/null +++ b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/README.md @@ -0,0 +1,173 @@ +# devanagari_ocr_db_crnn_mobile + +|模型名称|devanagari_ocr_db_crnn_mobile| +| :--- | :---: | +|类别|图像-文字识别| +|网络|Differentiable Binarization+CRNN| +|数据集|icdar2015数据集| +|是否支持Fine-tuning|否| +|最新更新日期|2021-12-2| +|数据指标|-| + + +## 一、模型基本信息 + +- ### 模型介绍 + + - devanagari_ocr_db_crnn_mobile Module用于识别图片当中的梵文,包括印地文、马拉地文、尼泊尔文、比尔哈文、迈蒂利文、昂加文、孟加拉文、摩揭陀文、那格浦尔文、尼瓦尔文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的梵文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别梵文的轻量级OCR模型,支持直接预测。 + + - 更多详情参考: + - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf) + - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf) + + + +## 二、安装 + +- ### 1、环境依赖 + + - PaddlePaddle >= 2.0.2 + + - Python >= 3.6 + + - PaddleOCR >= 2.0.1 | [如何安装PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1) + + - PaddleHub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst) + + - Paddle2Onnx >= 0.9.0 | [如何安装paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md) + + - shapely + + - pyclipper + + - ```shell + $ pip3.6 install "paddleocr==2.3.0.2" + $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple + $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple + ``` + - **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。** + +- ### 2、安装 + + - ```shell + $ hub install devanagari_ocr_db_crnn_mobile + ``` + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + + + +## 三、模型API预测 + +- ### 1、命令行预测 + + - ```shell + $ hub run devanagari_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" + $ hub run devanagari_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True + ``` + - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2、代码示例 + + - ```python + import paddlehub as hub + import cv2 + + ocr = hub.Module(name="devanagari_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效 + result = 
ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')]) + + # or + # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3、API + + - ```python + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9) + ``` + + - 构造DevanagariOCRDBCRNNMobile对象 + + - **参数** + - det(bool): 是否开启文字检测。默认为True。 + - rec(bool): 是否开启文字识别。默认为True。 + - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。 + - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。 + - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量** + - box\_thresh (float): 检测文本框置信度的阈值; + - angle_classification_thresh(float): 文本方向分类置信度的阈值 + + + - ```python + def recognize_text(images=[], + paths=[], + output_dir='ocr_result', + visualization=False) + ``` + + - 预测API,检测输入图片中的所有文本的位置和识别文本结果。 + + - **参数** + + - paths (list\[str\]): 图片的路径; + - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式; + - output\_dir (str): 图片的保存路径,默认设为 ocr\_result; + - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False; + + - **返回** + + - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为: + - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为: + - text(str): 识别得到的文本 + - confidence(float): 识别文本结果置信度 + - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\] + - orientation(str): 分类的方向,仅在只有方向分类开启时输出 + - score(float): 分类的得分,仅在只有方向分类开启时输出 + - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为'' + + +## 四、服务部署 + +- PaddleHub Serving 可以部署一个目标检测的在线服务。 + +- ### 第一步:启动PaddleHub Serving + + - 运行启动命令: + - ```shell + $ hub serving start -m devanagari_ocr_db_crnn_mobile + ``` + + - 这样就完成了一个目标检测的服务化API的部署,默认端口号为8866。 + + - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。 + +- ### 第二步:发送预测请求 + + - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果 + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + # 发送HTTP请求 + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/devanagari_ocr_db_crnn_mobile" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # 打印预测结果 + print(r.json()["results"]) + ``` diff --git a/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/module.py new file mode 100644 index 0000000000000000000000000000000000000000..a165f934188d9d0df9fd9f18378e141330ff4b38 --- /dev/null +++ b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/module.py @@ -0,0 +1,87 @@ +import paddlehub as hub +from paddleocr.ppocr.utils.logging import get_logger +from paddleocr.tools.infer.utility import base64_to_cv2 +from paddlehub.module.module import moduleinfo, runnable, serving + + +@moduleinfo( + name="devanagari_ocr_db_crnn_mobile", + version="1.0.0", + summary="ocr service", + author="PaddlePaddle", + type="cv/text_recognition") +class DevanagariOCRDBCRNNMobile: + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + 
use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9): + """ + initialize with the necessary elements + Args: + det(bool): Whether to use text detector. + rec(bool): Whether to use text recognizer. + use_angle_cls(bool): Whether to use text orientation classifier. + enable_mkldnn(bool): Whether to enable mkldnn. + use_gpu (bool): Whether to use gpu. + box_thresh(float): the threshold of the detected text box's confidence + angle_classification_thresh(float): the threshold of the angle classification confidence + """ + self.logger = get_logger() + self.model = hub.Module( + name="multi_languages_ocr_db_crnn", + lang="devanagari", + det=det, + rec=rec, + use_angle_cls=use_angle_cls, + enable_mkldnn=enable_mkldnn, + use_gpu=use_gpu, + box_thresh=box_thresh, + angle_classification_thresh=angle_classification_thresh) + self.model.name = self.name + + def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False): + """ + Get the text in the predicted images. + Args: + images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths + paths (list[str]): The paths of images. If paths not images + output_dir (str): The directory to store output images. + visualization (bool): Whether to save image or not. + Returns: + res (list): The result of text detection box and save path of images. + """ + all_results = self.model.recognize_text( + images=images, paths=paths, output_dir=output_dir, visualization=visualization) + return all_results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.recognize_text(images_decode, **kwargs) + return results + + @runnable + def run_cmd(self, argvs): + """ + Run as a command + """ + results = self.model.run_cmd(argvs) + return results + + def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10): + ''' + Export the model to ONNX format. + + Args: + dirname(str): The directory to save the onnx model. + input_shape_dict: dictionary ``{ input_name: input_value }, eg. 
{'x': [-1, 3, -1, -1]}`` + opset_version(int): operator set + ''' + self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version) diff --git a/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3 --- /dev/null +++ b/modules/image/text_recognition/devanagari_ocr_db_crnn_mobile/requirements.txt @@ -0,0 +1,4 @@ +paddleocr>=2.3.0.2 +paddle2onnx>=0.9.0 +shapely +pyclipper diff --git a/modules/image/text_recognition/french_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/french_ocr_db_crnn_mobile/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a63ffa3eab9618ffec9f47b3a4c3cfb3f4a493fa --- /dev/null +++ b/modules/image/text_recognition/french_ocr_db_crnn_mobile/README.md @@ -0,0 +1,173 @@ +# french_ocr_db_crnn_mobile + +|模型名称|french_ocr_db_crnn_mobile| +| :--- | :---: | +|类别|图像-文字识别| +|网络|Differentiable Binarization+CRNN| +|数据集|icdar2015数据集| +|是否支持Fine-tuning|否| +|最新更新日期|2021-12-2| +|数据指标|-| + + +## 一、模型基本信息 + +- ### 模型介绍 + + - french_ocr_db_crnn_mobile Module用于识别图片当中的法文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的法文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别法文的轻量级OCR模型,支持直接预测。 + + - 更多详情参考: + - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf) + - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf) + + + +## 二、安装 + +- ### 1、环境依赖 + + - PaddlePaddle >= 2.0.2 + + - Python >= 3.6 + + - PaddleOCR >= 2.0.1 | [如何安装PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1) + + - PaddleHub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst) + + - Paddle2Onnx >= 0.9.0 | [如何安装paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md) + + - shapely + + - pyclipper + + - ```shell + $ pip3.6 install "paddleocr==2.3.0.2" + $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple + $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple + ``` + - **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。** + +- ### 2、安装 + + - ```shell + $ hub install french_ocr_db_crnn_mobile + ``` + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + + + +## 三、模型API预测 + +- ### 1、命令行预测 + + - ```shell + $ hub run french_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" + $ hub run french_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True + ``` + - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2、代码示例 + + - ```python + import paddlehub as hub + import cv2 + + ocr = hub.Module(name="french_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效 + result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')]) + + # or + # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE']) + 
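+
+    # 读取返回结果的示意(字段含义见下方 API 小节;此处仅为示例):
+    # for item in result[0]['data']:
+    #     print(item['text'], item['confidence'], item['text_box_position'])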
``` + +- ### 3、API + + - ```python + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9) + ``` + + - 构造FrechOCRDBCRNNMobile对象 + + - **参数** + - det(bool): 是否开启文字检测。默认为True。 + - rec(bool): 是否开启文字识别。默认为True。 + - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。 + - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。 + - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量** + - box\_thresh (float): 检测文本框置信度的阈值; + - angle_classification_thresh(float): 文本方向分类置信度的阈值 + + + - ```python + def recognize_text(images=[], + paths=[], + output_dir='ocr_result', + visualization=False) + ``` + + - 预测API,检测输入图片中的所有文本的位置和识别文本结果。 + + - **参数** + + - paths (list\[str\]): 图片的路径; + - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式; + - output\_dir (str): 图片的保存路径,默认设为 ocr\_result; + - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False; + + - **返回** + + - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为: + - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为: + - text(str): 识别得到的文本 + - confidence(float): 识别文本结果置信度 + - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\] + - orientation(str): 分类的方向,仅在只有方向分类开启时输出 + - score(float): 分类的得分,仅在只有方向分类开启时输出 + - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为'' + + +## 四、服务部署 + +- PaddleHub Serving 可以部署一个目标检测的在线服务。 + +- ### 第一步:启动PaddleHub Serving + + - 运行启动命令: + - ```shell + $ hub serving start -m french_ocr_db_crnn_mobile + ``` + + - 这样就完成了一个目标检测的服务化API的部署,默认端口号为8866。 + + - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。 + +- ### 第二步:发送预测请求 + + - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果 + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + # 发送HTTP请求 + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/french_ocr_db_crnn_mobile" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # 打印预测结果 + print(r.json()["results"]) + ``` diff --git a/modules/image/text_recognition/french_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/french_ocr_db_crnn_mobile/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/modules/image/text_recognition/french_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/french_ocr_db_crnn_mobile/module.py new file mode 100644 index 0000000000000000000000000000000000000000..f2aa331bd80dcc6a4f6ac5525baad62f59b768dc --- /dev/null +++ b/modules/image/text_recognition/french_ocr_db_crnn_mobile/module.py @@ -0,0 +1,87 @@ +import paddlehub as hub +from paddleocr.ppocr.utils.logging import get_logger +from paddleocr.tools.infer.utility import base64_to_cv2 +from paddlehub.module.module import moduleinfo, runnable, serving + + +@moduleinfo( + name="french_ocr_db_crnn_mobile", + version="1.0.0", + summary="ocr service", + author="PaddlePaddle", + type="cv/text_recognition") +class FrechOCRDBCRNNMobile: + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9): + """ + initialize with the necessary elements + Args: + det(bool): Whether to use text detector. 
+ rec(bool): Whether to use text recognizer. + use_angle_cls(bool): Whether to use text orientation classifier. + enable_mkldnn(bool): Whether to enable mkldnn. + use_gpu (bool): Whether to use gpu. + box_thresh(float): the threshold of the detected text box's confidence + angle_classification_thresh(float): the threshold of the angle classification confidence + """ + self.logger = get_logger() + self.model = hub.Module( + name="multi_languages_ocr_db_crnn", + lang="fr", + det=det, + rec=rec, + use_angle_cls=use_angle_cls, + enable_mkldnn=enable_mkldnn, + use_gpu=use_gpu, + box_thresh=box_thresh, + angle_classification_thresh=angle_classification_thresh) + self.model.name = self.name + + def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False): + """ + Get the text in the predicted images. + Args: + images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths + paths (list[str]): The paths of images. If paths not images + output_dir (str): The directory to store output images. + visualization (bool): Whether to save image or not. + Returns: + res (list): The result of text detection box and save path of images. + """ + all_results = self.model.recognize_text( + images=images, paths=paths, output_dir=output_dir, visualization=visualization) + return all_results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.recognize_text(images_decode, **kwargs) + return results + + @runnable + def run_cmd(self, argvs): + """ + Run as a command + """ + results = self.model.run_cmd(argvs) + return results + + def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10): + ''' + Export the model to ONNX format. + + Args: + dirname(str): The directory to save the onnx model. + input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}`` + opset_version(int): operator set + ''' + self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version) diff --git a/modules/image/text_recognition/french_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/french_ocr_db_crnn_mobile/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3 --- /dev/null +++ b/modules/image/text_recognition/french_ocr_db_crnn_mobile/requirements.txt @@ -0,0 +1,4 @@ +paddleocr>=2.3.0.2 +paddle2onnx>=0.9.0 +shapely +pyclipper diff --git a/modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german_dict.txt b/modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german_dict.txt deleted file mode 100644 index 30c4d4218e8a77386db912e24117b1f197466e83..0000000000000000000000000000000000000000 --- a/modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german_dict.txt +++ /dev/null @@ -1,131 +0,0 @@ -! -" -$ -% -& -' -( -) -+ -, -- -. -/ -0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -: -; -> -? 
-A -B -C -D -E -F -G -H -I -J -K -L -M -N -O -P -Q -R -S -T -U -V -W -X -Y -Z -[ -] -a -b -c -d -e -f -g -h -i -j -k -l -m -n -o -p -q -r -s -t -u -v -w -x -y -z -£ -§ -­ -² -´ -µ -· -º -¼ -½ -¿ -À -Á -Ä -Å -Ç -É -Í -Ï -Ô -Ö -Ø -Ù -Ü -ß -à -á -â -ã -ä -å -æ -ç -è -é -ê -ë -í -ï -ñ -ò -ó -ô -ö -ø -ù -ú -û -ü - diff --git a/modules/image/text_recognition/german_ocr_db_crnn_mobile/character.py b/modules/image/text_recognition/german_ocr_db_crnn_mobile/character.py deleted file mode 100644 index 21dbbd9dc790e3d009f45c1ef1b68c001e9f0e0b..0000000000000000000000000000000000000000 --- a/modules/image/text_recognition/german_ocr_db_crnn_mobile/character.py +++ /dev/null @@ -1,213 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import numpy as np -import string - -class CharacterOps(object): - """ Convert between text-label and text-index """ - - def __init__(self, config): - self.character_type = config['character_type'] - self.loss_type = config['loss_type'] - self.max_text_len = config['max_text_length'] - if self.character_type == "en": - self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" - dict_character = list(self.character_str) - elif self.character_type in [ - "ch", 'japan', 'korean', 'french', 'german' - ]: - character_dict_path = config['character_dict_path'] - add_space = False - if 'use_space_char' in config: - add_space = config['use_space_char'] - self.character_str = "" - with open(character_dict_path, "rb") as fin: - lines = fin.readlines() - for line in lines: - line = line.decode('utf-8').strip("\n").strip("\r\n") - self.character_str += line - if add_space: - self.character_str += " " - dict_character = list(self.character_str) - elif self.character_type == "en_sensitive": - # same with ASTER setting (use 94 char). - self.character_str = string.printable[:-6] - dict_character = list(self.character_str) - else: - self.character_str = None - assert self.character_str is not None, \ - "Nonsupport type of the character: {}".format(self.character_str) - self.beg_str = "sos" - self.end_str = "eos" - if self.loss_type == "attention": - dict_character = [self.beg_str, self.end_str] + dict_character - elif self.loss_type == "srn": - dict_character = dict_character + [self.beg_str, self.end_str] - self.dict = {} - for i, char in enumerate(dict_character): - self.dict[char] = i - self.character = dict_character - - def encode(self, text): - """convert text-label into text-index. - input: - text: text labels of each image. [batch_size] - - output: - text: concatenated text index for CTCLoss. - [sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)] - length: length of each text. 
[batch_size] - """ - if self.character_type == "en": - text = text.lower() - - text_list = [] - for char in text: - if char not in self.dict: - continue - text_list.append(self.dict[char]) - text = np.array(text_list) - return text - - def decode(self, text_index, is_remove_duplicate=False): - """ convert text-index into text-label. """ - char_list = [] - char_num = self.get_char_num() - - if self.loss_type == "attention": - beg_idx = self.get_beg_end_flag_idx("beg") - end_idx = self.get_beg_end_flag_idx("end") - ignored_tokens = [beg_idx, end_idx] - else: - ignored_tokens = [char_num] - - for idx in range(len(text_index)): - if text_index[idx] in ignored_tokens: - continue - if is_remove_duplicate: - if idx > 0 and text_index[idx - 1] == text_index[idx]: - continue - char_list.append(self.character[int(text_index[idx])]) - text = ''.join(char_list) - return text - - def get_char_num(self): - return len(self.character) - - def get_beg_end_flag_idx(self, beg_or_end): - if self.loss_type == "attention": - if beg_or_end == "beg": - idx = np.array(self.dict[self.beg_str]) - elif beg_or_end == "end": - idx = np.array(self.dict[self.end_str]) - else: - assert False, "Unsupport type %s in get_beg_end_flag_idx"\ - % beg_or_end - return idx - else: - err = "error in get_beg_end_flag_idx when using the loss %s"\ - % (self.loss_type) - assert False, err - - -def cal_predicts_accuracy(char_ops, - preds, - preds_lod, - labels, - labels_lod, - is_remove_duplicate=False): - acc_num = 0 - img_num = 0 - for ino in range(len(labels_lod) - 1): - beg_no = preds_lod[ino] - end_no = preds_lod[ino + 1] - preds_text = preds[beg_no:end_no].reshape(-1) - preds_text = char_ops.decode(preds_text, is_remove_duplicate) - - beg_no = labels_lod[ino] - end_no = labels_lod[ino + 1] - labels_text = labels[beg_no:end_no].reshape(-1) - labels_text = char_ops.decode(labels_text, is_remove_duplicate) - img_num += 1 - - if preds_text == labels_text: - acc_num += 1 - acc = acc_num * 1.0 / img_num - return acc, acc_num, img_num - - -def cal_predicts_accuracy_srn(char_ops, - preds, - labels, - max_text_len, - is_debug=False): - acc_num = 0 - img_num = 0 - - char_num = char_ops.get_char_num() - - total_len = preds.shape[0] - img_num = int(total_len / max_text_len) - for i in range(img_num): - cur_label = [] - cur_pred = [] - for j in range(max_text_len): - if labels[j + i * max_text_len] != int(char_num - 1): #0 - cur_label.append(labels[j + i * max_text_len][0]) - else: - break - - for j in range(max_text_len + 1): - if j < len(cur_label) and preds[j + i * max_text_len][ - 0] != cur_label[j]: - break - elif j == len(cur_label) and j == max_text_len: - acc_num += 1 - break - elif j == len(cur_label) and preds[j + i * max_text_len][0] == int( - char_num - 1): - acc_num += 1 - break - acc = acc_num * 1.0 / img_num - return acc, acc_num, img_num - - -def convert_rec_attention_infer_res(preds): - img_num = preds.shape[0] - target_lod = [0] - convert_ids = [] - for ino in range(img_num): - end_pos = np.where(preds[ino, :] == 1)[0] - if len(end_pos) <= 1: - text_list = preds[ino, 1:] - else: - text_list = preds[ino, 1:end_pos[1]] - target_lod.append(target_lod[ino] + len(text_list)) - convert_ids = convert_ids + list(text_list) - convert_ids = np.array(convert_ids) - convert_ids = convert_ids.reshape((-1, 1)) - return convert_ids, target_lod - - -def convert_rec_label_to_lod(ori_labels): - img_num = len(ori_labels) - target_lod = [0] - convert_ids = [] - for ino in range(img_num): - target_lod.append(target_lod[ino] + 
len(ori_labels[ino])) - convert_ids = convert_ids + list(ori_labels[ino]) - convert_ids = np.array(convert_ids) - convert_ids = convert_ids.reshape((-1, 1)) - return convert_ids, target_lod diff --git a/modules/image/text_recognition/german_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/german_ocr_db_crnn_mobile/module.py index 6b59d274faa7a583851369a38fb73756dfcbcebe..569cc14817d85313037a60463f0115fb0a65deaf 100644 --- a/modules/image/text_recognition/german_ocr_db_crnn_mobile/module.py +++ b/modules/image/text_recognition/german_ocr_db_crnn_mobile/module.py @@ -1,304 +1,61 @@ -# -*- coding:utf-8 -*- -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import argparse -import ast -import copy -import math -import os -import time - -from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor -from paddlehub.common.logger import logger -from paddlehub.module.module import moduleinfo, runnable, serving -from PIL import Image -import cv2 -import numpy as np -import paddle.fluid as fluid import paddlehub as hub - -from german_ocr_db_crnn_mobile.character import CharacterOps -from german_ocr_db_crnn_mobile.utils import base64_to_cv2, draw_ocr, get_image_ext, sorted_boxes +from paddleocr.ppocr.utils.logging import get_logger +from paddleocr.tools.infer.utility import base64_to_cv2 +from paddlehub.module.module import moduleinfo, runnable, serving @moduleinfo( name="german_ocr_db_crnn_mobile", - version="1.0.0", - summary= - "The module can recognize the german texts in an image. Firstly, it will detect the text box positions based on the differentiable_binarization module. Then it recognizes the german texts. ", - author="paddle-dev", - author_email="paddle-dev@baidu.com", + version="1.1.0", + summary="ocr service", + author="PaddlePaddle", type="cv/text_recognition") -class GermanOCRDBCRNNMobile(hub.Module): - def _initialize(self, text_detector_module=None, enable_mkldnn=False, use_angle_classification=False): +class GermanOCRDBCRNNMobile: + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9): """ initialize with the necessary elements + Args: + det(bool): Whether to use text detector. + rec(bool): Whether to use text recognizer. + use_angle_cls(bool): Whether to use text orientation classifier. + enable_mkldnn(bool): Whether to enable mkldnn. + use_gpu (bool): Whether to use gpu. 
+ box_thresh(float): the threshold of the detected text box's confidence + angle_classification_thresh(float): the threshold of the angle classification confidence """ - self.character_dict_path = os.path.join(self.directory, 'assets', - 'german_dict.txt') - char_ops_params = { - 'character_type': 'german', - 'character_dict_path': self.character_dict_path, - 'loss_type': 'ctc', - 'max_text_length': 25, - 'use_space_char': True - } - self.char_ops = CharacterOps(char_ops_params) - self.rec_image_shape = [3, 32, 320] - self._text_detector_module = text_detector_module - self.font_file = os.path.join(self.directory, 'assets', 'german.ttf') - self.enable_mkldnn = enable_mkldnn - self.use_angle_classification = use_angle_classification - - self.rec_pretrained_model_path = os.path.join( - self.directory, 'inference_model', 'character_rec') - self.rec_predictor, self.rec_input_tensor, self.rec_output_tensors = self._set_config( - self.rec_pretrained_model_path) - - if self.use_angle_classification: - self.cls_pretrained_model_path = os.path.join( - self.directory, 'inference_model', 'angle_cls') - - self.cls_predictor, self.cls_input_tensor, self.cls_output_tensors = self._set_config( - self.cls_pretrained_model_path) - - def _set_config(self, pretrained_model_path): - """ - predictor config path - """ - model_file_path = os.path.join(pretrained_model_path, 'model') - params_file_path = os.path.join(pretrained_model_path, 'params') - - config = AnalysisConfig(model_file_path, params_file_path) - try: - _places = os.environ["CUDA_VISIBLE_DEVICES"] - int(_places[0]) - use_gpu = True - except: - use_gpu = False - - if use_gpu: - config.enable_use_gpu(8000, 0) - else: - config.disable_gpu() - if self.enable_mkldnn: - # cache 10 different shapes for mkldnn to avoid memory leak - config.set_mkldnn_cache_capacity(10) - config.enable_mkldnn() - - config.disable_glog_info() - config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass") - config.switch_use_feed_fetch_ops(False) - - predictor = create_paddle_predictor(config) - - input_names = predictor.get_input_names() - input_tensor = predictor.get_input_tensor(input_names[0]) - output_names = predictor.get_output_names() - output_tensors = [] - for output_name in output_names: - output_tensor = predictor.get_output_tensor(output_name) - output_tensors.append(output_tensor) - - return predictor, input_tensor, output_tensors - - @property - def text_detector_module(self): - """ - text detect module - """ - if not self._text_detector_module: - self._text_detector_module = hub.Module( - name='chinese_text_detection_db_mobile', - enable_mkldnn=self.enable_mkldnn, - version='1.0.4') - return self._text_detector_module - - def read_images(self, paths=[]): - images = [] - for img_path in paths: - assert os.path.isfile( - img_path), "The {} isn't a valid file.".format(img_path) - img = cv2.imread(img_path) - if img is None: - logger.info("error in loading image:{}".format(img_path)) - continue - images.append(img) - return images - - def get_rotate_crop_image(self, img, points): - ''' - img_height, img_width = img.shape[0:2] - left = int(np.min(points[:, 0])) - right = int(np.max(points[:, 0])) - top = int(np.min(points[:, 1])) - bottom = int(np.max(points[:, 1])) - img_crop = img[top:bottom, left:right, :].copy() - points[:, 0] = points[:, 0] - left - points[:, 1] = points[:, 1] - top - ''' - img_crop_width = int( - max( - np.linalg.norm(points[0] - points[1]), - np.linalg.norm(points[2] - points[3]))) - img_crop_height = int( - max( - np.linalg.norm(points[0] 
- points[3]), - np.linalg.norm(points[1] - points[2]))) - pts_std = np.float32([[0, 0], [img_crop_width, 0], - [img_crop_width, img_crop_height], - [0, img_crop_height]]) - M = cv2.getPerspectiveTransform(points, pts_std) - dst_img = cv2.warpPerspective( - img, - M, (img_crop_width, img_crop_height), - borderMode=cv2.BORDER_REPLICATE, - flags=cv2.INTER_CUBIC) - dst_img_height, dst_img_width = dst_img.shape[0:2] - if dst_img_height * 1.0 / dst_img_width >= 1.5: - dst_img = np.rot90(dst_img) - return dst_img - - def resize_norm_img_rec(self, img, max_wh_ratio): - imgC, imgH, imgW = self.rec_image_shape - assert imgC == img.shape[2] - h, w = img.shape[:2] - ratio = w / float(h) - if math.ceil(imgH * ratio) > imgW: - resized_w = imgW - else: - resized_w = int(math.ceil(imgH * ratio)) - resized_image = cv2.resize(img, (resized_w, imgH)) - resized_image = resized_image.astype('float32') - resized_image = resized_image.transpose((2, 0, 1)) / 255 - resized_image -= 0.5 - resized_image /= 0.5 - padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) - padding_im[:, :, 0:resized_w] = resized_image - return padding_im - - def resize_norm_img_cls(self, img): - cls_image_shape = [3, 48, 192] - imgC, imgH, imgW = cls_image_shape - h = img.shape[0] - w = img.shape[1] - ratio = w / float(h) - if math.ceil(imgH * ratio) > imgW: - resized_w = imgW - else: - resized_w = int(math.ceil(imgH * ratio)) - resized_image = cv2.resize(img, (resized_w, imgH)) - resized_image = resized_image.astype('float32') - if cls_image_shape[0] == 1: - resized_image = resized_image / 255 - resized_image = resized_image[np.newaxis, :] - else: - resized_image = resized_image.transpose((2, 0, 1)) / 255 - resized_image -= 0.5 - resized_image /= 0.5 - padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) - padding_im[:, :, 0:resized_w] = resized_image - return padding_im - - def recognize_text(self, - images=[], - paths=[], - use_gpu=False, - output_dir='ocr_result', - visualization=False, - box_thresh=0.5, - text_thresh=0.5, - angle_classification_thresh=0.9): - """ - Get the chinese texts in the predicted images. + self.logger = get_logger() + self.model = hub.Module( + name="multi_languages_ocr_db_crnn", + lang="german", + det=det, + rec=rec, + use_angle_cls=use_angle_cls, + enable_mkldnn=enable_mkldnn, + use_gpu=use_gpu, + box_thresh=box_thresh, + angle_classification_thresh=angle_classification_thresh) + self.model.name = self.name + + def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False): + """ + Get the text in the predicted images. Args: images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths paths (list[str]): The paths of images. If paths not images - use_gpu (bool): Whether to use gpu. - batch_size(int): the program deals once with one output_dir (str): The directory to store output images. visualization (bool): Whether to save image or not. - box_thresh(float): the threshold of the detected text box's confidence - text_thresh(float): the threshold of the chinese text recognition confidence - angle_classification_thresh(float): the threshold of the angle classification confidence - Returns: - res (list): The result of chinese texts and save path of images. + res (list): The result of text detection box and save path of images. """ - if use_gpu: - try: - _places = os.environ["CUDA_VISIBLE_DEVICES"] - int(_places[0]) - except: - raise RuntimeError( - "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. 
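Worth a gloss before it disappears: the deleted `get_rotate_crop_image` warps each detected quadrilateral onto an axis-aligned rectangle sized by its longest opposite edges, then stands up crops that come out much taller than they are wide. A self-contained sketch of that idea (the function name and the 1.5 ratio are taken from the deleted code; the rest is illustrative):

```python
import cv2
import numpy as np

def get_rotate_crop_image(img, points):
    # Rectangle size comes from the longer of each pair of opposite quad edges.
    w = int(max(np.linalg.norm(points[0] - points[1]), np.linalg.norm(points[2] - points[3])))
    h = int(max(np.linalg.norm(points[0] - points[3]), np.linalg.norm(points[1] - points[2])))
    dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    M = cv2.getPerspectiveTransform(np.float32(points), dst)
    crop = cv2.warpPerspective(img, M, (w, h), borderMode=cv2.BORDER_REPLICATE, flags=cv2.INTER_CUBIC)
    # A crop much taller than wide is assumed to be a rotated vertical text line.
    if crop.shape[0] / crop.shape[1] >= 1.5:
        crop = np.rot90(crop)
    return crop
```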
If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id." - ) - - self.use_gpu = use_gpu - - if images != [] and isinstance(images, list) and paths == []: - predicted_data = images - elif images == [] and isinstance(paths, list) and paths != []: - predicted_data = self.read_images(paths) - else: - raise TypeError("The input data is inconsistent with expectations.") - - assert predicted_data != [], "There is not any image to be predicted. Please check the input data." - - detection_results = self.text_detector_module.detect_text( - images=predicted_data, use_gpu=self.use_gpu, box_thresh=box_thresh) - print('*'*10) - print(detection_results) - - boxes = [ - np.array(item['data']).astype(np.float32) - for item in detection_results - ] - all_results = [] - for index, img_boxes in enumerate(boxes): - original_image = predicted_data[index].copy() - result = {'save_path': ''} - if img_boxes.size == 0: - result['data'] = [] - else: - img_crop_list = [] - boxes = sorted_boxes(img_boxes) - for num_box in range(len(boxes)): - tmp_box = copy.deepcopy(boxes[num_box]) - img_crop = self.get_rotate_crop_image( - original_image, tmp_box) - img_crop_list.append(img_crop) - - if self.use_angle_classification: - img_crop_list, angle_list = self._classify_text( - img_crop_list, - angle_classification_thresh=angle_classification_thresh) - - rec_results = self._recognize_text(img_crop_list) - - # if the recognized text confidence score is lower than text_thresh, then drop it - rec_res_final = [] - for index, res in enumerate(rec_results): - text, score = res - if score >= text_thresh: - rec_res_final.append({ - 'text': - text, - 'confidence': - float(score), - 'text_box_position': - boxes[index].astype(np.int).tolist() - }) - result['data'] = rec_res_final - - if visualization and result['data']: - result['save_path'] = self.save_result_image( - original_image, boxes, rec_results, output_dir, - text_thresh) - all_results.append(result) - + all_results = self.model.recognize_text( + images=images, paths=paths, output_dir=output_dir, visualization=visualization) return all_results @serving @@ -310,282 +67,21 @@ class GermanOCRDBCRNNMobile(hub.Module): results = self.recognize_text(images_decode, **kwargs) return results - def save_result_image( - self, - original_image, - detection_boxes, - rec_results, - output_dir='ocr_result', - text_thresh=0.5, - ): - image = Image.fromarray(cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)) - txts = [item[0] for item in rec_results] - scores = [item[1] for item in rec_results] - draw_img = draw_ocr( - image, - detection_boxes, - txts, - scores, - font_file=self.font_file, - draw_txt=True, - drop_score=text_thresh) - - if not os.path.exists(output_dir): - os.makedirs(output_dir) - ext = get_image_ext(original_image) - saved_name = 'ndarray_{}{}'.format(time.time(), ext) - save_file_path = os.path.join(output_dir, saved_name) - cv2.imwrite(save_file_path, draw_img[:, :, ::-1]) - return save_file_path - - def _classify_text(self, image_list, angle_classification_thresh=0.9): - img_list = copy.deepcopy(image_list) - img_num = len(img_list) - # Calculate the aspect ratio of all text bars - width_list = [] - for img in img_list: - width_list.append(img.shape[1] / float(img.shape[0])) - # Sorting can speed up the cls process - indices = np.argsort(np.array(width_list)) - - cls_res = [['', 0.0]] * img_num - batch_num = 30 - for beg_img_no in range(0, img_num, batch_num): - end_img_no = min(img_num, beg_img_no + batch_num) - norm_img_batch = 
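After this rewrite the module is a thin facade over `multi_languages_ocr_db_crnn`, so the public API keeps the usage pattern the module READMEs document. A minimal sketch (the image path is a placeholder):

```python
import cv2
import paddlehub as hub

# enable_mkldnn only takes effect on CPU; use_angle_cls enables the 180-degree classifier.
ocr = hub.Module(name="german_ocr_db_crnn_mobile", use_angle_cls=True, enable_mkldnn=True)
results = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')], visualization=True)
print(results)
```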
[] - max_wh_ratio = 0 - for ino in range(beg_img_no, end_img_no): - h, w = img_list[indices[ino]].shape[0:2] - wh_ratio = w * 1.0 / h - max_wh_ratio = max(max_wh_ratio, wh_ratio) - for ino in range(beg_img_no, end_img_no): - norm_img = self.resize_norm_img_cls(img_list[indices[ino]]) - norm_img = norm_img[np.newaxis, :] - norm_img_batch.append(norm_img) - norm_img_batch = np.concatenate(norm_img_batch) - norm_img_batch = norm_img_batch.copy() - - self.cls_input_tensor.copy_from_cpu(norm_img_batch) - self.cls_predictor.zero_copy_run() - - prob_out = self.cls_output_tensors[0].copy_to_cpu() - label_out = self.cls_output_tensors[1].copy_to_cpu() - if len(label_out.shape) != 1: - prob_out, label_out = label_out, prob_out - label_list = ['0', '180'] - for rno in range(len(label_out)): - label_idx = label_out[rno] - score = prob_out[rno][label_idx] - label = label_list[label_idx] - cls_res[indices[beg_img_no + rno]] = [label, score] - if '180' in label and score > angle_classification_thresh: - img_list[indices[beg_img_no + rno]] = cv2.rotate( - img_list[indices[beg_img_no + rno]], 1) - return img_list, cls_res - - def _recognize_text(self, img_list): - img_num = len(img_list) - # Calculate the aspect ratio of all text bars - width_list = [] - for img in img_list: - width_list.append(img.shape[1] / float(img.shape[0])) - # Sorting can speed up the recognition process - indices = np.argsort(np.array(width_list)) - - rec_res = [['', 0.0]] * img_num - batch_num = 30 - for beg_img_no in range(0, img_num, batch_num): - end_img_no = min(img_num, beg_img_no + batch_num) - norm_img_batch = [] - max_wh_ratio = 0 - for ino in range(beg_img_no, end_img_no): - h, w = img_list[indices[ino]].shape[0:2] - wh_ratio = w * 1.0 / h - max_wh_ratio = max(max_wh_ratio, wh_ratio) - for ino in range(beg_img_no, end_img_no): - norm_img = self.resize_norm_img_rec(img_list[indices[ino]], - max_wh_ratio) - norm_img = norm_img[np.newaxis, :] - norm_img_batch.append(norm_img) - - norm_img_batch = np.concatenate(norm_img_batch, axis=0) - norm_img_batch = norm_img_batch.copy() - - self.rec_input_tensor.copy_from_cpu(norm_img_batch) - self.rec_predictor.zero_copy_run() - - rec_idx_batch = self.rec_output_tensors[0].copy_to_cpu() - rec_idx_lod = self.rec_output_tensors[0].lod()[0] - predict_batch = self.rec_output_tensors[1].copy_to_cpu() - predict_lod = self.rec_output_tensors[1].lod()[0] - for rno in range(len(rec_idx_lod) - 1): - beg = rec_idx_lod[rno] - end = rec_idx_lod[rno + 1] - rec_idx_tmp = rec_idx_batch[beg:end, 0] - preds_text = self.char_ops.decode(rec_idx_tmp) - beg = predict_lod[rno] - end = predict_lod[rno + 1] - probs = predict_batch[beg:end, :] - ind = np.argmax(probs, axis=1) - blank = probs.shape[1] - valid_ind = np.where(ind != (blank - 1))[0] - if len(valid_ind) == 0: - continue - score = np.mean(probs[valid_ind, ind[valid_ind]]) - # rec_res.append([preds_text, score]) - rec_res[indices[beg_img_no + rno]] = [preds_text, score] - - return rec_res - - def save_inference_model(self, - dirname, - model_filename=None, - params_filename=None, - combined=True): - detector_dir = os.path.join(dirname, 'text_detector') - classifier_dir = os.path.join(dirname, 'angle_classifier') - recognizer_dir = os.path.join(dirname, 'text_recognizer') - self._save_detector_model(detector_dir, model_filename, params_filename, - combined) - if self.use_angle_classification: - self._save_classifier_model(classifier_dir, model_filename, - params_filename, combined) - - self._save_recognizer_model(recognizer_dir, model_filename, - 
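Both deleted batching loops, `_classify_text` and `_recognize_text`, lean on the same trick their comments only hint at: sorting crops by aspect ratio before forming batches, so each padded batch wastes little width. Distilled into a sketch (the helper name is mine; the batch size of 30 comes from the deleted code):

```python
import numpy as np

def ratio_sorted_batches(crops, batch_size=30):
    # Group crops of similar width/height ratio so padding up to the batch's
    # max_wh_ratio stays tight; returns index batches into the original list.
    ratios = np.array([c.shape[1] / float(c.shape[0]) for c in crops])
    order = np.argsort(ratios)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
```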
params_filename, combined) - logger.info("The inference model has been saved in the path {}".format( - os.path.realpath(dirname))) - - def _save_detector_model(self, - dirname, - model_filename=None, - params_filename=None, - combined=True): - self.text_detector_module.save_inference_model( - dirname, model_filename, params_filename, combined) - - def _save_recognizer_model(self, - dirname, - model_filename=None, - params_filename=None, - combined=True): - if combined: - model_filename = "__model__" if not model_filename else model_filename - params_filename = "__params__" if not params_filename else params_filename - place = fluid.CPUPlace() - exe = fluid.Executor(place) - - model_file_path = os.path.join(self.rec_pretrained_model_path, 'model') - params_file_path = os.path.join(self.rec_pretrained_model_path, - 'params') - program, feeded_var_names, target_vars = fluid.io.load_inference_model( - dirname=self.rec_pretrained_model_path, - model_filename=model_file_path, - params_filename=params_file_path, - executor=exe) - - fluid.io.save_inference_model( - dirname=dirname, - main_program=program, - executor=exe, - feeded_var_names=feeded_var_names, - target_vars=target_vars, - model_filename=model_filename, - params_filename=params_filename) - - def _save_classifier_model(self, - dirname, - model_filename=None, - params_filename=None, - combined=True): - if combined: - model_filename = "__model__" if not model_filename else model_filename - params_filename = "__params__" if not params_filename else params_filename - place = fluid.CPUPlace() - exe = fluid.Executor(place) - - model_file_path = os.path.join(self.cls_pretrained_model_path, 'model') - params_file_path = os.path.join(self.cls_pretrained_model_path, - 'params') - program, feeded_var_names, target_vars = fluid.io.load_inference_model( - dirname=self.cls_pretrained_model_path, - model_filename=model_file_path, - params_filename=params_file_path, - executor=exe) - - fluid.io.save_inference_model( - dirname=dirname, - main_program=program, - executor=exe, - feeded_var_names=feeded_var_names, - target_vars=target_vars, - model_filename=model_filename, - params_filename=params_filename) - @runnable def run_cmd(self, argvs): """ Run as a command """ - self.parser = argparse.ArgumentParser( - description="Run the %s module." % self.name, - prog='hub run %s' % self.name, - usage='%(prog)s', - add_help=True) - - self.arg_input_group = self.parser.add_argument_group( - title="Input options", description="Input data. 
Required") - self.arg_config_group = self.parser.add_argument_group( - title="Config options", - description= - "Run configuration for controlling module behavior, not required.") - - self.add_module_config_arg() - self.add_module_input_arg() - - args = self.parser.parse_args(argvs) - results = self.recognize_text( - paths=[args.input_path], - use_gpu=args.use_gpu, - output_dir=args.output_dir, - visualization=args.visualization) + results = self.model.run_cmd(argvs) return results - def add_module_config_arg(self): - """ - Add the command config options - """ - self.arg_config_group.add_argument( - '--use_gpu', - type=ast.literal_eval, - default=False, - help="whether use GPU or not") - self.arg_config_group.add_argument( - '--output_dir', - type=str, - default='ocr_result', - help="The directory to save output images.") - self.arg_config_group.add_argument( - '--visualization', - type=ast.literal_eval, - default=False, - help="whether to save output as images.") - - def add_module_input_arg(self): - """ - Add the command input options - """ - self.arg_input_group.add_argument( - '--input_path', type=str, default=None, help="diretory to image") - + def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10): + ''' + Export the model to ONNX format. -if __name__ == '__main__': - ocr = GermanOCRDBCRNNMobile(enable_mkldnn=False, use_angle_classification=True) - image_path = [ - '/mnt/zhangxuefei/PaddleOCR/doc/imgs/ger_1.jpg', - '/mnt/zhangxuefei/PaddleOCR/doc/imgs/12.jpg', - '/mnt/zhangxuefei/PaddleOCR/doc/imgs/test_image.jpg' - ] - res = ocr.recognize_text(paths=image_path, visualization=True) - ocr.save_inference_model('save') - print(res) + Args: + dirname(str): The directory to save the onnx model. + input_shape_dict: dictionary ``{ input_name: input_value }, eg. 
{'x': [-1, 3, -1, -1]}`` + opset_version(int): operator set + ''' + self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version) diff --git a/modules/image/text_recognition/german_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/german_ocr_db_crnn_mobile/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3 --- /dev/null +++ b/modules/image/text_recognition/german_ocr_db_crnn_mobile/requirements.txt @@ -0,0 +1,4 @@ +paddleocr>=2.3.0.2 +paddle2onnx>=0.9.0 +shapely +pyclipper diff --git a/modules/image/text_recognition/german_ocr_db_crnn_mobile/utils.py b/modules/image/text_recognition/german_ocr_db_crnn_mobile/utils.py deleted file mode 100644 index 8c41af300cc91de369a473cb7327b794b6cf5715..0000000000000000000000000000000000000000 --- a/modules/image/text_recognition/german_ocr_db_crnn_mobile/utils.py +++ /dev/null @@ -1,190 +0,0 @@ -# -*- coding:utf-8 -*- -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import math - -from PIL import Image, ImageDraw, ImageFont -import base64 -import cv2 -import numpy as np - - -def draw_ocr(image, - boxes, - txts, - scores, - font_file, - draw_txt=True, - drop_score=0.5): - """ - Visualize the results of OCR detection and recognition - args: - image(Image|array): RGB image - boxes(list): boxes with shape(N, 4, 2) - txts(list): the texts - scores(list): txxs corresponding scores - draw_txt(bool): whether draw text or not - drop_score(float): only scores greater than drop_threshold will be visualized - return(array): - the visualized img - """ - if scores is None: - scores = [1] * len(boxes) - for (box, score) in zip(boxes, scores): - if score < drop_score or math.isnan(score): - continue - box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64) - image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2) - - if draw_txt: - img = np.array(resize_img(image, input_size=600)) - txt_img = text_visual( - txts, - scores, - font_file, - img_h=img.shape[0], - img_w=600, - threshold=drop_score) - img = np.concatenate([np.array(img), np.array(txt_img)], axis=1) - return img - return image - - -def text_visual(texts, scores, font_file, img_h=400, img_w=600, threshold=0.): - """ - create new blank img and draw txt on it - args: - texts(list): the text will be draw - scores(list|None): corresponding score of each txt - img_h(int): the height of blank img - img_w(int): the width of blank img - return(array): - """ - if scores is not None: - assert len(texts) == len( - scores), "The number of txts and corresponding scores must match" - - def create_blank_img(): - blank_img = np.ones(shape=[img_h, img_w], dtype=np.int8) * 255 - blank_img[:, img_w - 1:] = 0 - blank_img = Image.fromarray(blank_img).convert("RGB") - draw_txt = ImageDraw.Draw(blank_img) - return blank_img, draw_txt - - blank_img, draw_txt = create_blank_img() - - font_size = 20 - txt_color = (0, 0, 0) - font = ImageFont.truetype(font_file, font_size, encoding="utf-8") - - gap = font_size + 5 - txt_img_list = [] - count, index = 1, 0 - for idx, txt in enumerate(texts): - index += 1 - if scores[idx] < threshold or math.isnan(scores[idx]): - index -= 1 - continue - first_line = True - while str_count(txt) >= img_w // font_size - 4: - tmp = txt - txt = tmp[:img_w // font_size - 4] - if first_line: - new_txt = str(index) + ': ' + txt - first_line = False - else: - new_txt = ' ' + txt - 
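The added `export_onnx_model` (backed by the `paddle2onnx` dependency pinned in the new requirements.txt) can be exercised as below; the output directory is arbitrary and the dynamic NCHW shape reuses the docstring's own example:

```python
import paddlehub as hub

ocr = hub.Module(name="german_ocr_db_crnn_mobile")
# Export with dynamic batch/height/width; 'onnx_model' is an illustrative output path.
ocr.export_onnx_model(dirname='onnx_model', input_shape_dict={'x': [-1, 3, -1, -1]}, opset_version=10)
```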
draw_txt.text((0, gap * count), new_txt, txt_color, font=font) - txt = tmp[img_w // font_size - 4:] - if count >= img_h // gap - 1: - txt_img_list.append(np.array(blank_img)) - blank_img, draw_txt = create_blank_img() - count = 0 - count += 1 - if first_line: - new_txt = str(index) + ': ' + txt + ' ' + '%.3f' % (scores[idx]) - else: - new_txt = " " + txt + " " + '%.3f' % (scores[idx]) - draw_txt.text((0, gap * count), new_txt, txt_color, font=font) - # whether add new blank img or not - if count >= img_h // gap - 1 and idx + 1 < len(texts): - txt_img_list.append(np.array(blank_img)) - blank_img, draw_txt = create_blank_img() - count = 0 - count += 1 - txt_img_list.append(np.array(blank_img)) - if len(txt_img_list) == 1: - blank_img = np.array(txt_img_list[0]) - else: - blank_img = np.concatenate(txt_img_list, axis=1) - return np.array(blank_img) - - -def str_count(s): - """ - Count the number of Chinese characters, - a single English character and a single number - equal to half the length of Chinese characters. - args: - s(string): the input of string - return(int): - the number of Chinese characters - """ - import string - count_zh = count_pu = 0 - s_len = len(s) - en_dg_count = 0 - for c in s: - if c in string.ascii_letters or c.isdigit() or c.isspace(): - en_dg_count += 1 - elif c.isalpha(): - count_zh += 1 - else: - count_pu += 1 - return s_len - math.ceil(en_dg_count / 2) - - -def resize_img(img, input_size=600): - img = np.array(img) - im_shape = img.shape - im_size_min = np.min(im_shape[0:2]) - im_size_max = np.max(im_shape[0:2]) - im_scale = float(input_size) / float(im_size_max) - im = cv2.resize(img, None, None, fx=im_scale, fy=im_scale) - return im - - -def get_image_ext(image): - if image.shape[2] == 4: - return ".png" - return ".jpg" - - -def sorted_boxes(dt_boxes): - """ - Sort text boxes in order from top to bottom, left to right - args: - dt_boxes(array):detected text boxes with shape [4, 2] - return: - sorted boxes(array) with shape [4, 2] - """ - num_boxes = dt_boxes.shape[0] - sorted_boxes = sorted(dt_boxes, key=lambda x: (x[0][1], x[0][0])) - _boxes = list(sorted_boxes) - - for i in range(num_boxes - 1): - if abs(_boxes[i + 1][0][1] - _boxes[i][0][1]) < 10 and \ - (_boxes[i + 1][0][0] < _boxes[i][0][0]): - tmp = _boxes[i] - _boxes[i] = _boxes[i + 1] - _boxes[i + 1] = tmp - return _boxes - - -def base64_to_cv2(b64str): - data = base64.b64decode(b64str.encode('utf8')) - data = np.fromstring(data, np.uint8) - data = cv2.imdecode(data, cv2.IMREAD_COLOR) - return data diff --git a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan.ttc b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan.ttc deleted file mode 100644 index ad68243b968fc87b207928594c585039859b75a9..0000000000000000000000000000000000000000 Binary files a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan.ttc and /dev/null differ diff --git a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan_dict.txt b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan_dict.txt deleted file mode 100644 index 339d4b89e5159a346636641a0814874faa59754a..0000000000000000000000000000000000000000 --- a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan_dict.txt +++ /dev/null @@ -1,4399 +0,0 @@ -! -" -# -$ -% -& -' -( -) -* -+ -, -- -. -/ -0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -: -; -< -= -> -? 
-A -B -C -D -E -F -G -H -I -J -K -L -M -N -O -P -Q -R -S -T -U -V -W -X -Y -Z -[ -] -_ -` -a -b -c -d -e -f -g -h -i -j -k -l -m -n -o -p -q -r -s -t -u -v -w -x -y -z -© -° -² -´ -½ -Á -Ä -Å -Ç -È -É -Í -Ó -Ö -× -Ü -ß -à -á -â -ã -ä -å -æ -ç -è -é -ê -ë -í -ð -ñ -ò -ó -ô -õ -ö -ø -ú -û -ü -ý -ā -ă -ą -ć -Č -č -đ -ē -ė -ę -ğ -ī -ı -Ł -ł -ń -ň -ō -ř -Ş -ş -Š -š -ţ -ū -ż -Ž -ž -Ș -ș -ț -Δ -α -λ -μ -φ -Г -О -а -в -л -о -р -с -т -я -ồ -​ -— -― -’ -“ -” -… -℃ -→ -∇ -− -■ -☆ -  -、 -。 -々 -〆 -〈 -〉 -「 -」 -『 -』 -〔 -〕 -〜 -ぁ -あ -ぃ -い -う -ぇ -え -ぉ -お -か -が -き -ぎ -く -ぐ -け -げ -こ -ご -さ -ざ -し -じ -す -ず -せ -ぜ -そ -ぞ -た -だ -ち -ぢ -っ -つ -づ -て -で -と -ど -な -に -ぬ -ね -の -は -ば -ぱ -ひ -び -ぴ -ふ -ぶ -ぷ -へ -べ -ぺ -ほ -ぼ -ぽ -ま -み -む -め -も -ゃ -や -ゅ -ゆ -ょ -よ -ら -り -る -れ -ろ -わ -ゑ -を -ん -ゝ -ゞ -ァ -ア -ィ -イ -ゥ -ウ -ェ -エ -ォ -オ -カ -ガ -キ -ギ -ク -グ -ケ -ゲ -コ -ゴ -サ -ザ -シ -ジ -ス -ズ -セ -ゼ -ソ -ゾ -タ -ダ -チ -ヂ -ッ -ツ -ヅ -テ -デ -ト -ド -ナ -ニ -ヌ -ネ -ノ -ハ -バ -パ -ヒ -ビ -ピ -フ -ブ -プ -ヘ -ベ -ペ -ホ -ボ -ポ -マ -ミ -ム -メ -モ -ャ -ヤ -ュ -ユ -ョ -ヨ -ラ -リ -ル -レ -ロ -ワ -ヰ -ン -ヴ -ヵ -ヶ -・ -ー -㈱ -一 -丁 -七 -万 -丈 -三 -上 -下 -不 -与 -丑 -且 -世 -丘 -丙 -丞 -両 -並 -中 -串 -丸 -丹 -主 -丼 -丿 -乃 -久 -之 -乎 -乏 -乗 -乘 -乙 -九 -乞 -也 -乱 -乳 -乾 -亀 -了 -予 -争 -事 -二 -于 -互 -五 -井 -亘 -亙 -些 -亜 -亟 -亡 -交 -亥 -亦 -亨 -享 -京 -亭 -亮 -人 -什 -仁 -仇 -今 -介 -仍 -仏 -仔 -仕 -他 -仗 -付 -仙 -代 -令 -以 -仮 -仰 -仲 -件 -任 -企 -伊 -伍 -伎 -伏 -伐 -休 -会 -伝 -伯 -估 -伴 -伶 -伸 -伺 -似 -伽 -佃 -但 -位 -低 -住 -佐 -佑 -体 -何 -余 -佚 -佛 -作 -佩 -佳 -併 -佶 -使 -侈 -例 -侍 -侏 -侑 -侘 -供 -依 -侠 -価 -侮 -侯 -侵 -侶 -便 -係 -促 -俄 -俊 -俔 -俗 -俘 -保 -信 -俣 -俤 -修 -俯 -俳 -俵 -俸 -俺 -倉 -個 -倍 -倒 -候 -借 -倣 -値 -倫 -倭 -倶 -倹 -偃 -假 -偈 -偉 -偏 -偐 -偕 -停 -健 -側 -偵 -偶 -偽 -傀 -傅 -傍 -傑 -傘 -備 -催 -傭 -傲 -傳 -債 -傷 -傾 -僊 -働 -像 -僑 -僕 -僚 -僧 -僭 -僮 -儀 -億 -儇 -儒 -儛 -償 -儡 -優 -儲 -儺 -儼 -兀 -允 -元 -兄 -充 -兆 -先 -光 -克 -兌 -免 -兎 -児 -党 -兜 -入 -全 -八 -公 -六 -共 -兵 -其 -具 -典 -兼 -内 -円 -冊 -再 -冑 -冒 -冗 -写 -冠 -冤 -冥 -冨 -冬 -冲 -决 -冶 -冷 -准 -凉 -凋 -凌 -凍 -凛 -凝 -凞 -几 -凡 -処 -凪 -凰 -凱 -凶 -凸 -凹 -出 -函 -刀 -刃 -分 -切 -刈 -刊 -刎 -刑 -列 -初 -判 -別 -利 -刪 -到 -制 -刷 -券 -刹 -刺 -刻 -剃 -則 -削 -剋 -前 -剖 -剛 -剣 -剤 -剥 -剪 -副 -剰 -割 -創 -剽 -劇 -劉 -劔 -力 -功 -加 -劣 -助 -努 -劫 -劭 -励 -労 -効 -劾 -勃 -勅 -勇 -勉 -勒 -動 -勘 -務 -勝 -募 -勢 -勤 -勧 -勲 -勺 -勾 -勿 -匁 -匂 -包 -匏 -化 -北 -匙 -匝 -匠 -匡 -匣 -匯 -匲 -匹 -区 -医 -匿 -十 -千 -升 -午 -卉 -半 -卍 -卑 -卒 -卓 -協 -南 -単 -博 -卜 -占 -卦 -卯 -印 -危 -即 -却 -卵 -卸 -卿 -厄 -厚 -原 -厠 -厨 -厩 -厭 -厳 -去 -参 -又 -叉 -及 -友 -双 -反 -収 -叔 -取 -受 -叙 -叛 -叟 -叡 -叢 -口 -古 -句 -叩 -只 -叫 -召 -可 -台 -叱 -史 -右 -叶 -号 -司 -吃 -各 -合 -吉 -吊 -同 -名 -后 -吏 -吐 -向 -君 -吝 -吟 -吠 -否 -含 -吸 -吹 -吻 -吽 -吾 -呂 -呆 -呈 -呉 -告 -呑 -周 -呪 -呰 -味 -呼 -命 -咀 -咄 -咋 -和 -咒 -咫 -咲 -咳 -咸 -哀 -品 -哇 -哉 -員 -哨 -哩 -哭 -哲 -哺 -唄 -唆 -唇 -唐 -唖 -唯 -唱 -唳 -唸 -唾 -啄 -商 -問 -啓 -啼 -善 -喋 -喚 -喜 -喝 -喧 -喩 -喪 -喫 -喬 -單 -喰 -営 -嗅 -嗇 -嗔 -嗚 -嗜 -嗣 -嘆 -嘉 -嘗 -嘘 -嘩 -嘯 -嘱 -嘲 -嘴 -噂 -噌 -噛 -器 -噴 -噺 -嚆 -嚢 -囀 -囃 -囉 -囚 -四 -回 -因 -団 -困 -囲 -図 -固 -国 -圀 -圃 -國 -圏 -園 -圓 -團 -圜 -土 -圧 -在 -圭 -地 -址 -坂 -均 -坊 -坐 -坑 -坡 -坤 -坦 -坪 -垂 -型 -垢 -垣 -埃 -埋 -城 -埒 -埔 -域 -埠 -埴 -埵 -執 -培 -基 -埼 -堀 -堂 -堅 -堆 -堕 -堤 -堪 -堯 -堰 -報 -場 -堵 -堺 -塀 -塁 -塊 -塑 -塔 -塗 -塘 -塙 -塚 -塞 -塩 -填 -塵 -塾 -境 -墉 -墓 -増 -墜 -墟 -墨 -墳 -墺 -墻 -墾 -壁 -壇 -壊 -壌 -壕 -士 -壬 -壮 -声 -壱 -売 -壷 -壹 -壺 -壽 -変 -夏 -夕 -外 -夙 -多 -夜 -夢 -夥 -大 -天 -太 -夫 -夬 -夭 -央 -失 -夷 -夾 -奄 -奇 -奈 -奉 -奎 -奏 -契 -奔 -奕 -套 -奘 -奠 -奢 -奥 -奨 -奪 -奮 -女 -奴 -奸 -好 -如 -妃 -妄 -妊 -妍 -妓 -妖 -妙 -妥 -妨 -妬 -妲 -妹 -妻 -妾 -姉 -始 -姐 -姓 -委 -姚 -姜 -姞 -姥 -姦 -姨 -姪 -姫 -姶 -姻 -姿 -威 -娑 -娘 -娟 -娠 -娩 -娯 -娼 -婆 -婉 -婚 -婢 -婦 -婬 -婿 -媄 -媒 -媓 -媚 -媛 -媞 -媽 -嫁 -嫄 -嫉 -嫌 -嫐 -嫗 -嫡 -嬉 -嬌 -嬢 -嬪 -嬬 -嬾 -孁 -子 -孔 -字 -存 -孚 -孝 -孟 -季 -孤 -学 -孫 -孵 -學 -宅 -宇 -守 -安 -宋 -完 -宍 -宏 -宕 -宗 -官 -宙 -定 -宛 -宜 -宝 -実 -客 -宣 -室 -宥 -宮 -宰 -害 -宴 -宵 -家 -宸 -容 -宿 -寂 -寄 -寅 -密 -寇 -富 -寒 -寓 -寔 -寛 -寝 -察 -寡 -實 -寧 -審 -寮 -寵 -寶 -寸 -寺 -対 -寿 -封 -専 -射 -将 -尉 -尊 -尋 -對 -導 -小 -少 -尖 -尚 -尤 -尪 -尭 -就 -尹 -尺 -尻 -尼 -尽 -尾 -尿 -局 -居 -屈 -届 -屋 -屍 -屎 -屏 -屑 -屓 -展 -属 -屠 -層 -履 
-屯 -山 -岐 -岑 -岡 -岩 -岫 -岬 -岳 -岷 -岸 -峠 -峡 -峨 -峯 -峰 -島 -峻 -崇 -崋 -崎 -崑 -崖 -崗 -崛 -崩 -嵌 -嵐 -嵩 -嵯 -嶂 -嶋 -嶠 -嶺 -嶼 -嶽 -巀 -巌 -巒 -巖 -川 -州 -巡 -巣 -工 -左 -巧 -巨 -巫 -差 -己 -巳 -巴 -巷 -巻 -巽 -巾 -市 -布 -帆 -希 -帖 -帚 -帛 -帝 -帥 -師 -席 -帯 -帰 -帳 -帷 -常 -帽 -幄 -幅 -幇 -幌 -幔 -幕 -幟 -幡 -幢 -幣 -干 -平 -年 -并 -幸 -幹 -幻 -幼 -幽 -幾 -庁 -広 -庄 -庇 -床 -序 -底 -庖 -店 -庚 -府 -度 -座 -庫 -庭 -庵 -庶 -康 -庸 -廂 -廃 -廉 -廊 -廓 -廟 -廠 -廣 -廬 -延 -廷 -建 -廻 -廼 -廿 -弁 -弄 -弉 -弊 -弌 -式 -弐 -弓 -弔 -引 -弖 -弗 -弘 -弛 -弟 -弥 -弦 -弧 -弱 -張 -強 -弼 -弾 -彈 -彊 -彌 -彎 -当 -彗 -彙 -彝 -形 -彦 -彩 -彫 -彬 -彭 -彰 -影 -彷 -役 -彼 -往 -征 -徂 -径 -待 -律 -後 -徐 -徑 -徒 -従 -得 -徠 -御 -徧 -徨 -復 -循 -徭 -微 -徳 -徴 -德 -徹 -徽 -心 -必 -忉 -忌 -忍 -志 -忘 -忙 -応 -忠 -快 -忯 -念 -忻 -忽 -忿 -怒 -怖 -思 -怠 -怡 -急 -性 -怨 -怪 -怯 -恂 -恋 -恐 -恒 -恕 -恣 -恤 -恥 -恨 -恩 -恬 -恭 -息 -恵 -悉 -悌 -悍 -悔 -悟 -悠 -患 -悦 -悩 -悪 -悲 -悼 -情 -惇 -惑 -惚 -惜 -惟 -惠 -惣 -惧 -惨 -惰 -想 -惹 -惺 -愈 -愉 -愍 -意 -愔 -愚 -愛 -感 -愷 -愿 -慈 -態 -慌 -慎 -慕 -慢 -慣 -慧 -慨 -慮 -慰 -慶 -憂 -憎 -憐 -憑 -憙 -憤 -憧 -憩 -憬 -憲 -憶 -憾 -懇 -應 -懌 -懐 -懲 -懸 -懺 -懽 -懿 -戈 -戊 -戌 -戎 -成 -我 -戒 -戔 -或 -戚 -戟 -戦 -截 -戮 -戯 -戴 -戸 -戻 -房 -所 -扁 -扇 -扈 -扉 -手 -才 -打 -払 -托 -扮 -扱 -扶 -批 -承 -技 -抄 -把 -抑 -抓 -投 -抗 -折 -抜 -択 -披 -抱 -抵 -抹 -押 -抽 -担 -拇 -拈 -拉 -拍 -拏 -拐 -拒 -拓 -拘 -拙 -招 -拝 -拠 -拡 -括 -拭 -拳 -拵 -拶 -拾 -拿 -持 -挂 -指 -按 -挑 -挙 -挟 -挨 -振 -挺 -挽 -挿 -捉 -捕 -捗 -捜 -捧 -捨 -据 -捺 -捻 -掃 -掄 -授 -掌 -排 -掖 -掘 -掛 -掟 -採 -探 -掣 -接 -控 -推 -掩 -措 -掬 -掲 -掴 -掻 -掾 -揃 -揄 -揆 -揉 -描 -提 -揖 -揚 -換 -握 -揮 -援 -揶 -揺 -損 -搦 -搬 -搭 -携 -搾 -摂 -摘 -摩 -摸 -摺 -撃 -撒 -撞 -撤 -撥 -撫 -播 -撮 -撰 -撲 -撹 -擁 -操 -擔 -擦 -擬 -擾 -攘 -攝 -攣 -支 -收 -改 -攻 -放 -政 -故 -敏 -救 -敗 -教 -敢 -散 -敦 -敬 -数 -整 -敵 -敷 -斂 -文 -斉 -斎 -斐 -斑 -斗 -料 -斜 -斟 -斤 -斥 -斧 -斬 -断 -斯 -新 -方 -於 -施 -旁 -旅 -旋 -旌 -族 -旗 -旛 -无 -旡 -既 -日 -旦 -旧 -旨 -早 -旬 -旭 -旺 -旻 -昂 -昆 -昇 -昉 -昌 -明 -昏 -易 -昔 -星 -映 -春 -昧 -昨 -昪 -昭 -是 -昵 -昼 -晁 -時 -晃 -晋 -晏 -晒 -晟 -晦 -晧 -晩 -普 -景 -晴 -晶 -智 -暁 -暇 -暈 -暉 -暑 -暖 -暗 -暘 -暢 -暦 -暫 -暮 -暲 -暴 -暹 -暾 -曄 -曇 -曉 -曖 -曙 -曜 -曝 -曠 -曰 -曲 -曳 -更 -書 -曹 -曼 -曽 -曾 -替 -最 -會 -月 -有 -朋 -服 -朏 -朔 -朕 -朗 -望 -朝 -期 -朧 -木 -未 -末 -本 -札 -朱 -朴 -机 -朽 -杁 -杉 -李 -杏 -材 -村 -杓 -杖 -杜 -杞 -束 -条 -杢 -杣 -来 -杭 -杮 -杯 -東 -杲 -杵 -杷 -杼 -松 -板 -枅 -枇 -析 -枓 -枕 -林 -枚 -果 -枝 -枠 -枡 -枢 -枯 -枳 -架 -柄 -柊 -柏 -某 -柑 -染 -柔 -柘 -柚 -柯 -柱 -柳 -柴 -柵 -査 -柾 -柿 -栂 -栃 -栄 -栖 -栗 -校 -株 -栲 -栴 -核 -根 -栻 -格 -栽 -桁 -桂 -桃 -框 -案 -桐 -桑 -桓 -桔 -桜 -桝 -桟 -桧 -桴 -桶 -桾 -梁 -梅 -梆 -梓 -梔 -梗 -梛 -條 -梟 -梢 -梧 -梨 -械 -梱 -梲 -梵 -梶 -棄 -棋 -棒 -棗 -棘 -棚 -棟 -棠 -森 -棲 -棹 -棺 -椀 -椅 -椋 -植 -椎 -椏 -椒 -椙 -検 -椥 -椹 -椿 -楊 -楓 -楕 -楚 -楞 -楠 -楡 -楢 -楨 -楪 -楫 -業 -楮 -楯 -楳 -極 -楷 -楼 -楽 -概 -榊 -榎 -榕 -榛 -榜 -榮 -榱 -榴 -槃 -槇 -槊 -構 -槌 -槍 -槐 -様 -槙 -槻 -槽 -槿 -樂 -樋 -樓 -樗 -標 -樟 -模 -権 -横 -樫 -樵 -樹 -樺 -樽 -橇 -橋 -橘 -機 -橿 -檀 -檄 -檎 -檐 -檗 -檜 -檣 -檥 -檬 -檮 -檸 -檻 -櫃 -櫓 -櫛 -櫟 -櫨 -櫻 -欄 -欅 -欠 -次 -欣 -欧 -欲 -欺 -欽 -款 -歌 -歎 -歓 -止 -正 -此 -武 -歩 -歪 -歯 -歳 -歴 -死 -殆 -殉 -殊 -残 -殖 -殯 -殴 -段 -殷 -殺 -殻 -殿 -毀 -毅 -母 -毎 -毒 -比 -毘 -毛 -毫 -毬 -氈 -氏 -民 -気 -水 -氷 -永 -氾 -汀 -汁 -求 -汎 -汐 -汗 -汚 -汝 -江 -池 -汪 -汰 -汲 -決 -汽 -沂 -沃 -沅 -沆 -沈 -沌 -沐 -沓 -沖 -沙 -没 -沢 -沱 -河 -沸 -油 -治 -沼 -沽 -沿 -況 -泉 -泊 -泌 -法 -泗 -泡 -波 -泣 -泥 -注 -泯 -泰 -泳 -洋 -洒 -洗 -洛 -洞 -津 -洩 -洪 -洲 -洸 -洹 -活 -洽 -派 -流 -浄 -浅 -浙 -浚 -浜 -浣 -浦 -浩 -浪 -浮 -浴 -海 -浸 -涅 -消 -涌 -涙 -涛 -涯 -液 -涵 -涼 -淀 -淄 -淆 -淇 -淋 -淑 -淘 -淡 -淤 -淨 -淫 -深 -淳 -淵 -混 -淹 -添 -清 -済 -渉 -渋 -渓 -渕 -渚 -減 -渟 -渠 -渡 -渤 -渥 -渦 -温 -渫 -測 -港 -游 -渾 -湊 -湖 -湘 -湛 -湧 -湫 -湯 -湾 -湿 -満 -源 -準 -溜 -溝 -溢 -溥 -溪 -溶 -溺 -滄 -滅 -滋 -滌 -滑 -滕 -滝 -滞 -滴 -滸 -滹 -滿 -漁 -漂 -漆 -漉 -漏 -漑 -演 -漕 -漠 -漢 -漣 -漫 -漬 -漱 -漸 -漿 -潅 -潔 -潙 -潜 -潟 -潤 -潭 -潮 -潰 -潴 -澁 -澂 -澄 -澎 -澗 -澤 -澪 -澱 -澳 -激 -濁 -濃 -濟 -濠 -濡 -濤 -濫 -濯 -濱 -濾 -瀉 -瀋 -瀑 -瀕 -瀞 -瀟 -瀧 -瀬 -瀾 -灌 -灑 -灘 -火 -灯 -灰 -灸 -災 -炉 -炊 -炎 -炒 -炭 -炮 -炷 -点 -為 -烈 -烏 -烙 -烝 -烹 -焔 -焙 -焚 -無 -焦 -然 -焼 -煇 -煉 -煌 -煎 -煕 -煙 -煤 -煥 -照 -煩 -煬 -煮 -煽 -熈 -熊 -熙 -熟 -熨 -熱 -熹 -熾 -燃 -燈 -燎 -燔 -燕 -燗 -燥 -燭 -燻 -爆 -爐 -爪 -爬 -爲 -爵 -父 -爺 -爼 -爽 -爾 -片 -版 -牌 -牒 -牘 -牙 -牛 -牝 -牟 -牡 -牢 -牧 -物 -牲 -特 -牽 -犂 -犠 -犬 -犯 -状 -狂 -狄 -狐 -狗 -狙 -狛 -狡 -狩 -独 -狭 -狷 -狸 -狼 -猊 
-猛 -猟 -猥 -猨 -猩 -猪 -猫 -献 -猴 -猶 -猷 -猾 -猿 -獄 -獅 -獏 -獣 -獲 -玄 -玅 -率 -玉 -王 -玖 -玩 -玲 -珀 -珂 -珈 -珉 -珊 -珍 -珎 -珞 -珠 -珣 -珥 -珪 -班 -現 -球 -理 -琉 -琢 -琥 -琦 -琮 -琲 -琳 -琴 -琵 -琶 -瑁 -瑋 -瑙 -瑚 -瑛 -瑜 -瑞 -瑠 -瑤 -瑩 -瑪 -瑳 -瑾 -璃 -璋 -璜 -璞 -璧 -璨 -環 -璵 -璽 -璿 -瓊 -瓔 -瓜 -瓢 -瓦 -瓶 -甍 -甑 -甕 -甘 -甚 -甞 -生 -産 -甥 -用 -甫 -田 -由 -甲 -申 -男 -町 -画 -界 -畏 -畑 -畔 -留 -畜 -畝 -畠 -畢 -略 -番 -異 -畳 -當 -畷 -畸 -畺 -畿 -疆 -疇 -疋 -疎 -疏 -疑 -疫 -疱 -疲 -疹 -疼 -疾 -病 -症 -痒 -痔 -痕 -痘 -痙 -痛 -痢 -痩 -痴 -痺 -瘍 -瘡 -瘧 -療 -癇 -癌 -癒 -癖 -癡 -癪 -発 -登 -白 -百 -的 -皆 -皇 -皋 -皐 -皓 -皮 -皺 -皿 -盂 -盃 -盆 -盈 -益 -盒 -盗 -盛 -盞 -盟 -盡 -監 -盤 -盥 -盧 -目 -盲 -直 -相 -盾 -省 -眉 -看 -県 -眞 -真 -眠 -眷 -眺 -眼 -着 -睡 -督 -睦 -睨 -睿 -瞋 -瞑 -瞞 -瞬 -瞭 -瞰 -瞳 -瞻 -瞼 -瞿 -矍 -矛 -矜 -矢 -知 -矧 -矩 -短 -矮 -矯 -石 -砂 -砌 -研 -砕 -砥 -砦 -砧 -砲 -破 -砺 -硝 -硫 -硬 -硯 -碁 -碇 -碌 -碑 -碓 -碕 -碗 -碣 -碧 -碩 -確 -碾 -磁 -磐 -磔 -磧 -磨 -磬 -磯 -礁 -礎 -礒 -礙 -礫 -礬 -示 -礼 -社 -祀 -祁 -祇 -祈 -祉 -祐 -祓 -祕 -祖 -祗 -祚 -祝 -神 -祟 -祠 -祢 -祥 -票 -祭 -祷 -祺 -禁 -禄 -禅 -禊 -禍 -禎 -福 -禔 -禖 -禛 -禦 -禧 -禮 -禰 -禹 -禽 -禿 -秀 -私 -秋 -科 -秒 -秘 -租 -秤 -秦 -秩 -称 -移 -稀 -程 -税 -稔 -稗 -稙 -稚 -稜 -稠 -種 -稱 -稲 -稷 -稻 -稼 -稽 -稿 -穀 -穂 -穆 -積 -穎 -穏 -穗 -穜 -穢 -穣 -穫 -穴 -究 -空 -突 -窃 -窄 -窒 -窓 -窟 -窠 -窩 -窪 -窮 -窯 -竃 -竄 -竈 -立 -站 -竜 -竝 -竟 -章 -童 -竪 -竭 -端 -竴 -競 -竹 -竺 -竽 -竿 -笄 -笈 -笏 -笑 -笙 -笛 -笞 -笠 -笥 -符 -第 -笹 -筅 -筆 -筇 -筈 -等 -筋 -筌 -筍 -筏 -筐 -筑 -筒 -答 -策 -筝 -筥 -筧 -筬 -筮 -筯 -筰 -筵 -箆 -箇 -箋 -箏 -箒 -箔 -箕 -算 -箙 -箜 -管 -箪 -箭 -箱 -箸 -節 -篁 -範 -篆 -篇 -築 -篋 -篌 -篝 -篠 -篤 -篥 -篦 -篩 -篭 -篳 -篷 -簀 -簒 -簡 -簧 -簪 -簫 -簺 -簾 -簿 -籀 -籃 -籌 -籍 -籐 -籟 -籠 -籤 -籬 -米 -籾 -粂 -粉 -粋 -粒 -粕 -粗 -粘 -粛 -粟 -粥 -粧 -粮 -粳 -精 -糊 -糖 -糜 -糞 -糟 -糠 -糧 -糯 -糸 -糺 -系 -糾 -紀 -約 -紅 -紋 -納 -紐 -純 -紗 -紘 -紙 -級 -紛 -素 -紡 -索 -紫 -紬 -累 -細 -紳 -紵 -紹 -紺 -絁 -終 -絃 -組 -絅 -経 -結 -絖 -絞 -絡 -絣 -給 -統 -絲 -絵 -絶 -絹 -絽 -綏 -經 -継 -続 -綜 -綟 -綬 -維 -綱 -網 -綴 -綸 -綺 -綽 -綾 -綿 -緊 -緋 -総 -緑 -緒 -線 -締 -緥 -編 -緩 -緬 -緯 -練 -緻 -縁 -縄 -縅 -縒 -縛 -縞 -縢 -縣 -縦 -縫 -縮 -縹 -總 -績 -繁 -繊 -繋 -繍 -織 -繕 -繝 -繦 -繧 -繰 -繹 -繼 -纂 -纈 -纏 -纐 -纒 -纛 -缶 -罔 -罠 -罧 -罪 -置 -罰 -署 -罵 -罷 -罹 -羂 -羅 -羆 -羇 -羈 -羊 -羌 -美 -群 -羨 -義 -羯 -羲 -羹 -羽 -翁 -翅 -翌 -習 -翔 -翛 -翠 -翡 -翫 -翰 -翺 -翻 -翼 -耀 -老 -考 -者 -耆 -而 -耐 -耕 -耗 -耨 -耳 -耶 -耽 -聊 -聖 -聘 -聚 -聞 -聟 -聡 -聨 -聯 -聰 -聲 -聴 -職 -聾 -肄 -肆 -肇 -肉 -肋 -肌 -肖 -肘 -肛 -肝 -股 -肢 -肥 -肩 -肪 -肯 -肱 -育 -肴 -肺 -胃 -胆 -背 -胎 -胖 -胚 -胝 -胞 -胡 -胤 -胱 -胴 -胸 -能 -脂 -脅 -脆 -脇 -脈 -脊 -脚 -脛 -脩 -脱 -脳 -腋 -腎 -腐 -腑 -腔 -腕 -腫 -腰 -腱 -腸 -腹 -腺 -腿 -膀 -膏 -膚 -膜 -膝 -膠 -膣 -膨 -膩 -膳 -膵 -膾 -膿 -臂 -臆 -臈 -臍 -臓 -臘 -臚 -臣 -臥 -臨 -自 -臭 -至 -致 -臺 -臼 -舂 -舅 -與 -興 -舌 -舍 -舎 -舒 -舖 -舗 -舘 -舜 -舞 -舟 -舩 -航 -般 -舳 -舶 -船 -艇 -艘 -艦 -艮 -良 -色 -艶 -芋 -芒 -芙 -芝 -芥 -芦 -芬 -芭 -芯 -花 -芳 -芸 -芹 -芻 -芽 -芿 -苅 -苑 -苔 -苗 -苛 -苞 -苡 -若 -苦 -苧 -苫 -英 -苴 -苻 -茂 -范 -茄 -茅 -茎 -茗 -茘 -茜 -茨 -茲 -茵 -茶 -茸 -茹 -草 -荊 -荏 -荒 -荘 -荷 -荻 -荼 -莞 -莪 -莫 -莬 -莱 -莵 -莽 -菅 -菊 -菌 -菓 -菖 -菘 -菜 -菟 -菩 -菫 -華 -菱 -菴 -萄 -萊 -萌 -萍 -萎 -萠 -萩 -萬 -萱 -落 -葉 -著 -葛 -葡 -董 -葦 -葩 -葬 -葭 -葱 -葵 -葺 -蒋 -蒐 -蒔 -蒙 -蒟 -蒡 -蒲 -蒸 -蒻 -蒼 -蒿 -蓄 -蓆 -蓉 -蓋 -蓑 -蓬 -蓮 -蓼 -蔀 -蔑 -蔓 -蔚 -蔡 -蔦 -蔬 -蔭 -蔵 -蔽 -蕃 -蕉 -蕊 -蕎 -蕨 -蕩 -蕪 -蕭 -蕾 -薄 -薇 -薊 -薔 -薗 -薙 -薛 -薦 -薨 -薩 -薪 -薫 -薬 -薭 -薮 -藁 -藉 -藍 -藏 -藐 -藝 -藤 -藩 -藪 -藷 -藹 -藺 -藻 -蘂 -蘆 -蘇 -蘊 -蘭 -虎 -虐 -虔 -虚 -虜 -虞 -號 -虫 -虹 -虻 -蚊 -蚕 -蛇 -蛉 -蛍 -蛎 -蛙 -蛛 -蛟 -蛤 -蛭 -蛮 -蛸 -蛹 -蛾 -蜀 -蜂 -蜃 -蜆 -蜊 -蜘 -蜜 -蜷 -蜻 -蝉 -蝋 -蝕 -蝙 -蝠 -蝦 -蝶 -蝿 -螂 -融 -螣 -螺 -蟄 -蟇 -蟠 -蟷 -蟹 -蟻 -蠢 -蠣 -血 -衆 -行 -衍 -衒 -術 -街 -衙 -衛 -衝 -衞 -衡 -衢 -衣 -表 -衫 -衰 -衵 -衷 -衽 -衾 -衿 -袁 -袈 -袋 -袍 -袒 -袖 -袙 -袞 -袢 -被 -袰 -袱 -袴 -袷 -袿 -裁 -裂 -裃 -装 -裏 -裔 -裕 -裘 -裙 -補 -裟 -裡 -裲 -裳 -裴 -裸 -裹 -製 -裾 -褂 -褄 -複 -褌 -褐 -褒 -褥 -褪 -褶 -褻 -襄 -襖 -襞 -襟 -襠 -襦 -襪 -襲 -襴 -襷 -西 -要 -覆 -覇 -覈 -見 -規 -視 -覗 -覚 -覧 -親 -覲 -観 -覺 -觀 -角 -解 -触 -言 -訂 -計 -討 -訓 -託 -記 -訛 -訟 -訢 -訥 -訪 -設 -許 -訳 -訴 -訶 -診 -註 -証 -詐 -詔 -評 -詛 -詞 -詠 -詢 -詣 -試 -詩 -詫 -詮 -詰 -話 -該 -詳 -誄 -誅 -誇 -誉 -誌 -認 -誓 -誕 -誘 -語 -誠 -誡 -誣 -誤 -誥 -誦 -説 -読 -誰 -課 -誼 -誾 -調 -談 -請 -諌 -諍 -諏 -諒 -論 -諚 -諜 -諟 -諡 -諦 -諧 -諫 -諭 -諮 -諱 -諶 -諷 -諸 -諺 -諾 -謀 -謄 -謌 -謎 -謗 -謙 -謚 -講 -謝 -謡 -謫 -謬 -謹 -證 -識 -譚 -譛 -譜 -警 -譬 -譯 
-議 -譲 -譴 -護 -讀 -讃 -讐 -讒 -谷 -谿 -豅 -豆 -豊 -豎 -豐 -豚 -象 -豪 -豫 -豹 -貌 -貝 -貞 -負 -財 -貢 -貧 -貨 -販 -貪 -貫 -責 -貯 -貰 -貴 -買 -貸 -費 -貼 -貿 -賀 -賁 -賂 -賃 -賄 -資 -賈 -賊 -賎 -賑 -賓 -賛 -賜 -賞 -賠 -賢 -賣 -賤 -賦 -質 -賭 -購 -賽 -贄 -贅 -贈 -贋 -贔 -贖 -赤 -赦 -走 -赴 -起 -超 -越 -趙 -趣 -足 -趺 -趾 -跋 -跏 -距 -跡 -跨 -跪 -路 -跳 -践 -踊 -踏 -踐 -踞 -踪 -踵 -蹄 -蹉 -蹊 -蹟 -蹲 -蹴 -躅 -躇 -躊 -躍 -躑 -躙 -躪 -身 -躬 -躯 -躰 -車 -軋 -軌 -軍 -軒 -軟 -転 -軸 -軻 -軽 -軾 -較 -載 -輌 -輔 -輜 -輝 -輦 -輩 -輪 -輯 -輸 -輿 -轄 -轍 -轟 -轢 -辛 -辞 -辟 -辥 -辦 -辨 -辰 -辱 -農 -辺 -辻 -込 -迂 -迅 -迎 -近 -返 -迢 -迦 -迪 -迫 -迭 -述 -迷 -迹 -追 -退 -送 -逃 -逅 -逆 -逍 -透 -逐 -逓 -途 -逕 -逗 -這 -通 -逝 -逞 -速 -造 -逢 -連 -逮 -週 -進 -逸 -逼 -遁 -遂 -遅 -遇 -遊 -運 -遍 -過 -遐 -道 -達 -違 -遙 -遜 -遠 -遡 -遣 -遥 -適 -遭 -遮 -遯 -遵 -遷 -選 -遺 -遼 -避 -邀 -邁 -邂 -邃 -還 -邇 -邉 -邊 -邑 -那 -邦 -邨 -邪 -邯 -邵 -邸 -郁 -郊 -郎 -郡 -郢 -部 -郭 -郴 -郵 -郷 -都 -鄂 -鄙 -鄭 -鄰 -鄲 -酉 -酋 -酌 -配 -酎 -酒 -酔 -酢 -酥 -酪 -酬 -酵 -酷 -酸 -醍 -醐 -醒 -醗 -醜 -醤 -醪 -醵 -醸 -采 -釈 -釉 -釋 -里 -重 -野 -量 -釐 -金 -釘 -釜 -針 -釣 -釧 -釿 -鈍 -鈎 -鈐 -鈔 -鈞 -鈦 -鈴 -鈷 -鈸 -鈿 -鉄 -鉇 -鉉 -鉋 -鉛 -鉢 -鉤 -鉦 -鉱 -鉾 -銀 -銃 -銅 -銈 -銑 -銕 -銘 -銚 -銜 -銭 -鋏 -鋒 -鋤 -鋭 -鋲 -鋳 -鋸 -鋺 -鋼 -錆 -錍 -錐 -錘 -錠 -錣 -錦 -錫 -錬 -錯 -録 -錵 -鍋 -鍍 -鍑 -鍔 -鍛 -鍬 -鍮 -鍵 -鍼 -鍾 -鎌 -鎖 -鎗 -鎚 -鎧 -鎬 -鎮 -鎰 -鎹 -鏃 -鏑 -鏡 -鐃 -鐇 -鐐 -鐔 -鐘 -鐙 -鐚 -鐡 -鐵 -鐸 -鑁 -鑊 -鑑 -鑒 -鑚 -鑠 -鑢 -鑰 -鑵 -鑷 -鑼 -鑽 -鑿 -長 -門 -閃 -閇 -閉 -開 -閏 -閑 -間 -閔 -閘 -関 -閣 -閤 -閥 -閦 -閨 -閬 -閲 -閻 -閼 -閾 -闇 -闍 -闔 -闕 -闘 -關 -闡 -闢 -闥 -阜 -阪 -阮 -阯 -防 -阻 -阿 -陀 -陂 -附 -陌 -降 -限 -陛 -陞 -院 -陣 -除 -陥 -陪 -陬 -陰 -陳 -陵 -陶 -陸 -険 -陽 -隅 -隆 -隈 -隊 -隋 -階 -随 -隔 -際 -障 -隠 -隣 -隧 -隷 -隻 -隼 -雀 -雁 -雄 -雅 -集 -雇 -雉 -雊 -雋 -雌 -雍 -雑 -雖 -雙 -雛 -離 -難 -雨 -雪 -雫 -雰 -雲 -零 -雷 -雹 -電 -需 -震 -霊 -霍 -霖 -霜 -霞 -霧 -霰 -露 -靈 -青 -靖 -静 -靜 -非 -面 -革 -靫 -靭 -靱 -靴 -靺 -鞁 -鞄 -鞆 -鞋 -鞍 -鞏 -鞘 -鞠 -鞨 -鞭 -韋 -韓 -韜 -韮 -音 -韶 -韻 -響 -頁 -頂 -頃 -項 -順 -須 -頌 -預 -頑 -頒 -頓 -領 -頚 -頬 -頭 -頴 -頸 -頻 -頼 -顆 -題 -額 -顎 -顔 -顕 -顗 -願 -顛 -類 -顧 -顯 -風 -飛 -食 -飢 -飩 -飫 -飯 -飲 -飴 -飼 -飽 -飾 -餃 -餅 -餉 -養 -餌 -餐 -餓 -餘 -餝 -餡 -館 -饂 -饅 -饉 -饋 -饌 -饒 -饗 -首 -馗 -香 -馨 -馬 -馳 -馴 -駄 -駅 -駆 -駈 -駐 -駒 -駕 -駝 -駿 -騁 -騎 -騏 -騒 -験 -騙 -騨 -騰 -驕 -驚 -驛 -驢 -骨 -骸 -髄 -體 -高 -髙 -髢 -髪 -髭 -髮 -髷 -髻 -鬘 -鬚 -鬢 -鬨 -鬯 -鬱 -鬼 -魁 -魂 -魄 -魅 -魏 -魔 -魚 -魯 -鮎 -鮑 -鮒 -鮪 -鮫 -鮭 -鮮 -鯉 -鯔 -鯖 -鯛 -鯨 -鯰 -鯱 -鰐 -鰒 -鰭 -鰯 -鰰 -鰹 -鰻 -鱈 -鱒 -鱗 -鱧 -鳥 -鳩 -鳰 -鳳 -鳴 -鳶 -鴈 -鴉 -鴎 -鴛 -鴟 -鴦 -鴨 -鴫 -鴻 -鵄 -鵜 -鵞 -鵡 -鵬 -鵲 -鵺 -鶉 -鶏 -鶯 -鶴 -鷄 -鷙 -鷲 -鷹 -鷺 -鸚 -鸞 -鹸 -鹽 -鹿 -麁 -麒 -麓 -麗 -麝 -麞 -麟 -麦 -麩 -麹 -麺 -麻 -麾 -麿 -黄 -黌 -黍 -黒 -黙 -黛 -黠 -鼈 -鼉 -鼎 -鼓 -鼠 -鼻 -齊 -齋 -齟 -齢 -齬 -龍 -龕 -龗 -! -# -% -& -( -) -+ -, -- -. -/ -0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -: -; -= -? -@ -A -B -C -D -E -F -G -H -I -J -K -L -M -N -O -P -R -S -T -U -V -W -X -Z -a -c -d -e -f -h -i -j -k -l -m -n -o -p -r -s -t -u -y -z -~ -・ - diff --git a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/character.py b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/character.py deleted file mode 100644 index 21dbbd9dc790e3d009f45c1ef1b68c001e9f0e0b..0000000000000000000000000000000000000000 --- a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/character.py +++ /dev/null @@ -1,213 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
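The `CharacterOps` code deleted below implemented CTC decoding by hand; the rewritten wrappers defer this to the PaddleOCR pipeline behind `multi_languages_ocr_db_crnn`. Its core, reduced to a simplified sketch (blank assumed to be the last class, as in this code):

```python
import numpy as np

def ctc_greedy_decode(probs, charset):
    # probs: (timesteps, num_classes) softmax scores; the blank label is the last class.
    blank = probs.shape[1] - 1
    best = probs.argmax(axis=1)
    out, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:  # drop blanks, collapse repeated labels
            out.append(charset[idx])
        prev = idx
    return ''.join(out)
```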
- -import numpy as np -import string - -class CharacterOps(object): - """ Convert between text-label and text-index """ - - def __init__(self, config): - self.character_type = config['character_type'] - self.loss_type = config['loss_type'] - self.max_text_len = config['max_text_length'] - if self.character_type == "en": - self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" - dict_character = list(self.character_str) - elif self.character_type in [ - "ch", 'japan', 'korean', 'french', 'german' - ]: - character_dict_path = config['character_dict_path'] - add_space = False - if 'use_space_char' in config: - add_space = config['use_space_char'] - self.character_str = "" - with open(character_dict_path, "rb") as fin: - lines = fin.readlines() - for line in lines: - line = line.decode('utf-8').strip("\n").strip("\r\n") - self.character_str += line - if add_space: - self.character_str += " " - dict_character = list(self.character_str) - elif self.character_type == "en_sensitive": - # same with ASTER setting (use 94 char). - self.character_str = string.printable[:-6] - dict_character = list(self.character_str) - else: - self.character_str = None - assert self.character_str is not None, \ - "Nonsupport type of the character: {}".format(self.character_str) - self.beg_str = "sos" - self.end_str = "eos" - if self.loss_type == "attention": - dict_character = [self.beg_str, self.end_str] + dict_character - elif self.loss_type == "srn": - dict_character = dict_character + [self.beg_str, self.end_str] - self.dict = {} - for i, char in enumerate(dict_character): - self.dict[char] = i - self.character = dict_character - - def encode(self, text): - """convert text-label into text-index. - input: - text: text labels of each image. [batch_size] - - output: - text: concatenated text index for CTCLoss. - [sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)] - length: length of each text. [batch_size] - """ - if self.character_type == "en": - text = text.lower() - - text_list = [] - for char in text: - if char not in self.dict: - continue - text_list.append(self.dict[char]) - text = np.array(text_list) - return text - - def decode(self, text_index, is_remove_duplicate=False): - """ convert text-index into text-label. 
""" - char_list = [] - char_num = self.get_char_num() - - if self.loss_type == "attention": - beg_idx = self.get_beg_end_flag_idx("beg") - end_idx = self.get_beg_end_flag_idx("end") - ignored_tokens = [beg_idx, end_idx] - else: - ignored_tokens = [char_num] - - for idx in range(len(text_index)): - if text_index[idx] in ignored_tokens: - continue - if is_remove_duplicate: - if idx > 0 and text_index[idx - 1] == text_index[idx]: - continue - char_list.append(self.character[int(text_index[idx])]) - text = ''.join(char_list) - return text - - def get_char_num(self): - return len(self.character) - - def get_beg_end_flag_idx(self, beg_or_end): - if self.loss_type == "attention": - if beg_or_end == "beg": - idx = np.array(self.dict[self.beg_str]) - elif beg_or_end == "end": - idx = np.array(self.dict[self.end_str]) - else: - assert False, "Unsupport type %s in get_beg_end_flag_idx"\ - % beg_or_end - return idx - else: - err = "error in get_beg_end_flag_idx when using the loss %s"\ - % (self.loss_type) - assert False, err - - -def cal_predicts_accuracy(char_ops, - preds, - preds_lod, - labels, - labels_lod, - is_remove_duplicate=False): - acc_num = 0 - img_num = 0 - for ino in range(len(labels_lod) - 1): - beg_no = preds_lod[ino] - end_no = preds_lod[ino + 1] - preds_text = preds[beg_no:end_no].reshape(-1) - preds_text = char_ops.decode(preds_text, is_remove_duplicate) - - beg_no = labels_lod[ino] - end_no = labels_lod[ino + 1] - labels_text = labels[beg_no:end_no].reshape(-1) - labels_text = char_ops.decode(labels_text, is_remove_duplicate) - img_num += 1 - - if preds_text == labels_text: - acc_num += 1 - acc = acc_num * 1.0 / img_num - return acc, acc_num, img_num - - -def cal_predicts_accuracy_srn(char_ops, - preds, - labels, - max_text_len, - is_debug=False): - acc_num = 0 - img_num = 0 - - char_num = char_ops.get_char_num() - - total_len = preds.shape[0] - img_num = int(total_len / max_text_len) - for i in range(img_num): - cur_label = [] - cur_pred = [] - for j in range(max_text_len): - if labels[j + i * max_text_len] != int(char_num - 1): #0 - cur_label.append(labels[j + i * max_text_len][0]) - else: - break - - for j in range(max_text_len + 1): - if j < len(cur_label) and preds[j + i * max_text_len][ - 0] != cur_label[j]: - break - elif j == len(cur_label) and j == max_text_len: - acc_num += 1 - break - elif j == len(cur_label) and preds[j + i * max_text_len][0] == int( - char_num - 1): - acc_num += 1 - break - acc = acc_num * 1.0 / img_num - return acc, acc_num, img_num - - -def convert_rec_attention_infer_res(preds): - img_num = preds.shape[0] - target_lod = [0] - convert_ids = [] - for ino in range(img_num): - end_pos = np.where(preds[ino, :] == 1)[0] - if len(end_pos) <= 1: - text_list = preds[ino, 1:] - else: - text_list = preds[ino, 1:end_pos[1]] - target_lod.append(target_lod[ino] + len(text_list)) - convert_ids = convert_ids + list(text_list) - convert_ids = np.array(convert_ids) - convert_ids = convert_ids.reshape((-1, 1)) - return convert_ids, target_lod - - -def convert_rec_label_to_lod(ori_labels): - img_num = len(ori_labels) - target_lod = [0] - convert_ids = [] - for ino in range(img_num): - target_lod.append(target_lod[ino] + len(ori_labels[ino])) - convert_ids = convert_ids + list(ori_labels[ino]) - convert_ids = np.array(convert_ids) - convert_ids = convert_ids.reshape((-1, 1)) - return convert_ids, target_lod diff --git a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/module.py index 
cd04f063496af4a93459ec19a7a46b93f2dab51b..1b9f1050eeb65371d6ba6521235088608ad2c739 100644 --- a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/module.py +++ b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/module.py @@ -1,304 +1,61 @@ -# -*- coding:utf-8 -*- -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import argparse -import ast -import copy -import math -import os -import time - -from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor -from paddlehub.common.logger import logger -from paddlehub.module.module import moduleinfo, runnable, serving -from PIL import Image -import cv2 -import numpy as np -import paddle.fluid as fluid import paddlehub as hub - -from japan_ocr_db_crnn_mobile.character import CharacterOps -from japan_ocr_db_crnn_mobile.utils import base64_to_cv2, draw_ocr, get_image_ext, sorted_boxes +from paddleocr.ppocr.utils.logging import get_logger +from paddleocr.tools.infer.utility import base64_to_cv2 +from paddlehub.module.module import moduleinfo, runnable, serving @moduleinfo( name="japan_ocr_db_crnn_mobile", version="1.0.0", - summary= - "The module can recognize the japan texts in an image. Firstly, it will detect the text box positions based on the differentiable_binarization module. Then it recognizes the german texts. ", - author="paddle-dev", - author_email="paddle-dev@baidu.com", + summary="ocr service", + author="PaddlePaddle", type="cv/text_recognition") -class JapanOCRDBCRNNMobile(hub.Module): - def _initialize(self, text_detector_module=None, enable_mkldnn=False, use_angle_classification=False): +class JapanOCRDBCRNNMobile: + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9): """ initialize with the necessary elements + Args: + det(bool): Whether to use text detector. + rec(bool): Whether to use text recognizer. + use_angle_cls(bool): Whether to use text orientation classifier. + enable_mkldnn(bool): Whether to enable mkldnn. + use_gpu (bool): Whether to use gpu. 
+ box_thresh(float): the threshold of the detected text box's confidence + angle_classification_thresh(float): the threshold of the angle classification confidence """ - self.character_dict_path = os.path.join(self.directory, 'assets', - 'japan_dict.txt') - char_ops_params = { - 'character_type': 'japan', - 'character_dict_path': self.character_dict_path, - 'loss_type': 'ctc', - 'max_text_length': 25, - 'use_space_char': True - } - self.char_ops = CharacterOps(char_ops_params) - self.rec_image_shape = [3, 32, 320] - self._text_detector_module = text_detector_module - self.font_file = os.path.join(self.directory, 'assets', 'japan.ttc') - self.enable_mkldnn = enable_mkldnn - self.use_angle_classification = use_angle_classification - - self.rec_pretrained_model_path = os.path.join( - self.directory, 'inference_model', 'character_rec') - self.rec_predictor, self.rec_input_tensor, self.rec_output_tensors = self._set_config( - self.rec_pretrained_model_path) - - if self.use_angle_classification: - self.cls_pretrained_model_path = os.path.join( - self.directory, 'inference_model', 'angle_cls') - - self.cls_predictor, self.cls_input_tensor, self.cls_output_tensors = self._set_config( - self.cls_pretrained_model_path) - - def _set_config(self, pretrained_model_path): - """ - predictor config path - """ - model_file_path = os.path.join(pretrained_model_path, 'model') - params_file_path = os.path.join(pretrained_model_path, 'params') - - config = AnalysisConfig(model_file_path, params_file_path) - try: - _places = os.environ["CUDA_VISIBLE_DEVICES"] - int(_places[0]) - use_gpu = True - except: - use_gpu = False - - if use_gpu: - config.enable_use_gpu(8000, 0) - else: - config.disable_gpu() - if self.enable_mkldnn: - # cache 10 different shapes for mkldnn to avoid memory leak - config.set_mkldnn_cache_capacity(10) - config.enable_mkldnn() - - config.disable_glog_info() - config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass") - config.switch_use_feed_fetch_ops(False) - - predictor = create_paddle_predictor(config) - - input_names = predictor.get_input_names() - input_tensor = predictor.get_input_tensor(input_names[0]) - output_names = predictor.get_output_names() - output_tensors = [] - for output_name in output_names: - output_tensor = predictor.get_output_tensor(output_name) - output_tensors.append(output_tensor) - - return predictor, input_tensor, output_tensors - - @property - def text_detector_module(self): - """ - text detect module - """ - if not self._text_detector_module: - self._text_detector_module = hub.Module( - name='chinese_text_detection_db_mobile', - enable_mkldnn=self.enable_mkldnn, - version='1.0.4') - return self._text_detector_module - - def read_images(self, paths=[]): - images = [] - for img_path in paths: - assert os.path.isfile( - img_path), "The {} isn't a valid file.".format(img_path) - img = cv2.imread(img_path) - if img is None: - logger.info("error in loading image:{}".format(img_path)) - continue - images.append(img) - return images - - def get_rotate_crop_image(self, img, points): - ''' - img_height, img_width = img.shape[0:2] - left = int(np.min(points[:, 0])) - right = int(np.max(points[:, 0])) - top = int(np.min(points[:, 1])) - bottom = int(np.max(points[:, 1])) - img_crop = img[top:bottom, left:right, :].copy() - points[:, 0] = points[:, 0] - left - points[:, 1] = points[:, 1] - top - ''' - img_crop_width = int( - max( - np.linalg.norm(points[0] - points[1]), - np.linalg.norm(points[2] - points[3]))) - img_crop_height = int( - max( - np.linalg.norm(points[0] - 
points[3]), - np.linalg.norm(points[1] - points[2]))) - pts_std = np.float32([[0, 0], [img_crop_width, 0], - [img_crop_width, img_crop_height], - [0, img_crop_height]]) - M = cv2.getPerspectiveTransform(points, pts_std) - dst_img = cv2.warpPerspective( - img, - M, (img_crop_width, img_crop_height), - borderMode=cv2.BORDER_REPLICATE, - flags=cv2.INTER_CUBIC) - dst_img_height, dst_img_width = dst_img.shape[0:2] - if dst_img_height * 1.0 / dst_img_width >= 1.5: - dst_img = np.rot90(dst_img) - return dst_img - - def resize_norm_img_rec(self, img, max_wh_ratio): - imgC, imgH, imgW = self.rec_image_shape - assert imgC == img.shape[2] - h, w = img.shape[:2] - ratio = w / float(h) - if math.ceil(imgH * ratio) > imgW: - resized_w = imgW - else: - resized_w = int(math.ceil(imgH * ratio)) - resized_image = cv2.resize(img, (resized_w, imgH)) - resized_image = resized_image.astype('float32') - resized_image = resized_image.transpose((2, 0, 1)) / 255 - resized_image -= 0.5 - resized_image /= 0.5 - padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) - padding_im[:, :, 0:resized_w] = resized_image - return padding_im - - def resize_norm_img_cls(self, img): - cls_image_shape = [3, 48, 192] - imgC, imgH, imgW = cls_image_shape - h = img.shape[0] - w = img.shape[1] - ratio = w / float(h) - if math.ceil(imgH * ratio) > imgW: - resized_w = imgW - else: - resized_w = int(math.ceil(imgH * ratio)) - resized_image = cv2.resize(img, (resized_w, imgH)) - resized_image = resized_image.astype('float32') - if cls_image_shape[0] == 1: - resized_image = resized_image / 255 - resized_image = resized_image[np.newaxis, :] - else: - resized_image = resized_image.transpose((2, 0, 1)) / 255 - resized_image -= 0.5 - resized_image /= 0.5 - padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) - padding_im[:, :, 0:resized_w] = resized_image - return padding_im - - def recognize_text(self, - images=[], - paths=[], - use_gpu=False, - output_dir='ocr_result', - visualization=False, - box_thresh=0.5, - text_thresh=0.5, - angle_classification_thresh=0.9): - """ - Get the chinese texts in the predicted images. + self.logger = get_logger() + self.model = hub.Module( + name="multi_languages_ocr_db_crnn", + lang="japan", + det=det, + rec=rec, + use_angle_cls=use_angle_cls, + enable_mkldnn=enable_mkldnn, + use_gpu=use_gpu, + box_thresh=box_thresh, + angle_classification_thresh=angle_classification_thresh) + self.model.name = self.name + + def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False): + """ + Get the text in the predicted images. Args: images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths paths (list[str]): The paths of images. If paths not images - use_gpu (bool): Whether to use gpu. - batch_size(int): the program deals once with one output_dir (str): The directory to store output images. visualization (bool): Whether to save image or not. - box_thresh(float): the threshold of the detected text box's confidence - text_thresh(float): the threshold of the chinese text recognition confidence - angle_classification_thresh(float): the threshold of the angle classification confidence - Returns: - res (list): The result of chinese texts and save path of images. + res (list): The result of text detection box and save path of images. """ - if use_gpu: - try: - _places = os.environ["CUDA_VISIBLE_DEVICES"] - int(_places[0]) - except: - raise RuntimeError( - "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. 
If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id." - ) - - self.use_gpu = use_gpu - - if images != [] and isinstance(images, list) and paths == []: - predicted_data = images - elif images == [] and isinstance(paths, list) and paths != []: - predicted_data = self.read_images(paths) - else: - raise TypeError("The input data is inconsistent with expectations.") - - assert predicted_data != [], "There is not any image to be predicted. Please check the input data." - - detection_results = self.text_detector_module.detect_text( - images=predicted_data, use_gpu=self.use_gpu, box_thresh=box_thresh) - print('*'*10) - print(detection_results) - - boxes = [ - np.array(item['data']).astype(np.float32) - for item in detection_results - ] - all_results = [] - for index, img_boxes in enumerate(boxes): - original_image = predicted_data[index].copy() - result = {'save_path': ''} - if img_boxes.size == 0: - result['data'] = [] - else: - img_crop_list = [] - boxes = sorted_boxes(img_boxes) - for num_box in range(len(boxes)): - tmp_box = copy.deepcopy(boxes[num_box]) - img_crop = self.get_rotate_crop_image( - original_image, tmp_box) - img_crop_list.append(img_crop) - - if self.use_angle_classification: - img_crop_list, angle_list = self._classify_text( - img_crop_list, - angle_classification_thresh=angle_classification_thresh) - - rec_results = self._recognize_text(img_crop_list) - - # if the recognized text confidence score is lower than text_thresh, then drop it - rec_res_final = [] - for index, res in enumerate(rec_results): - text, score = res - if score >= text_thresh: - rec_res_final.append({ - 'text': - text, - 'confidence': - float(score), - 'text_box_position': - boxes[index].astype(np.int).tolist() - }) - result['data'] = rec_res_final - - if visualization and result['data']: - result['save_path'] = self.save_result_image( - original_image, boxes, rec_results, output_dir, - text_thresh) - all_results.append(result) - + all_results = self.model.recognize_text( + images=images, paths=paths, output_dir=output_dir, visualization=visualization) return all_results @serving @@ -310,282 +67,21 @@ class JapanOCRDBCRNNMobile(hub.Module): results = self.recognize_text(images_decode, **kwargs) return results - def save_result_image( - self, - original_image, - detection_boxes, - rec_results, - output_dir='ocr_result', - text_thresh=0.5, - ): - image = Image.fromarray(cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)) - txts = [item[0] for item in rec_results] - scores = [item[1] for item in rec_results] - draw_img = draw_ocr( - image, - detection_boxes, - txts, - scores, - font_file=self.font_file, - draw_txt=True, - drop_score=text_thresh) - - if not os.path.exists(output_dir): - os.makedirs(output_dir) - ext = get_image_ext(original_image) - saved_name = 'ndarray_{}{}'.format(time.time(), ext) - save_file_path = os.path.join(output_dir, saved_name) - cv2.imwrite(save_file_path, draw_img[:, :, ::-1]) - return save_file_path - - def _classify_text(self, image_list, angle_classification_thresh=0.9): - img_list = copy.deepcopy(image_list) - img_num = len(img_list) - # Calculate the aspect ratio of all text bars - width_list = [] - for img in img_list: - width_list.append(img.shape[1] / float(img.shape[0])) - # Sorting can speed up the cls process - indices = np.argsort(np.array(width_list)) - - cls_res = [['', 0.0]] * img_num - batch_num = 30 - for beg_img_no in range(0, img_num, batch_num): - end_img_no = min(img_num, beg_img_no + batch_num) - norm_img_batch = 
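Since the `@serving` hook survives the rewrite, the module still works with PaddleHub Serving. A client sketch under the usual PaddleHub conventions (default port and route; the image path is a placeholder); start the server first with `hub serving start -m japan_ocr_db_crnn_mobile`:

```python
import base64
import json

import cv2
import requests

def cv2_to_base64(image):
    # The serving endpoint expects base64-encoded image bytes.
    return base64.b64encode(cv2.imencode('.jpg', image)[1].tobytes()).decode('utf8')

data = {'images': [cv2_to_base64(cv2.imread('/PATH/TO/IMAGE'))]}
url = 'http://127.0.0.1:8866/predict/japan_ocr_db_crnn_mobile'
r = requests.post(url=url, headers={'Content-type': 'application/json'}, data=json.dumps(data))
print(r.json()['results'])
```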
[] - max_wh_ratio = 0 - for ino in range(beg_img_no, end_img_no): - h, w = img_list[indices[ino]].shape[0:2] - wh_ratio = w * 1.0 / h - max_wh_ratio = max(max_wh_ratio, wh_ratio) - for ino in range(beg_img_no, end_img_no): - norm_img = self.resize_norm_img_cls(img_list[indices[ino]]) - norm_img = norm_img[np.newaxis, :] - norm_img_batch.append(norm_img) - norm_img_batch = np.concatenate(norm_img_batch) - norm_img_batch = norm_img_batch.copy() - - self.cls_input_tensor.copy_from_cpu(norm_img_batch) - self.cls_predictor.zero_copy_run() - - prob_out = self.cls_output_tensors[0].copy_to_cpu() - label_out = self.cls_output_tensors[1].copy_to_cpu() - if len(label_out.shape) != 1: - prob_out, label_out = label_out, prob_out - label_list = ['0', '180'] - for rno in range(len(label_out)): - label_idx = label_out[rno] - score = prob_out[rno][label_idx] - label = label_list[label_idx] - cls_res[indices[beg_img_no + rno]] = [label, score] - if '180' in label and score > angle_classification_thresh: - img_list[indices[beg_img_no + rno]] = cv2.rotate( - img_list[indices[beg_img_no + rno]], 1) - return img_list, cls_res - - def _recognize_text(self, img_list): - img_num = len(img_list) - # Calculate the aspect ratio of all text bars - width_list = [] - for img in img_list: - width_list.append(img.shape[1] / float(img.shape[0])) - # Sorting can speed up the recognition process - indices = np.argsort(np.array(width_list)) - - rec_res = [['', 0.0]] * img_num - batch_num = 30 - for beg_img_no in range(0, img_num, batch_num): - end_img_no = min(img_num, beg_img_no + batch_num) - norm_img_batch = [] - max_wh_ratio = 0 - for ino in range(beg_img_no, end_img_no): - h, w = img_list[indices[ino]].shape[0:2] - wh_ratio = w * 1.0 / h - max_wh_ratio = max(max_wh_ratio, wh_ratio) - for ino in range(beg_img_no, end_img_no): - norm_img = self.resize_norm_img_rec(img_list[indices[ino]], - max_wh_ratio) - norm_img = norm_img[np.newaxis, :] - norm_img_batch.append(norm_img) - - norm_img_batch = np.concatenate(norm_img_batch, axis=0) - norm_img_batch = norm_img_batch.copy() - - self.rec_input_tensor.copy_from_cpu(norm_img_batch) - self.rec_predictor.zero_copy_run() - - rec_idx_batch = self.rec_output_tensors[0].copy_to_cpu() - rec_idx_lod = self.rec_output_tensors[0].lod()[0] - predict_batch = self.rec_output_tensors[1].copy_to_cpu() - predict_lod = self.rec_output_tensors[1].lod()[0] - for rno in range(len(rec_idx_lod) - 1): - beg = rec_idx_lod[rno] - end = rec_idx_lod[rno + 1] - rec_idx_tmp = rec_idx_batch[beg:end, 0] - preds_text = self.char_ops.decode(rec_idx_tmp) - beg = predict_lod[rno] - end = predict_lod[rno + 1] - probs = predict_batch[beg:end, :] - ind = np.argmax(probs, axis=1) - blank = probs.shape[1] - valid_ind = np.where(ind != (blank - 1))[0] - if len(valid_ind) == 0: - continue - score = np.mean(probs[valid_ind, ind[valid_ind]]) - # rec_res.append([preds_text, score]) - rec_res[indices[beg_img_no + rno]] = [preds_text, score] - - return rec_res - - def save_inference_model(self, - dirname, - model_filename=None, - params_filename=None, - combined=True): - detector_dir = os.path.join(dirname, 'text_detector') - classifier_dir = os.path.join(dirname, 'angle_classifier') - recognizer_dir = os.path.join(dirname, 'text_recognizer') - self._save_detector_model(detector_dir, model_filename, params_filename, - combined) - if self.use_angle_classification: - self._save_classifier_model(classifier_dir, model_filename, - params_filename, combined) - - self._save_recognizer_model(recognizer_dir, model_filename, - 
params_filename, combined) - logger.info("The inference model has been saved in the path {}".format( - os.path.realpath(dirname))) - - def _save_detector_model(self, - dirname, - model_filename=None, - params_filename=None, - combined=True): - self.text_detector_module.save_inference_model( - dirname, model_filename, params_filename, combined) - - def _save_recognizer_model(self, - dirname, - model_filename=None, - params_filename=None, - combined=True): - if combined: - model_filename = "__model__" if not model_filename else model_filename - params_filename = "__params__" if not params_filename else params_filename - place = fluid.CPUPlace() - exe = fluid.Executor(place) - - model_file_path = os.path.join(self.rec_pretrained_model_path, 'model') - params_file_path = os.path.join(self.rec_pretrained_model_path, - 'params') - program, feeded_var_names, target_vars = fluid.io.load_inference_model( - dirname=self.rec_pretrained_model_path, - model_filename=model_file_path, - params_filename=params_file_path, - executor=exe) - - fluid.io.save_inference_model( - dirname=dirname, - main_program=program, - executor=exe, - feeded_var_names=feeded_var_names, - target_vars=target_vars, - model_filename=model_filename, - params_filename=params_filename) - - def _save_classifier_model(self, - dirname, - model_filename=None, - params_filename=None, - combined=True): - if combined: - model_filename = "__model__" if not model_filename else model_filename - params_filename = "__params__" if not params_filename else params_filename - place = fluid.CPUPlace() - exe = fluid.Executor(place) - - model_file_path = os.path.join(self.cls_pretrained_model_path, 'model') - params_file_path = os.path.join(self.cls_pretrained_model_path, - 'params') - program, feeded_var_names, target_vars = fluid.io.load_inference_model( - dirname=self.cls_pretrained_model_path, - model_filename=model_file_path, - params_filename=params_file_path, - executor=exe) - - fluid.io.save_inference_model( - dirname=dirname, - main_program=program, - executor=exe, - feeded_var_names=feeded_var_names, - target_vars=target_vars, - model_filename=model_filename, - params_filename=params_filename) - @runnable def run_cmd(self, argvs): """ Run as a command """ - self.parser = argparse.ArgumentParser( - description="Run the %s module." % self.name, - prog='hub run %s' % self.name, - usage='%(prog)s', - add_help=True) - - self.arg_input_group = self.parser.add_argument_group( - title="Input options", description="Input data. 
Required") - self.arg_config_group = self.parser.add_argument_group( - title="Config options", - description= - "Run configuration for controlling module behavior, not required.") - - self.add_module_config_arg() - self.add_module_input_arg() - - args = self.parser.parse_args(argvs) - results = self.recognize_text( - paths=[args.input_path], - use_gpu=args.use_gpu, - output_dir=args.output_dir, - visualization=args.visualization) + results = self.model.run_cmd(argvs) return results - def add_module_config_arg(self): - """ - Add the command config options - """ - self.arg_config_group.add_argument( - '--use_gpu', - type=ast.literal_eval, - default=False, - help="whether use GPU or not") - self.arg_config_group.add_argument( - '--output_dir', - type=str, - default='ocr_result', - help="The directory to save output images.") - self.arg_config_group.add_argument( - '--visualization', - type=ast.literal_eval, - default=False, - help="whether to save output as images.") - - def add_module_input_arg(self): - """ - Add the command input options - """ - self.arg_input_group.add_argument( - '--input_path', type=str, default=None, help="diretory to image") - + def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10): + ''' + Export the model to ONNX format. -if __name__ == '__main__': - ocr = JapanOCRDBCRNNMobile(enable_mkldnn=False, use_angle_classification=True) - image_path = [ - '/mnt/zhangxuefei/PaddleOCR/doc/imgs/ger_1.jpg', - '/mnt/zhangxuefei/PaddleOCR/doc/imgs/12.jpg', - '/mnt/zhangxuefei/PaddleOCR/doc/imgs/test_image.jpg' - ] - res = ocr.recognize_text(paths=image_path, visualization=True) - ocr.save_inference_model('save') - print(res) + Args: + dirname(str): The directory to save the onnx model. + input_shape_dict: dictionary ``{ input_name: input_value }, eg. 
{'x': [-1, 3, -1, -1]}`` + opset_version(int): operator set + ''' + self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version) diff --git a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3 --- /dev/null +++ b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/requirements.txt @@ -0,0 +1,4 @@ +paddleocr>=2.3.0.2 +paddle2onnx>=0.9.0 +shapely +pyclipper diff --git a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/utils.py b/modules/image/text_recognition/japan_ocr_db_crnn_mobile/utils.py deleted file mode 100644 index 8c41af300cc91de369a473cb7327b794b6cf5715..0000000000000000000000000000000000000000 --- a/modules/image/text_recognition/japan_ocr_db_crnn_mobile/utils.py +++ /dev/null @@ -1,190 +0,0 @@ -# -*- coding:utf-8 -*- -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import math - -from PIL import Image, ImageDraw, ImageFont -import base64 -import cv2 -import numpy as np - - -def draw_ocr(image, - boxes, - txts, - scores, - font_file, - draw_txt=True, - drop_score=0.5): - """ - Visualize the results of OCR detection and recognition - args: - image(Image|array): RGB image - boxes(list): boxes with shape(N, 4, 2) - txts(list): the texts - scores(list): txxs corresponding scores - draw_txt(bool): whether draw text or not - drop_score(float): only scores greater than drop_threshold will be visualized - return(array): - the visualized img - """ - if scores is None: - scores = [1] * len(boxes) - for (box, score) in zip(boxes, scores): - if score < drop_score or math.isnan(score): - continue - box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64) - image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2) - - if draw_txt: - img = np.array(resize_img(image, input_size=600)) - txt_img = text_visual( - txts, - scores, - font_file, - img_h=img.shape[0], - img_w=600, - threshold=drop_score) - img = np.concatenate([np.array(img), np.array(txt_img)], axis=1) - return img - return image - - -def text_visual(texts, scores, font_file, img_h=400, img_w=600, threshold=0.): - """ - create new blank img and draw txt on it - args: - texts(list): the text will be draw - scores(list|None): corresponding score of each txt - img_h(int): the height of blank img - img_w(int): the width of blank img - return(array): - """ - if scores is not None: - assert len(texts) == len( - scores), "The number of txts and corresponding scores must match" - - def create_blank_img(): - blank_img = np.ones(shape=[img_h, img_w], dtype=np.int8) * 255 - blank_img[:, img_w - 1:] = 0 - blank_img = Image.fromarray(blank_img).convert("RGB") - draw_txt = ImageDraw.Draw(blank_img) - return blank_img, draw_txt - - blank_img, draw_txt = create_blank_img() - - font_size = 20 - txt_color = (0, 0, 0) - font = ImageFont.truetype(font_file, font_size, encoding="utf-8") - - gap = font_size + 5 - txt_img_list = [] - count, index = 1, 0 - for idx, txt in enumerate(texts): - index += 1 - if scores[idx] < threshold or math.isnan(scores[idx]): - index -= 1 - continue - first_line = True - while str_count(txt) >= img_w // font_size - 4: - tmp = txt - txt = tmp[:img_w // font_size - 4] - if first_line: - new_txt = str(index) + ': ' + txt - first_line = False - else: - new_txt = ' ' + txt - draw_txt.text((0, 
gap * count), new_txt, txt_color, font=font) - txt = tmp[img_w // font_size - 4:] - if count >= img_h // gap - 1: - txt_img_list.append(np.array(blank_img)) - blank_img, draw_txt = create_blank_img() - count = 0 - count += 1 - if first_line: - new_txt = str(index) + ': ' + txt + ' ' + '%.3f' % (scores[idx]) - else: - new_txt = " " + txt + " " + '%.3f' % (scores[idx]) - draw_txt.text((0, gap * count), new_txt, txt_color, font=font) - # whether add new blank img or not - if count >= img_h // gap - 1 and idx + 1 < len(texts): - txt_img_list.append(np.array(blank_img)) - blank_img, draw_txt = create_blank_img() - count = 0 - count += 1 - txt_img_list.append(np.array(blank_img)) - if len(txt_img_list) == 1: - blank_img = np.array(txt_img_list[0]) - else: - blank_img = np.concatenate(txt_img_list, axis=1) - return np.array(blank_img) - - -def str_count(s): - """ - Count the number of Chinese characters, - a single English character and a single number - equal to half the length of Chinese characters. - args: - s(string): the input of string - return(int): - the number of Chinese characters - """ - import string - count_zh = count_pu = 0 - s_len = len(s) - en_dg_count = 0 - for c in s: - if c in string.ascii_letters or c.isdigit() or c.isspace(): - en_dg_count += 1 - elif c.isalpha(): - count_zh += 1 - else: - count_pu += 1 - return s_len - math.ceil(en_dg_count / 2) - - -def resize_img(img, input_size=600): - img = np.array(img) - im_shape = img.shape - im_size_min = np.min(im_shape[0:2]) - im_size_max = np.max(im_shape[0:2]) - im_scale = float(input_size) / float(im_size_max) - im = cv2.resize(img, None, None, fx=im_scale, fy=im_scale) - return im - - -def get_image_ext(image): - if image.shape[2] == 4: - return ".png" - return ".jpg" - - -def sorted_boxes(dt_boxes): - """ - Sort text boxes in order from top to bottom, left to right - args: - dt_boxes(array):detected text boxes with shape [4, 2] - return: - sorted boxes(array) with shape [4, 2] - """ - num_boxes = dt_boxes.shape[0] - sorted_boxes = sorted(dt_boxes, key=lambda x: (x[0][1], x[0][0])) - _boxes = list(sorted_boxes) - - for i in range(num_boxes - 1): - if abs(_boxes[i + 1][0][1] - _boxes[i][0][1]) < 10 and \ - (_boxes[i + 1][0][0] < _boxes[i][0][0]): - tmp = _boxes[i] - _boxes[i] = _boxes[i + 1] - _boxes[i + 1] = tmp - return _boxes - - -def base64_to_cv2(b64str): - data = base64.b64decode(b64str.encode('utf8')) - data = np.fromstring(data, np.uint8) - data = cv2.imdecode(data, cv2.IMREAD_COLOR) - return data diff --git a/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e6199f5e03f595fe8005e8a274dd0692f5c13ce1 --- /dev/null +++ b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/README.md @@ -0,0 +1,173 @@ +# kannada_ocr_db_crnn_mobile + +|模型名称|kannada_ocr_db_crnn_mobile| +| :--- | :---: | +|类别|图像-文字识别| +|网络|Differentiable Binarization+CRNN| +|数据集|icdar2015数据集| +|是否支持Fine-tuning|否| +|最新更新日期|2021-12-2| +|数据指标|-| + + +## 一、模型基本信息 + +- ### 模型介绍 + + - kannada_ocr_db_crnn_mobile Module用于识别图片当中的卡纳达文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的卡纳达文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别卡纳达文的轻量级OCR模型,支持直接预测。 + + - 更多详情参考: + - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf) + - [An end-to-end 
trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+  - PaddlePaddle >= 2.0.2
+
+  - Python >= 3.6
+
+  - PaddleOCR >= 2.0.1    | [如何安装PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
+
+  - PaddleHub >= 2.0.0    | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+  - Paddle2Onnx >= 0.9.0  | [如何安装paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
+
+  - shapely
+
+  - pyclipper
+
+  - ```shell
+    $ pip3.6 install "paddleocr==2.3.0.2"
+    $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
+    $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
+    ```
+  - **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。**
+
+- ### 2、安装
+
+  - ```shell
+    $ hub install kannada_ocr_db_crnn_mobile
+    ```
+  - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+  - ```shell
+    $ hub run kannada_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+    $ hub run kannada_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+    ```
+  - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、代码示例
+
+  - ```python
+    import paddlehub as hub
+    import cv2
+
+    ocr = hub.Module(name="kannada_ocr_db_crnn_mobile", enable_mkldnn=True)  # mkldnn加速仅在CPU下有效
+    result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+    # or
+    # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+    ```
+
+- ### 3、API
+
+  - ```python
+    def __init__(self,
+                 det=True,
+                 rec=True,
+                 use_angle_cls=False,
+                 enable_mkldnn=False,
+                 use_gpu=False,
+                 box_thresh=0.6,
+                 angle_classification_thresh=0.9)
+    ```
+
+    - 构造KannadaOCRDBCRNNMobile对象
+
+    - **参数**
+      - det(bool): 是否开启文字检测。默认为True。
+      - rec(bool): 是否开启文字识别。默认为True。
+      - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+      - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+      - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+      - box\_thresh (float): 检测文本框置信度的阈值;
+      - angle_classification_thresh(float): 文本方向分类置信度的阈值
+
+
+  - ```python
+    def recognize_text(images=[],
+                       paths=[],
+                       output_dir='ocr_result',
+                       visualization=False)
+    ```
+
+    - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+    - **参数**
+
+      - paths (list\[str\]): 图片的路径;
+      - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+      - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+      - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+    - **返回**
+
+      - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+        - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+          - text(str): 识别得到的文本
+          - confidence(float): 识别文本结果置信度
+          - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+          - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+          - score(float): 分类的得分,仅在只有方向分类开启时输出
+        - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
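+
+  - **返回结果解析示意**:下面的代码片段仅为文档示例,演示如何遍历上述返回结构(假设 result 来自上文“代码示例”的调用,且 det、rec 均开启):
+
+  - ```python
+    for res in result:
+        print('可视化结果保存路径:', res['save_path'])
+        for item in res['data']:
+            print(item['text'], item['confidence'], item['text_box_position'])
+    ```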
+
+
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+  - 运行启动命令:
+  - ```shell
+    $ hub serving start -m kannada_ocr_db_crnn_mobile
+    ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则无需设置。
+
+- ### 第二步:发送预测请求
+
+  - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+  - ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tostring()).decode('utf8')
+
+    # 发送HTTP请求
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/kannada_ocr_db_crnn_mobile"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # 打印预测结果
+    print(r.json()["results"])
+    ```
diff --git a/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..a3825167a9de0d76eef57769ed8ee4606a8fa08a
--- /dev/null
+++ b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+    name="kannada_ocr_db_crnn_mobile",
+    version="1.0.0",
+    summary="ocr service",
+    author="PaddlePaddle",
+    type="cv/text_recognition")
+class KannadaOCRDBCRNNMobile:
+    def __init__(self,
+                 det=True,
+                 rec=True,
+                 use_angle_cls=False,
+                 enable_mkldnn=False,
+                 use_gpu=False,
+                 box_thresh=0.6,
+                 angle_classification_thresh=0.9):
+        """
+        initialize with the necessary elements
+        Args:
+            det(bool): Whether to use text detector.
+            rec(bool): Whether to use text recognizer.
+            use_angle_cls(bool): Whether to use text orientation classifier.
+            enable_mkldnn(bool): Whether to enable mkldnn.
+            use_gpu (bool): Whether to use gpu.
+            box_thresh(float): the threshold of the detected text box's confidence
+            angle_classification_thresh(float): the threshold of the angle classification confidence
+        """
+        self.logger = get_logger()
+        self.model = hub.Module(
+            name="multi_languages_ocr_db_crnn",
+            lang="ka",
+            det=det,
+            rec=rec,
+            use_angle_cls=use_angle_cls,
+            enable_mkldnn=enable_mkldnn,
+            use_gpu=use_gpu,
+            box_thresh=box_thresh,
+            angle_classification_thresh=angle_classification_thresh)
+        self.model.name = self.name
+
+    def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+        """
+        Get the text in the predicted images.
+        Args:
+            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
+            paths (list[str]): The paths of images. If paths not images
+            output_dir (str): The directory to store output images.
+            visualization (bool): Whether to save image or not.
+        Returns:
+            res (list): The result of text detection box and save path of images.
+        """
+        all_results = self.model.recognize_text(
+            images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+        return all_results
+
+    @serving
+    def serving_method(self, images, **kwargs):
+        """
+        Run as a service.
+        """
+        images_decode = [base64_to_cv2(image) for image in images]
+        results = self.recognize_text(images_decode, **kwargs)
+        return results
+
+    @runnable
+    def run_cmd(self, argvs):
+        """
+        Run as a command
+        """
+        results = self.model.run_cmd(argvs)
+        return results
+
+    def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+        '''
+        Export the model to ONNX format.
+
+        Args:
+            dirname(str): The directory to save the onnx model.
+            input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
+            opset_version(int): operator set
+        '''
+        self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
diff --git a/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/kannada_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/korean_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..8839950cd5824ead1e8868f7fed762a846579074
--- /dev/null
+++ b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/README.md
@@ -0,0 +1,173 @@
+# korean_ocr_db_crnn_mobile
+
+|模型名称|korean_ocr_db_crnn_mobile|
+| :--- | :---: |
+|类别|图像-文字识别|
+|网络|Differentiable Binarization+CRNN|
+|数据集|icdar2015数据集|
+|是否支持Fine-tuning|否|
+|最新更新日期|2021-12-2|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 模型介绍
+
+  - korean_ocr_db_crnn_mobile Module用于识别图片当中的韩文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的韩文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别韩文的轻量级OCR模型,支持直接预测。
+
+  - 更多详情参考:
+    - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
+    - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## 二、安装
+
+- ### 1、环境依赖
+
+  - PaddlePaddle >= 2.0.2
+
+  - Python >= 3.6
+
+  - PaddleOCR >= 2.0.1    | [如何安装PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
+
+  - PaddleHub >= 2.0.0    | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
+
+  - Paddle2Onnx >= 0.9.0  | [如何安装paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
+
+  - shapely
+
+  - pyclipper
+
+  - ```shell
+    $ pip3.6 install "paddleocr==2.3.0.2"
+    $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
+    $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
+    ```
+  - **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。**
+
+- ### 2、安装
+
+  - ```shell
+    $ hub install korean_ocr_db_crnn_mobile
+    ```
+  - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## 三、模型API预测
+
+- ### 1、命令行预测
+
+  - ```shell
+    $ hub run korean_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+    $ hub run korean_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+    ```
+  - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、代码示例
+
+  - ```python
+    import paddlehub as hub
+    import cv2
+
+    ocr = hub.Module(name="korean_ocr_db_crnn_mobile", enable_mkldnn=True)  # mkldnn加速仅在CPU下有效
+    result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+    # or
+    # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+    ```
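+
+  - 如需识别旋转180度的文字,可在构造时开启方向分类器(以下写法仅为示意,参数含义见下文API小节):
+
+  - ```python
+    import paddlehub as hub
+
+    ocr = hub.Module(name="korean_ocr_db_crnn_mobile", use_angle_cls=True)
+    result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+    ```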
+
+- ### 3、API
+
+  - ```python
+    def __init__(self,
+                 det=True,
+                 rec=True,
+                 use_angle_cls=False,
+                 enable_mkldnn=False,
+                 use_gpu=False,
+                 box_thresh=0.6,
+                 angle_classification_thresh=0.9)
+    ```
+
+    - 构造KoreanOCRDBCRNNMobile对象
+
+    - **参数**
+      - det(bool): 是否开启文字检测。默认为True。
+      - rec(bool): 是否开启文字识别。默认为True。
+      - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+      - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+      - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+      - box\_thresh (float): 检测文本框置信度的阈值;
+      - angle_classification_thresh(float): 文本方向分类置信度的阈值
+
+
+  - ```python
+    def recognize_text(images=[],
+                       paths=[],
+                       output_dir='ocr_result',
+                       visualization=False)
+    ```
+
+    - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+    - **参数**
+
+      - paths (list\[str\]): 图片的路径;
+      - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+      - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+      - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+    - **返回**
+
+      - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+        - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+          - text(str): 识别得到的文本
+          - confidence(float): 识别文本结果置信度
+          - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+          - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+          - score(float): 分类的得分,仅在只有方向分类开启时输出
+        - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
+
+
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+  - 运行启动命令:
+  - ```shell
+    $ hub serving start -m korean_ocr_db_crnn_mobile
+    ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则无需设置。
+
+- ### 第二步:发送预测请求
+
+  - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+  - ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tostring()).decode('utf8')
+
+    # 发送HTTP请求
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/korean_ocr_db_crnn_mobile"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # 打印预测结果
+    print(r.json()["results"])
+    ```
diff --git a/modules/image/text_recognition/korean_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/korean_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..63de1d2ba8316457e55ebd996a2c9ad1edefb184
--- /dev/null
+++ b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
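+# NOTE: korean_ocr_db_crnn_mobile is a thin wrapper around the
+# multi_languages_ocr_db_crnn module (constructed with lang="korean" below).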
+from paddlehub.module.module import moduleinfo, runnable, serving + + +@moduleinfo( + name="korean_ocr_db_crnn_mobile", + version="1.0.0", + summary="ocr service", + author="PaddlePaddle", + type="cv/text_recognition") +class KoreanOCRDBCRNNMobile: + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9): + """ + initialize with the necessary elements + Args: + det(bool): Whether to use text detector. + rec(bool): Whether to use text recognizer. + use_angle_cls(bool): Whether to use text orientation classifier. + enable_mkldnn(bool): Whether to enable mkldnn. + use_gpu (bool): Whether to use gpu. + box_thresh(float): the threshold of the detected text box's confidence + angle_classification_thresh(float): the threshold of the angle classification confidence + """ + self.logger = get_logger() + self.model = hub.Module( + name="multi_languages_ocr_db_crnn", + lang="korean", + det=det, + rec=rec, + use_angle_cls=use_angle_cls, + enable_mkldnn=enable_mkldnn, + use_gpu=use_gpu, + box_thresh=box_thresh, + angle_classification_thresh=angle_classification_thresh) + self.model.name = self.name + + def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False): + """ + Get the text in the predicted images. + Args: + images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths + paths (list[str]): The paths of images. If paths not images + output_dir (str): The directory to store output images. + visualization (bool): Whether to save image or not. + Returns: + res (list): The result of text detection box and save path of images. + """ + all_results = self.model.recognize_text( + images=images, paths=paths, output_dir=output_dir, visualization=visualization) + return all_results + + @serving + def serving_method(self, images, **kwargs): + """ + Run as a service. + """ + images_decode = [base64_to_cv2(image) for image in images] + results = self.recognize_text(images_decode, **kwargs) + return results + + @runnable + def run_cmd(self, argvs): + """ + Run as a command + """ + results = self.model.run_cmd(argvs) + return results + + def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10): + ''' + Export the model to ONNX format. + + Args: + dirname(str): The directory to save the onnx model. + input_shape_dict: dictionary ``{ input_name: input_value }, eg. 
{'x': [-1, 3, -1, -1]}`` + opset_version(int): operator set + ''' + self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version) diff --git a/modules/image/text_recognition/korean_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3 --- /dev/null +++ b/modules/image/text_recognition/korean_ocr_db_crnn_mobile/requirements.txt @@ -0,0 +1,4 @@ +paddleocr>=2.3.0.2 +paddle2onnx>=0.9.0 +shapely +pyclipper diff --git a/modules/image/text_recognition/latin_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/README.md new file mode 100644 index 0000000000000000000000000000000000000000..326f88af0c6d62b0fdab8eb9c932be6d94f17204 --- /dev/null +++ b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/README.md @@ -0,0 +1,174 @@ +# latin_ocr_db_crnn_mobile + + +|模型名称|latin_ocr_db_crnn_mobile| +| :--- | :---: | +|类别|图像-文字识别| +|网络|Differentiable Binarization+CRNN| +|数据集|icdar2015数据集| +|是否支持Fine-tuning|否| +|最新更新日期|2021-12-2| +|数据指标|-| + + +## 一、模型基本信息 + +- ### 模型介绍 + + - latin_ocr_db_crnn_mobile Module用于识别图片当中的拉丁文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的拉丁文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别拉丁文的轻量级OCR模型,支持直接预测。 + + - 更多详情参考: + - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf) + - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf) + + + +## 二、安装 + +- ### 1、环境依赖 + + - PaddlePaddle >= 2.0.2 + + - Python >= 3.6 + + - PaddleOCR >= 2.0.1 | [如何安装PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1) + + - PaddleHub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst) + + - Paddle2Onnx >= 0.9.0 | [如何安装paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md) + + - shapely + + - pyclipper + + - ```shell + $ pip3.6 install "paddleocr==2.3.0.2" + $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple + $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple + ``` + - **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。** + +- ### 2、安装 + + - ```shell + $ hub install latin_ocr_db_crnn_mobile + ``` + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + + + +## 三、模型API预测 + +- ### 1、命令行预测 + + - ```shell + $ hub run latin_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" + $ hub run latin_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True + ``` + - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2、代码示例 + + - ```python + import paddlehub as hub + import cv2 + + ocr = hub.Module(name="latin_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效 + result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')]) + + # or + # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 
3、API
+
+  - ```python
+    def __init__(self,
+                 det=True,
+                 rec=True,
+                 use_angle_cls=False,
+                 enable_mkldnn=False,
+                 use_gpu=False,
+                 box_thresh=0.6,
+                 angle_classification_thresh=0.9)
+    ```
+
+    - 构造LatinOCRDBCRNNMobile对象
+
+    - **参数**
+      - det(bool): 是否开启文字检测。默认为True。
+      - rec(bool): 是否开启文字识别。默认为True。
+      - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
+      - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
+      - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
+      - box\_thresh (float): 检测文本框置信度的阈值;
+      - angle_classification_thresh(float): 文本方向分类置信度的阈值
+
+
+  - ```python
+    def recognize_text(images=[],
+                       paths=[],
+                       output_dir='ocr_result',
+                       visualization=False)
+    ```
+
+    - 预测API,检测输入图片中的所有文本的位置和识别文本结果。
+
+    - **参数**
+
+      - paths (list\[str\]): 图片的路径;
+      - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
+      - output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
+      - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
+
+    - **返回**
+
+      - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
+        - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
+          - text(str): 识别得到的文本
+          - confidence(float): 识别文本结果置信度
+          - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
+          - orientation(str): 分类的方向,仅在只有方向分类开启时输出
+          - score(float): 分类的得分,仅在只有方向分类开启时输出
+        - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
+
+
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+  - 运行启动命令:
+  - ```shell
+    $ hub serving start -m latin_ocr_db_crnn_mobile
+    ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则无需设置。
+
+- ### 第二步:发送预测请求
+
+  - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+  - ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tostring()).decode('utf8')
+
+    # 发送HTTP请求
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/latin_ocr_db_crnn_mobile"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # 打印预测结果
+    print(r.json()["results"])
+    ```
diff --git a/modules/image/text_recognition/latin_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/latin_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..40ca5bee4acfd5059cee6c8163e90aee6cbc19ee
--- /dev/null
+++ b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+    name="latin_ocr_db_crnn_mobile",
+    version="1.0.0",
+    summary="ocr service",
+    author="PaddlePaddle",
+    type="cv/text_recognition")
+class LatinOCRDBCRNNMobile:
+    def __init__(self,
+                 det=True,
+                 rec=True,
+                 use_angle_cls=False,
+                 enable_mkldnn=False,
+                 use_gpu=False,
+                 box_thresh=0.6,
+                 angle_classification_thresh=0.9):
+        """
+        initialize with the necessary elements
+        Args:
+            det(bool): Whether to use text detector.
+            rec(bool): Whether to use text recognizer.
+            use_angle_cls(bool): Whether to use text orientation classifier.
+            enable_mkldnn(bool): Whether to enable mkldnn.
+            use_gpu (bool): Whether to use gpu.
+            box_thresh(float): the threshold of the detected text box's confidence
+            angle_classification_thresh(float): the threshold of the angle classification confidence
+        """
+        self.logger = get_logger()
+        self.model = hub.Module(
+            name="multi_languages_ocr_db_crnn",
+            lang="latin",
+            det=det,
+            rec=rec,
+            use_angle_cls=use_angle_cls,
+            enable_mkldnn=enable_mkldnn,
+            use_gpu=use_gpu,
+            box_thresh=box_thresh,
+            angle_classification_thresh=angle_classification_thresh)
+        self.model.name = self.name
+
+    def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+        """
+        Get the text in the predicted images.
+        Args:
+            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths
+            paths (list[str]): The paths of images. If paths not images
+            output_dir (str): The directory to store output images.
+            visualization (bool): Whether to save image or not.
+        Returns:
+            res (list): The result of text detection box and save path of images.
+        """
+        all_results = self.model.recognize_text(
+            images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+        return all_results
+
+    @serving
+    def serving_method(self, images, **kwargs):
+        """
+        Run as a service.
+        """
+        images_decode = [base64_to_cv2(image) for image in images]
+        results = self.recognize_text(images_decode, **kwargs)
+        return results
+
+    @runnable
+    def run_cmd(self, argvs):
+        """
+        Run as a command
+        """
+        results = self.model.run_cmd(argvs)
+        return results
+
+    def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+        '''
+        Export the model to ONNX format.
+
+        Args:
+            dirname(str): The directory to save the onnx model.
+            input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
+            opset_version(int): operator set
+        '''
+        self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
diff --git a/modules/image/text_recognition/latin_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/latin_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/README.md b/modules/image/text_recognition/multi_languages_ocr_db_crnn/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..72addf8120499a217fc5e69093698af9d3e958d1
--- /dev/null
+++ b/modules/image/text_recognition/multi_languages_ocr_db_crnn/README.md
@@ -0,0 +1,233 @@
+# multi_languages_ocr_db_crnn
+
+|模型名称|multi_languages_ocr_db_crnn|
+| :--- | :---: |
+|类别|图像-文字识别|
+|网络|Differentiable Binarization+CRNN|
+|数据集|icdar2015数据集|
+|是否支持Fine-tuning|否|
+|最新更新日期|2021-11-24|
+|数据指标|-|
+
+
+## 一、模型基本信息
+
+- ### 应用效果展示
+  - 样例结果示例:
+    (样例结果展示图,此处图片略)
+ +- ### 模型介绍 + + - multi_languages_ocr_db_crnn Module用于识别图片当中的文字。其基于PaddleOCR模块,检测得到文本框,识别文本框中的文字,再对检测文本框进行角度分类。最终检测算法采用DB(Differentiable Binarization),而识别文字算法则采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。 + 该Module不仅提供了通用场景下的中英文模型,也提供了[80个语言](#语种缩写)的小语种模型。 + + +
+    (多语言识别效果展示图,此处图片略)
+ + - 更多详情参考: + - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf) + - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf) + + + +## 二、安装 + +- ### 1、环境依赖 + + - PaddlePaddle >= 2.0.2 + + - Python >= 3.6 + + - PaddleOCR >= 2.0.1 | [如何安装PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1) + + - PaddleHub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst) + + - Paddle2Onnx >= 0.9.0 | [如何安装paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md) + + - shapely + + - pyclipper + + - ```shell + $ pip3.6 install "paddleocr==2.3.0.2" + $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple + $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple + ``` + - **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。** + +- ### 2、安装 + + - ```shell + $ hub install multi_languages_ocr_db_crnn + ``` + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + + + +## 三、模型API预测 + +- ### 1、命令行预测 + + - ```shell + $ hub run multi_languages_ocr_db_crnn --input_path "/PATH/TO/IMAGE" + $ hub run multi_languages_ocr_db_crnn --input_path "/PATH/TO/IMAGE" --lang "ch" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True + ``` + - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2、代码示例 + + - ```python + import paddlehub as hub + import cv2 + + ocr = hub.Module(name="multi_languages_ocr_db_crnn", lang='en', enable_mkldnn=True) # mkldnn加速仅在CPU下有效 + result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')]) + + # or + # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE']) + ``` + - multi_languages_ocr_db_crnn目前支持80个语种,可以通过修改lang参数进行切换,对于英文模型,指定lang=en,具体支持的[语种](#语种缩写)可查看表格。 + +- ### 3、API + + - ```python + def __init__(self, + lang="ch", + det=True, rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9) + ``` + + - 构造MultiLangOCR对象 + + - **参数** + - lang(str): 多语言模型选择。默认为中文模型,即lang="ch"。 + - det(bool): 是否开启文字检测。默认为True。 + - rec(bool): 是否开启文字识别。默认为True。 + - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。 + - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。 + - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量** + - box\_thresh (float): 检测文本框置信度的阈值; + - angle_classification_thresh(float): 文本方向分类置信度的阈值 + + + - ```python + def recognize_text(images=[], + paths=[], + output_dir='ocr_result', + visualization=False) + ``` + + - 预测API,检测输入图片中的所有文本的位置和识别文本结果。 + + - **参数** + + - paths (list\[str\]): 图片的路径; + - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式; + - output\_dir (str): 图片的保存路径,默认设为 ocr\_result; + - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False; + + - **返回** + + - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为: + - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为: + - text(str): 识别得到的文本 + - confidence(float): 识别文本结果置信度 + - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\] + - orientation(str): 分类的方向,仅在只有方向分类开启时输出 + - score(float): 
分类的得分,仅在只有方向分类开启时输出
+        - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
+
+
+## 四、服务部署
+
+- PaddleHub Serving 可以部署一个文字识别的在线服务。
+
+- ### 第一步:启动PaddleHub Serving
+
+  - 运行启动命令:
+  - ```shell
+    $ hub serving start -m multi_languages_ocr_db_crnn
+    ```
+
+  - 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
+
+  - **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;否则无需设置。
+
+- ### 第二步:发送预测请求
+
+  - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
+
+  - ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tostring()).decode('utf8')
+
+    # 发送HTTP请求
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/multi_languages_ocr_db_crnn"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # 打印预测结果
+    print(r.json()["results"])
+    ```
+
+
+
+
+## 五、支持语种及缩写
+
+| 语种 | 描述 | 缩写 | | 语种 | 描述 | 缩写 |
+| --- | --- | --- | ---|--- | --- | --- |
+|中文|chinese and english|ch| |保加利亚文|Bulgarian |bg|
+|英文|english|en| |乌克兰文|Ukrainian|uk|
+|法文|french|fr| |白俄罗斯文|Belarusian|be|
+|德文|german|german| |泰卢固文|Telugu |te|
+|日文|japan|japan| | 阿巴扎文 | Abaza | abq |
+|韩文|korean|korean| |泰米尔文|Tamil |ta|
+|中文繁体|chinese traditional |chinese_cht| |南非荷兰文 |Afrikaans |af|
+|意大利文| Italian |it| |阿塞拜疆文 |Azerbaijani |az|
+|西班牙文|Spanish |es| |波斯尼亚文|Bosnian|bs|
+|葡萄牙文| Portuguese|pt| |捷克文|Czech|cs|
+|俄罗斯文|Russian|ru| |威尔士文 |Welsh |cy|
+|阿拉伯文|Arabic|ar| |丹麦文 |Danish|da|
+|印地文|Hindi|hi| |爱沙尼亚文 |Estonian |et|
+|维吾尔文|Uyghur|ug| |爱尔兰文 |Irish |ga|
+|波斯文|Persian|fa| |克罗地亚文|Croatian |hr|
+|乌尔都文|Urdu|ur| |匈牙利文|Hungarian |hu|
+|塞尔维亚文(latin)| Serbian(latin) |rs_latin| |印尼文|Indonesian|id|
+|欧西坦文|Occitan |oc| |冰岛文 |Icelandic|is|
+|马拉地文|Marathi|mr| |库尔德文 |Kurdish|ku|
+|尼泊尔文|Nepali|ne| |立陶宛文|Lithuanian |lt|
+|塞尔维亚文(cyrillic)|Serbian(cyrillic)|rs_cyrillic| |拉脱维亚文 |Latvian |lv|
+|毛利文|Maori|mi| | 达尔瓦文|Dargwa |dar|
+|马来文 |Malay|ms| | 因古什文|Ingush |inh|
+|马耳他文 |Maltese |mt| | 拉克文|Lak |lbe|
+|荷兰文 |Dutch |nl| | 莱兹甘文|Lezghian |lez|
+|挪威文 |Norwegian |no| |塔巴萨兰文 |Tabassaran |tab|
+|波兰文|Polish |pl| | 比哈尔文|Bihari |bh|
+| 罗马尼亚文|Romanian |ro| | 迈蒂利文|Maithili |mai|
+| 斯洛伐克文|Slovak |sk| | 昂加文|Angika |ang|
+| 斯洛文尼亚文|Slovenian |sl| | 博杰普尔文|Bhojpuri |bho|
+| 阿尔巴尼亚文|Albanian |sq| | 摩揭陀文 |Magahi |mah|
+| 瑞典文|Swedish |sv| | 那格浦尔文|Nagpur |sck|
+| 斯瓦希里文|Swahili |sw| | 尼瓦尔文|Newari |new|
+| 塔加洛文|Tagalog |tl| | 果阿孔卡尼文 |Goan Konkani|gom|
+| 土耳其文|Turkish |tr| | 沙特阿拉伯文|Saudi Arabia|sa|
+| 乌兹别克文|Uzbek |uz| | 阿瓦尔文|Avar |ava|
+| 越南文|Vietnamese |vi| | | | | |
+| 蒙古文|Mongolian |mn| | 阿迪赫文|Adyghe |ady|
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/__init__.py b/modules/image/text_recognition/multi_languages_ocr_db_crnn/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/arabic.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/arabic.ttf
new file mode 100644
index 0000000000000000000000000000000000000000..064b6041ee32814d852e084f639dae75d044d357
Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/arabic.ttf differ
diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/cyrillic.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/cyrillic.ttf
new file mode 100644
index 
0000000000000000000000000000000000000000..be4bf6605808d15ab25c9cbbe1fda2a1d190ac8b Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/cyrillic.ttf differ diff --git a/modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/french.ttf similarity index 100% rename from modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german.ttf rename to modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/french.ttf diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/german.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/german.ttf new file mode 100644 index 0000000000000000000000000000000000000000..ab68fb197d4479b3b6dec6e85bd5cbaf433a87c5 Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/german.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/hindi.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/hindi.ttf new file mode 100644 index 0000000000000000000000000000000000000000..8b0c36f5868b935464f30883094b9556c3e41009 Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/hindi.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/kannada.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/kannada.ttf new file mode 100644 index 0000000000000000000000000000000000000000..43b60d423ad5ea5f5528c9c9e5d6f013f87fa1d7 Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/kannada.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/korean.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/korean.ttf new file mode 100644 index 0000000000000000000000000000000000000000..e638ce37f67ff1cd9babf73387786eaeb5c52968 Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/korean.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/latin.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/latin.ttf new file mode 100644 index 0000000000000000000000000000000000000000..e392413ac2f82905b3c07073669c3e2058d20235 Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/latin.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/marathi.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/marathi.ttf new file mode 100644 index 0000000000000000000000000000000000000000..a796d3edc6a4cc140a9360d0fc502a9d99352db0 Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/marathi.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/nepali.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/nepali.ttf new file mode 100644 index 0000000000000000000000000000000000000000..8b0c36f5868b935464f30883094b9556c3e41009 Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/nepali.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/persian.ttf 
b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/persian.ttf new file mode 100644 index 0000000000000000000000000000000000000000..bdb1c8d7402148127b7633c6b4cd1586e23745ab Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/persian.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/simfang.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/simfang.ttf new file mode 100644 index 0000000000000000000000000000000000000000..2b59eae4195d1cdbea375503c0cc34d5631cb0f9 Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/simfang.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/spanish.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/spanish.ttf new file mode 100644 index 0000000000000000000000000000000000000000..532353d2778cd2bb37a5baf06f5daeea32729168 Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/spanish.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/tamil.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/tamil.ttf new file mode 100644 index 0000000000000000000000000000000000000000..2e9998e8d8218f1e868f06ba0db3e13b4620eed1 Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/tamil.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/telugu.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/telugu.ttf new file mode 100644 index 0000000000000000000000000000000000000000..12c91e41973a4704f52984e2089fdb2eaf1ed4a5 Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/telugu.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/urdu.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/urdu.ttf new file mode 100644 index 0000000000000000000000000000000000000000..625feee2e9616809c13e17eeb7da1aec58988b65 Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/urdu.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/uyghur.ttf b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/uyghur.ttf new file mode 100644 index 0000000000000000000000000000000000000000..625feee2e9616809c13e17eeb7da1aec58988b65 Binary files /dev/null and b/modules/image/text_recognition/multi_languages_ocr_db_crnn/assets/fonts/uyghur.ttf differ diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/module.py b/modules/image/text_recognition/multi_languages_ocr_db_crnn/module.py new file mode 100644 index 0000000000000000000000000000000000000000..0d956627d1bf90e46e6d86368c6bfb7886cddb2a --- /dev/null +++ b/modules/image/text_recognition/multi_languages_ocr_db_crnn/module.py @@ -0,0 +1,221 @@ +import argparse +import sys +import os +import ast + +import paddle +import paddle2onnx +import paddle2onnx as p2o +import paddle.fluid as fluid +from paddleocr import PaddleOCR +from paddleocr.ppocr.utils.logging import get_logger +from paddleocr.tools.infer.utility import base64_to_cv2 +from paddlehub.module.module import moduleinfo, runnable, serving + +from .utils import read_images, save_result_image, 
mkdir + + +@moduleinfo( + name="multi_languages_ocr_db_crnn", + version="1.0.0", + summary="ocr service", + author="PaddlePaddle", + type="cv/text_recognition") +class MultiLangOCR: + def __init__(self, + lang="ch", + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9): + """ + initialize with the necessary elements + Args: + lang(str): the selection of languages + det(bool): Whether to use text detector. + rec(bool): Whether to use text recognizer. + use_angle_cls(bool): Whether to use text orientation classifier. + enable_mkldnn(bool): Whether to enable mkldnn. + use_gpu (bool): Whether to use gpu. + box_thresh(float): the threshold of the detected text box's confidence + angle_classification_thresh(float): the threshold of the angle classification confidence + """ + self.lang = lang + self.logger = get_logger() + argc = len(sys.argv) + if argc == 1 or argc > 1 and sys.argv[1] == 'serving': + self.det = det + self.rec = rec + self.use_angle_cls = use_angle_cls + self.engine = PaddleOCR( + lang=lang, + det=det, + rec=rec, + use_angle_cls=use_angle_cls, + enable_mkldnn=enable_mkldnn, + use_gpu=use_gpu, + det_db_box_thresh=box_thresh, + cls_thresh=angle_classification_thresh) + self.det_model_dir = self.engine.text_detector.args.det_model_dir + self.rec_model_dir = self.engine.text_detector.args.rec_model_dir + self.cls_model_dir = self.engine.text_detector.args.cls_model_dir + + def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False): + """ + Get the text in the predicted images. + Args: + images (list(numpy.ndarray)): images data, shape of each is [H, W, C]. If images not paths + paths (list[str]): The paths of images. If paths not images + output_dir (str): The directory to store output images. + visualization (bool): Whether to save image or not. + Returns: + res (list): The result of text detection box and save path of images. + """ + + if images != [] and isinstance(images, list) and paths == []: + predicted_data = images + elif images == [] and isinstance(paths, list) and paths != []: + predicted_data = read_images(paths) + else: + raise TypeError("The input data is inconsistent with expectations.") + + assert predicted_data != [], "There is not any image to be predicted. Please check the input data." 
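+        # engine.ocr() returns differently-shaped results depending on which
+        # stages are enabled (det / rec / use_angle_cls); the loop below
+        # normalizes each variant into the dict structure documented in the README.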
+
+    @serving
+    def serving_method(self, images, **kwargs):
+        """
+        Run as a service.
+        """
+        images_decode = [base64_to_cv2(image) for image in images]
+        results = self.recognize_text(images_decode, **kwargs)
+        return results
+
+    @runnable
+    def run_cmd(self, argvs):
+        """
+        Run as a command.
+        """
+        parser = self.arg_parser()
+        args = parser.parse_args(argvs)
+        if args.lang is not None:
+            self.lang = args.lang
+        self.det = args.det
+        self.rec = args.rec
+        self.use_angle_cls = args.use_angle_cls
+        self.engine = PaddleOCR(
+            lang=self.lang,
+            det=args.det,
+            rec=args.rec,
+            use_angle_cls=args.use_angle_cls,
+            enable_mkldnn=args.enable_mkldnn,
+            use_gpu=args.use_gpu,
+            det_db_box_thresh=args.box_thresh,
+            cls_thresh=args.angle_classification_thresh)
+        results = self.recognize_text(
+            paths=[args.input_path], output_dir=args.output_dir, visualization=args.visualization)
+        return results
+
+    def arg_parser(self):
+        parser = argparse.ArgumentParser(
+            description="Run the %s module." % self.name,
+            prog='hub run %s' % self.name,
+            usage='%(prog)s',
+            add_help=True)
+
+        parser.add_argument('--input_path', type=str, default=None, help="path to the input image. Required.", required=True)
Required.", required=True) + parser.add_argument('--use_gpu', type=ast.literal_eval, default=False, help="whether use GPU or not") + parser.add_argument('--output_dir', type=str, default='ocr_result', help="The directory to save output images.") + parser.add_argument( + '--visualization', type=ast.literal_eval, default=False, help="whether to save output as images.") + parser.add_argument('--lang', type=str, default=None, help="the selection of languages") + parser.add_argument('--det', type=ast.literal_eval, default=True, help="whether use text detector or not") + parser.add_argument('--rec', type=ast.literal_eval, default=True, help="whether use text recognizer or not") + parser.add_argument( + '--use_angle_cls', type=ast.literal_eval, default=False, help="whether text orientation classifier or not") + parser.add_argument('--enable_mkldnn', type=ast.literal_eval, default=False, help="whether use mkldnn or not") + parser.add_argument( + "--box_thresh", type=float, default=0.6, help="set the threshold of the detected text box's confidence") + parser.add_argument( + "--angle_classification_thresh", + type=float, + default=0.9, + help="set the threshold of the angle classification confidence") + + return parser + + def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10): + ''' + Export the model to ONNX format. + + Args: + dirname(str): The directory to save the onnx model. + input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}`` + opset_version(int): operator set + ''' + v0, v1, v2 = paddle2onnx.__version__.split('.') + if int(v1) < 9: + raise ImportError("paddle2onnx>=0.9.0 is required") + if input_shape_dict is None: + input_shape_dict = {'x': [-1, 3, -1, -1]} + if input_shape_dict is not None and not isinstance(input_shape_dict, dict): + raise Exception("input_shape_dict should be dict, eg. 
{'x': [-1, 3, -1, -1]}.") + + if opset_version <= 9: + raise Exception("opset_version <= 9 is not surpported, please try with higher opset_version >=10.") + + path_dict = {"det": self.det_model_dir, "rec": self.rec_model_dir, "cls": self.cls_model_dir} + for (key, path) in path_dict.items(): + model_filename = 'inference.pdmodel' + params_filename = 'inference.pdiparams' + save_file = os.path.join(dirname, '{}_{}.onnx'.format(self.name, key)) + + # convert model save with 'paddle.fluid.io.save_inference_model' + if hasattr(paddle, 'enable_static'): + paddle.enable_static() + exe = fluid.Executor(fluid.CPUPlace()) + if model_filename is None and params_filename is None: + [program, feed_var_names, fetch_vars] = fluid.io.load_inference_model(path, exe) + else: + [program, feed_var_names, fetch_vars] = fluid.io.load_inference_model( + path, exe, model_filename=model_filename, params_filename=params_filename) + + onnx_proto = p2o.run_convert(program, input_shape_dict=input_shape_dict, opset_version=opset_version) + mkdir(save_file) + with open(save_file, "wb") as f: + f.write(onnx_proto.SerializeToString()) diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/requirements.txt b/modules/image/text_recognition/multi_languages_ocr_db_crnn/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3 --- /dev/null +++ b/modules/image/text_recognition/multi_languages_ocr_db_crnn/requirements.txt @@ -0,0 +1,4 @@ +paddleocr>=2.3.0.2 +paddle2onnx>=0.9.0 +shapely +pyclipper diff --git a/modules/image/text_recognition/multi_languages_ocr_db_crnn/utils.py b/modules/image/text_recognition/multi_languages_ocr_db_crnn/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..e64e791e5e4e62bc90f73ad0698403028bd9bf9b --- /dev/null +++ b/modules/image/text_recognition/multi_languages_ocr_db_crnn/utils.py @@ -0,0 +1,100 @@ +import os +import time + +import cv2 +import numpy as np +from PIL import Image, ImageDraw + +from paddleocr import draw_ocr + + +def save_result_image(original_image, + rec_results, + output_dir='ocr_result', + directory=None, + lang='ch', + det=True, + rec=True, + logger=None): + image = Image.fromarray(cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)) + if det and rec: + boxes = [line[0] for line in rec_results] + txts = [line[1][0] for line in rec_results] + scores = [line[1][1] for line in rec_results] + fonts_lang = 'fonts/simfang.ttf' + lang_fonts = { + 'korean': 'korean', + 'fr': 'french', + 'german': 'german', + 'hi': 'hindi', + 'ne': 'nepali', + 'fa': 'persian', + 'es': 'spanish', + 'ta': 'tamil', + 'te': 'telugu', + 'ur': 'urdu', + 'ug': 'uyghur', + } + if lang in lang_fonts.keys(): + fonts_lang = 'fonts/' + lang_fonts[lang] + '.ttf' + font_file = os.path.join(directory, 'assets', fonts_lang) + im_show = draw_ocr(image, boxes, txts, scores, font_path=font_file) + elif det and not rec: + boxes = rec_results + im_show = draw_boxes(image, boxes) + im_show = np.array(im_show) + else: + logger.warning("only cls or rec not supported visualization.") + return "" + + if not os.path.exists(output_dir): + os.makedirs(output_dir) + + ext = get_image_ext(original_image) + saved_name = 'ndarray_{}{}'.format(time.time(), ext) + save_file_path = os.path.join(output_dir, saved_name) + im_show = Image.fromarray(im_show) + im_show.save(save_file_path) + return save_file_path + + +def read_images(paths=[]): + images = [] + for img_path in paths: + assert os.path.isfile(img_path), "The {} isn't a 
valid file.".format(img_path) + img = cv2.imread(img_path) + if img is None: + continue + images.append(img) + return images + + +def draw_boxes(image, boxes, scores=None, drop_score=0.5): + img = image.copy() + draw = ImageDraw.Draw(img) + if scores is None: + scores = [1] * len(boxes) + for (box, score) in zip(boxes, scores): + if score < drop_score: + continue + draw.line([(box[0][0], box[0][1]), (box[1][0], box[1][1])], fill='red') + draw.line([(box[1][0], box[1][1]), (box[2][0], box[2][1])], fill='red') + draw.line([(box[2][0], box[2][1]), (box[3][0], box[3][1])], fill='red') + draw.line([(box[3][0], box[3][1]), (box[0][0], box[0][1])], fill='red') + draw.line([(box[0][0] - 1, box[0][1] + 1), (box[1][0] - 1, box[1][1] + 1)], fill='red') + draw.line([(box[1][0] - 1, box[1][1] + 1), (box[2][0] - 1, box[2][1] + 1)], fill='red') + draw.line([(box[2][0] - 1, box[2][1] + 1), (box[3][0] - 1, box[3][1] + 1)], fill='red') + draw.line([(box[3][0] - 1, box[3][1] + 1), (box[0][0] - 1, box[0][1] + 1)], fill='red') + return img + + +def get_image_ext(image): + if image.shape[2] == 4: + return ".png" + return ".jpg" + + +def mkdir(path): + sub_dir = os.path.dirname(path) + if not os.path.exists(sub_dir): + os.makedirs(sub_dir) diff --git a/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8f1adbde4e78484c5f738ac74458e8fc239308cf --- /dev/null +++ b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/README.md @@ -0,0 +1,173 @@ +# tamil_ocr_db_crnn_mobile + +|模型名称|tamil_ocr_db_crnn_mobile| +| :--- | :---: | +|类别|图像-文字识别| +|网络|Differentiable Binarization+CRNN| +|数据集|icdar2015数据集| +|是否支持Fine-tuning|否| +|最新更新日期|2021-12-2| +|数据指标|-| + + +## 一、模型基本信息 + +- ### 模型介绍 + + - tamil_ocr_db_crnn_mobile Module用于识别图片当中的泰米尔文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的泰米尔文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别泰米尔文的轻量级OCR模型,支持直接预测。 + + - 更多详情参考: + - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf) + - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf) + + + +## 二、安装 + +- ### 1、环境依赖 + + - PaddlePaddle >= 2.0.2 + + - Python >= 3.6 + + - PaddleOCR >= 2.0.1 | [如何安装PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1) + + - PaddleHub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst) + + - Paddle2Onnx >= 0.9.0 | [如何安装paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md) + + - shapely + + - pyclipper + + - ```shell + $ pip3.6 install "paddleocr==2.3.0.2" + $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple + $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple + ``` + - **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。** + +- ### 2、安装 + + - ```shell + $ hub install tamil_ocr_db_crnn_mobile + ``` + - 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md) + | [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md) + + + +## 三、模型API预测 + +- ### 1、命令行预测 + + - ```shell + $ hub run tamil_ocr_db_crnn_mobile --input_path 
"/PATH/TO/IMAGE" + $ hub run tamil_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True + ``` + - 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst) + +- ### 2、代码示例 + + - ```python + import paddlehub as hub + import cv2 + + ocr = hub.Module(name="tamil_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效 + result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')]) + + # or + # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE']) + ``` + +- ### 3、API + + - ```python + def __init__(self, + det=True, + rec=True, + use_angle_cls=False, + enable_mkldnn=False, + use_gpu=False, + box_thresh=0.6, + angle_classification_thresh=0.9) + ``` + + - 构造TamilOCRDBCRNNMobile对象 + + - **参数** + - det(bool): 是否开启文字检测。默认为True。 + - rec(bool): 是否开启文字识别。默认为True。 + - use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。 + - enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。 + - use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量** + - box\_thresh (float): 检测文本框置信度的阈值; + - angle_classification_thresh(float): 文本方向分类置信度的阈值 + + + - ```python + def recognize_text(images=[], + paths=[], + output_dir='ocr_result', + visualization=False) + ``` + + - 预测API,检测输入图片中的所有文本的位置和识别文本结果。 + + - **参数** + + - paths (list\[str\]): 图片的路径; + - images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式; + - output\_dir (str): 图片的保存路径,默认设为 ocr\_result; + - visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False; + + - **返回** + + - res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为: + - data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为: + - text(str): 识别得到的文本 + - confidence(float): 识别文本结果置信度 + - text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\] + - orientation(str): 分类的方向,仅在只有方向分类开启时输出 + - score(float): 分类的得分,仅在只有方向分类开启时输出 + - save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为'' + + +## 四、服务部署 + +- PaddleHub Serving 可以部署一个目标检测的在线服务。 + +- ### 第一步:启动PaddleHub Serving + + - 运行启动命令: + - ```shell + $ hub serving start -m tamil_ocr_db_crnn_mobile + ``` + + - 这样就完成了一个目标检测的服务化API的部署,默认端口号为8866。 + + - **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。 + +- ### 第二步:发送预测请求 + + - 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果 + + - ```python + import requests + import json + import cv2 + import base64 + + def cv2_to_base64(image): + data = cv2.imencode('.jpg', image)[1] + return base64.b64encode(data.tostring()).decode('utf8') + + # 发送HTTP请求 + data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} + headers = {"Content-type": "application/json"} + url = "http://127.0.0.1:8866/predict/tamil_ocr_db_crnn_mobile" + r = requests.post(url=url, headers=headers, data=json.dumps(data)) + + # 打印预测结果 + print(r.json()["results"]) + ``` diff --git a/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/module.py new file mode 100644 index 0000000000000000000000000000000000000000..22321babd3812e3f39f9670b6aa6ce2a180a5a3f --- /dev/null +++ b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/module.py @@ -0,0 +1,87 @@ +import paddlehub as hub +from paddleocr.ppocr.utils.logging 
diff --git a/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..22321babd3812e3f39f9670b6aa6ce2a180a5a3f
--- /dev/null
+++ b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+    name="tamil_ocr_db_crnn_mobile",
+    version="1.0.0",
+    summary="ocr service",
+    author="PaddlePaddle",
+    type="cv/text_recognition")
+class TamilOCRDBCRNNMobile:
+    def __init__(self,
+                 det=True,
+                 rec=True,
+                 use_angle_cls=False,
+                 enable_mkldnn=False,
+                 use_gpu=False,
+                 box_thresh=0.6,
+                 angle_classification_thresh=0.9):
+        """
+        initialize with the necessary elements
+        Args:
+            det(bool): Whether to use text detector.
+            rec(bool): Whether to use text recognizer.
+            use_angle_cls(bool): Whether to use text orientation classifier.
+            enable_mkldnn(bool): Whether to enable mkldnn.
+            use_gpu(bool): Whether to use gpu.
+            box_thresh(float): the confidence threshold of the detected text boxes.
+            angle_classification_thresh(float): the confidence threshold of the angle classification.
+        """
+        self.logger = get_logger()
+        self.model = hub.Module(
+            name="multi_languages_ocr_db_crnn",
+            lang="ta",
+            det=det,
+            rec=rec,
+            use_angle_cls=use_angle_cls,
+            enable_mkldnn=enable_mkldnn,
+            use_gpu=use_gpu,
+            box_thresh=box_thresh,
+            angle_classification_thresh=angle_classification_thresh)
+        self.model.name = self.name
+
+    def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+        """
+        Get the text in the predicted images.
+        Args:
+            images (list[numpy.ndarray]): images data, shape of each is [H, W, C]; used when paths is not set.
+            paths (list[str]): The paths of images; used when images is not set.
+            output_dir (str): The directory to store output images.
+            visualization (bool): Whether to save the result as an image or not.
+        Returns:
+            res (list): The recognition results and the save paths of the visualized images.
+        """
+        all_results = self.model.recognize_text(
+            images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+        return all_results
+
+    @serving
+    def serving_method(self, images, **kwargs):
+        """
+        Run as a service.
+        """
+        images_decode = [base64_to_cv2(image) for image in images]
+        results = self.recognize_text(images_decode, **kwargs)
+        return results
+
+    @runnable
+    def run_cmd(self, argvs):
+        """
+        Run as a command.
+        """
+        results = self.model.run_cmd(argvs)
+        return results
+
+    def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+        '''
+        Export the model to ONNX format.
+
+        Args:
+            dirname(str): The directory to save the onnx model.
+            input_shape_dict: dictionary ``{ input_name: input_shape }``, e.g. ``{'x': [-1, 3, -1, -1]}``
+            opset_version(int): the ONNX operator set version to export.
+        '''
+        self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
diff --git a/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/tamil_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
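`export_onnx_model` is implemented but not documented in the READMEs; a hedged sketch of the expected call follows, assuming the module is installed. The output naming (`<module_name>_det/rec/cls.onnx`) follows the export loop in multi_languages_ocr_db_crnn/module.py; the target directory is a placeholder.

```python
import paddlehub as hub

ocr = hub.Module(name="tamil_ocr_db_crnn_mobile")

# Exports the detector, recognizer and angle classifier as three ONNX files,
# e.g. tamil_ocr_db_crnn_mobile_det.onnx, under the given directory.
ocr.export_onnx_model(
    dirname='onnx_models',
    input_shape_dict={'x': [-1, 3, -1, -1]},  # dynamic batch, height and width
    opset_version=10)
```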
diff --git a/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/README.md b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..b35dd4159b19a31546bf11bd01356e96c9101838
--- /dev/null
+++ b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/README.md
@@ -0,0 +1,173 @@
+# telugu_ocr_db_crnn_mobile
+
+|Module Name|telugu_ocr_db_crnn_mobile|
+| :--- | :---: |
+|Category|Image - Text Recognition|
+|Network|Differentiable Binarization + CRNN|
+|Dataset|ICDAR2015|
+|Fine-tuning supported|No|
+|Latest update date|2021-12-2|
+|Data metrics|-|
+
+
+## I. Basic Information
+
+- ### Module Introduction
+
+  - The telugu_ocr_db_crnn_mobile module recognizes Telugu text in images. Based on the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the Telugu text inside each box. The recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for sequence-like objects in images. Trained with the CTC loss, it learns directly from word- or line-level annotations and needs no detailed character-level labels. This module is a lightweight OCR model for Telugu and supports prediction out of the box.
+
+  - For more details, see:
+    - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
+    - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
+
+
+
+## II. Installation
+
+- ### 1. Environment Dependencies
+
+  - PaddlePaddle >= 2.0.2
+
+  - Python >= 3.6
+
+  - PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
+
+  - PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+  - Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
+
+  - shapely
+
+  - pyclipper
+
+  - ```shell
+    $ pip3.6 install "paddleocr==2.3.0.2"
+    $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
+    $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
+    ```
+  - **This module depends on the third-party libraries shapely and pyclipper. Please install them before using this module.**
+
+- ### 2. Installation
+
+  - ```shell
+    $ hub install telugu_ocr_db_crnn_mobile
+    ```
+  - If you run into problems during installation, see: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+
+
+## III. Module API Prediction
+
+- ### 1. Command-Line Prediction
+
+  - ```shell
+    $ hub run telugu_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
+    $ hub run telugu_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
+    ```
+  - This invokes the text recognition module from the command line. For more options, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Code Example
+
+  - ```python
+    import paddlehub as hub
+    import cv2
+
+    ocr = hub.Module(name="telugu_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is only effective on CPU
+    result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
+
+    # or
+    # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
+    ```
+
+- ### 3. API
+
+  - ```python
+    def __init__(self,
+                 det=True,
+                 rec=True,
+                 use_angle_cls=False,
+                 enable_mkldnn=False,
+                 use_gpu=False,
+                 box_thresh=0.6,
+                 angle_classification_thresh=0.9)
+    ```
+
+    - Constructs a TeluguOCRDBCRNNMobile object.
+
+    - **Parameters**
+      - det(bool): whether to enable text detection. Default is True.
+      - rec(bool): whether to enable text recognition. Default is True.
+      - use_angle_cls(bool): whether to enable orientation classification, used to recognize text rotated by 180 degrees. Default is False.
+      - enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation; only effective when running on CPU. Default is False.
+      - use\_gpu (bool): whether to use the GPU; **if you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
+      - box\_thresh (float): confidence threshold for detected text boxes;
+      - angle_classification_thresh(float): confidence threshold for text orientation classification
+
+
+  - ```python
+    def recognize_text(images=[],
+                       paths=[],
+                       output_dir='ocr_result',
+                       visualization=False)
+    ```
+
+    - Prediction API that detects the positions of all text in the input images and recognizes the text.
+
+    - **Parameters**
+
+      - paths (list\[str\]): paths of the images;
+      - images (list\[numpy.ndarray\]): image data with ndarray.shape \[H, W, C\] in BGR format;
+      - output\_dir (str): directory for saving images, ocr\_result by default;
+      - visualization (bool): whether to save the recognition results as image files; only effective when detection is enabled; False by default;
+
+    - **Return**
+
+      - res (list\[dict\]): list of recognition results, where each element is a dict with the fields:
+        - data (list\[dict\]): text recognition results, where each element is a dict with the fields:
+          - text(str): the recognized text
+          - confidence(float): the confidence of the recognized text
+          - text_box_position(list): the pixel coordinates of the text box in the original image, a 4*2 matrix giving the bottom-left, bottom-right, top-right and top-left corners of the box in order; data is \[\] if nothing is recognized
+          - orientation(str): the classified orientation; only output when only orientation classification is enabled
+          - score(float): the classification score; only output when only orientation classification is enabled
+        - save_path (str, optional): the save path of the visualized result, '' if no image is saved
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online text recognition service.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+    - ```shell
+      $ hub serving start -m telugu_ocr_db_crnn_mobile
+      ```
+
+  - This deploys the text recognition service API, which listens on port 8866 by default.
+
+  - **NOTE:** To run predictions on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a Prediction Request
+
+  - With the server up, the following lines of code send a prediction request and fetch the result
+
+  - ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # send the HTTP request
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/telugu_ocr_db_crnn_mobile"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # print the prediction results
+    print(r.json()["results"])
+    ```
diff --git a/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/__init__.py b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/module.py b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..7cfd283a93c300daa080077cb8369323364ee20a
--- /dev/null
+++ b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/module.py
@@ -0,0 +1,87 @@
+import paddlehub as hub
+from paddleocr.ppocr.utils.logging import get_logger
+from paddleocr.tools.infer.utility import base64_to_cv2
+from paddlehub.module.module import moduleinfo, runnable, serving
+
+
+@moduleinfo(
+    name="telugu_ocr_db_crnn_mobile",
+    version="1.0.0",
+    summary="ocr service",
+    author="PaddlePaddle",
+    type="cv/text_recognition")
+class TeluguOCRDBCRNNMobile:
+    def __init__(self,
+                 det=True,
+                 rec=True,
+                 use_angle_cls=False,
+                 enable_mkldnn=False,
+                 use_gpu=False,
+                 box_thresh=0.6,
+                 angle_classification_thresh=0.9):
+        """
+        initialize with the necessary elements
+        Args:
+            det(bool): Whether to use text detector.
+            rec(bool): Whether to use text recognizer.
+            use_angle_cls(bool): Whether to use text orientation classifier.
+            enable_mkldnn(bool): Whether to enable mkldnn.
+            use_gpu(bool): Whether to use gpu.
+            box_thresh(float): the confidence threshold of the detected text boxes.
+            angle_classification_thresh(float): the confidence threshold of the angle classification.
+        """
+        self.logger = get_logger()
+        self.model = hub.Module(
+            name="multi_languages_ocr_db_crnn",
+            lang="te",
+            det=det,
+            rec=rec,
+            use_angle_cls=use_angle_cls,
+            enable_mkldnn=enable_mkldnn,
+            use_gpu=use_gpu,
+            box_thresh=box_thresh,
+            angle_classification_thresh=angle_classification_thresh)
+        self.model.name = self.name
+
+    def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
+        """
+        Get the text in the predicted images.
+        Args:
+            images (list[numpy.ndarray]): images data, shape of each is [H, W, C]; used when paths is not set.
+            paths (list[str]): The paths of images; used when images is not set.
+            output_dir (str): The directory to store output images.
+            visualization (bool): Whether to save the result as an image or not.
+        Returns:
+            res (list): The recognition results and the save paths of the visualized images.
+        """
+        all_results = self.model.recognize_text(
+            images=images, paths=paths, output_dir=output_dir, visualization=visualization)
+        return all_results
+
+    @serving
+    def serving_method(self, images, **kwargs):
+        """
+        Run as a service.
+        """
+        images_decode = [base64_to_cv2(image) for image in images]
+        results = self.recognize_text(images_decode, **kwargs)
+        return results
+
+    @runnable
+    def run_cmd(self, argvs):
+        """
+        Run as a command.
+        """
+        results = self.model.run_cmd(argvs)
+        return results
+
+    def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
+        '''
+        Export the model to ONNX format.
+
+        Args:
+            dirname(str): The directory to save the onnx model.
+            input_shape_dict: dictionary ``{ input_name: input_shape }``, e.g. ``{'x': [-1, 3, -1, -1]}``
+            opset_version(int): the ONNX operator set version to export.
+        '''
+        self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
diff --git a/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/requirements.txt b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..527c6de7f643cb427013aaff2409365538fed2d3
--- /dev/null
+++ b/modules/image/text_recognition/telugu_ocr_db_crnn_mobile/requirements.txt
@@ -0,0 +1,4 @@
+paddleocr>=2.3.0.2
+paddle2onnx>=0.9.0
+shapely
+pyclipper
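The return schema above notes that `orientation` and `score` only appear when orientation classification runs alone. A sketch of that mode follows, assuming the module is installed and with the image path as a placeholder; whether classification-only mode is useful depends on the deployment, so treat this as an illustration of the documented output shape.

```python
import cv2
import paddlehub as hub

# Detection and recognition off, orientation classification on:
# each result item then carries 'orientation' and 'score' instead of text fields.
ocr = hub.Module(
    name="telugu_ocr_db_crnn_mobile",
    det=False,
    rec=False,
    use_angle_cls=True)

results = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
for item in results[0]['data']:
    print(item['orientation'], item['score'])
```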