PaddlePaddle / PaddleHub, commit 6ca472e5

Add ocr modules and update the docs according to template (#1603)

Authored by chenjian on Sep 17, 2021; committed via GitHub on Sep 17, 2021. Parent: d32cc733
Changes: showing 15 changed files with 6957 additions and 57 deletions (+6957 −57)
| File | Changes |
| :--- | :---: |
| modules/image/text_recognition/german_ocr_db_crnn_mobile/README.md | +171 −0 |
| modules/image/text_recognition/german_ocr_db_crnn_mobile/__init__.py | +0 −0 |
| modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german.ttf | +0 −0 |
| modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german_dict.txt | +131 −0 |
| modules/image/text_recognition/german_ocr_db_crnn_mobile/character.py | +213 −0 |
| modules/image/text_recognition/german_ocr_db_crnn_mobile/module.py | +591 −0 |
| modules/image/text_recognition/german_ocr_db_crnn_mobile/utils.py | +190 −0 |
| modules/image/text_recognition/japan_ocr_db_crnn_mobile/README.md | +172 −0 |
| modules/image/text_recognition/japan_ocr_db_crnn_mobile/__init__.py | +0 −0 |
| modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan.ttc | +0 −0 |
| modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan_dict.txt | +4399 −0 |
| modules/image/text_recognition/japan_ocr_db_crnn_mobile/character.py | +213 −0 |
| modules/image/text_recognition/japan_ocr_db_crnn_mobile/module.py | +591 −0 |
| modules/image/text_recognition/japan_ocr_db_crnn_mobile/utils.py | +190 −0 |
| modules/thirdparty/image/text_recognition/Vehicle_License_Plate_Recognition/README.md | +96 −57 |
modules/image/text_recognition/german_ocr_db_crnn_mobile/README.md (new file, mode 100644)
# german_ocr_db_crnn_mobile

|Model Name|german_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|icdar2015|
|Fine-tuning Supported|No|
|Model Size|3.8MB|
|Latest Update|2021-02-26|
|Data Metrics|-|

## I. Basic Information

- ### Application Effect Display

  - Sample result:
    <p align="center">
    <img src="https://user-images.githubusercontent.com/22424850/133761772-8c47f25f-0d95-45b4-8075-867dbbd14c86.jpg" width="80%" hspace='10'/> <br />
    </p>

- ### Module Introduction

  - german_ocr_db_crnn_mobile recognizes German text in images. Starting from the text boxes detected by chinese_text_detection_db_mobile, it recognizes the German characters inside each box. The final recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed specifically for sequence-like objects in images. Used together with CTC loss, it learns text recognition directly from word- or line-level annotations, with no need for detailed character-level labels. This Module is a lightweight OCR model for German that supports direct prediction.

## II. Installation

- ### 1. Environment Dependencies

  - paddlepaddle >= 1.8.0

  - paddlehub >= 1.8.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)

  - shapely

  - pyclipper

  - ```shell
    $ pip install shapely pyclipper
    ```
  - **This Module depends on the third-party libraries shapely and pyclipper; please install both before using this Module.**

- ### 2. Installation

  - ```shell
    $ hub install german_ocr_db_crnn_mobile
    ```
  - If you run into trouble during installation, see: [Windows installation from scratch](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux installation from scratch](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS installation from scratch](../../../../docs/docs_ch/get_start/mac_quickstart.md)

## III. Module API Prediction

- ### 1. Command-Line Prediction

  - ```shell
    $ hub run german_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
    ```
  - This invokes the text recognition model from the command line; for more, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)

- ### 2. Prediction Code Example

  - ```python
    import paddlehub as hub
    import cv2

    ocr = hub.Module(name="german_ocr_db_crnn_mobile", enable_mkldnn=True)  # MKL-DNN acceleration only takes effect on CPU
    result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])

    # or
    # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
    ```

- ### 3. API

  - ```python
    def __init__(text_detector_module=None, enable_mkldnn=False)
    ```

    - Constructs a GermanOCRDBCRNNMobile object.

    - **Parameters**

      - text_detector_module(str): name of the PaddleHub text-detection Module; if set to None, the [chinese_text_detection_db_mobile Module](../chinese_text_detection_db_mobile/) is used by default. Its role is to detect the text regions in the image.<br/>
      - enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Defaults to False.

  - ```python
    def recognize_text(images=[],
                       paths=[],
                       use_gpu=False,
                       output_dir='ocr_result',
                       visualization=False,
                       box_thresh=0.5,
                       text_thresh=0.5,
                       angle_classification_thresh=0.9)
    ```

    - Prediction API that locates and recognizes all German text in the input images.

    - **Parameters**

      - paths (list\[str\]): image paths; <br/>
      - images (list\[numpy.ndarray\]): image data with ndarray.shape \[H, W, C\], in BGR format; <br/>
      - use\_gpu (bool): whether to use GPU; **if you use GPU, set the CUDA_VISIBLE_DEVICES environment variable first** <br/>
      - box\_thresh (float): confidence threshold for detected text boxes; <br/>
      - text\_thresh (float): confidence threshold for recognized German text; <br/>
      - angle_classification_thresh(float): confidence threshold for text angle classification <br/>
      - visualization (bool): whether to save the recognition result as image files; <br/>
      - output\_dir (str): directory for saved images, defaults to ocr\_result;

    - **Return**

      - res (list\[dict\]): list of recognition results, one dict per input image, with fields:
        - data (list\[dict\]): recognized texts, one dict per text box, with fields:
          - text(str): the recognized text
          - confidence(float): confidence of the recognized text
          - text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix listing the bottom-left, bottom-right, top-right and top-left vertices in order.
          If nothing is recognized, data is \[\].
        - save_path (str, optional): path of the saved result image; '' if no image is saved.
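Below is a minimal sketch of consuming this return value; the image path is a placeholder and the printout is only for illustration:

```python
import paddlehub as hub
import cv2

ocr = hub.Module(name="german_ocr_db_crnn_mobile")
res = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')], visualization=True)

for image_result in res:                 # one dict per input image
    print("visualization saved to:", image_result['save_path'])
    for item in image_result['data']:    # one dict per recognized text box
        print(item['text'], item['confidence'], item['text_box_position'])
```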
## IV. Server Deployment

- PaddleHub Serving can deploy an online text recognition service.

- ### Step 1: Start the PaddleHub Serving

  - Run the start command:

  - ```shell
    $ hub serving start -m german_ocr_db_crnn_mobile
    ```

  - This deploys the text recognition service API, listening on port 8866 by default.

  - **NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise no setting is needed.

- ### Step 2: Send a Prediction Request

  - With the server configured, the few lines of code below send a prediction request and fetch the result

  - ```python
    import requests
    import json
    import cv2
    import base64

    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tostring()).decode('utf8')

    # send the HTTP request
    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/german_ocr_db_crnn_mobile"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # print the prediction result
    print(r.json()["results"])
    ```

## V. Release Note

* 1.0.0

  First release

  - ```shell
    $ hub install german_ocr_db_crnn_mobile==1.0.0
    ```
modules/image/text_recognition/german_ocr_db_crnn_mobile/__init__.py (new empty file, mode 100644)
modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german.ttf (new binary file added, mode 100644)
modules/image/text_recognition/german_ocr_db_crnn_mobile/assets/german_dict.txt (new file, mode 100644); its contents, one character per line:
!
"
$
%
&
'
(
)
+
,
-
.
/
0
1
2
3
4
5
6
7
8
9
:
;
>
?
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
[
]
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
£
§
²
´
µ
·
º
¼
½
¿
À
Á
Ä
Å
Ç
É
Í
Ï
Ô
Ö
Ø
Ù
Ü
ß
à
á
â
ã
ä
å
æ
ç
è
é
ê
ë
í
ï
ñ
ò
ó
ô
ö
ø
ù
ú
û
ü
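As a sketch of how a dictionary file like this is consumed: the line index of each character becomes the integer label id used by the recognizer (CharacterOps in character.py below performs the same loading from config['character_dict_path']); the relative path here is illustrative:

```python
# Illustrative: build the char -> index mapping from the dictionary file.
with open("assets/german_dict.txt", "rb") as fin:
    chars = [line.decode("utf-8").strip("\n").strip("\r\n") for line in fin]
char_to_index = {c: i for i, c in enumerate(chars)}
print(len(chars), char_to_index["ß"])  # this dict has 131 entries
```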
modules/image/text_recognition/german_ocr_db_crnn_mobile/character.py (new file, mode 100644)
```python
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import string


class CharacterOps(object):
    """ Convert between text-label and text-index """

    def __init__(self, config):
        self.character_type = config['character_type']
        self.loss_type = config['loss_type']
        self.max_text_len = config['max_text_length']
        if self.character_type == "en":
            self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
            dict_character = list(self.character_str)
        elif self.character_type in ["ch", 'japan', 'korean', 'french', 'german']:
            character_dict_path = config['character_dict_path']
            add_space = False
            if 'use_space_char' in config:
                add_space = config['use_space_char']
            self.character_str = ""
            with open(character_dict_path, "rb") as fin:
                lines = fin.readlines()
                for line in lines:
                    line = line.decode('utf-8').strip("\n").strip("\r\n")
                    self.character_str += line
            if add_space:
                self.character_str += " "
            dict_character = list(self.character_str)
        elif self.character_type == "en_sensitive":
            # same with ASTER setting (use 94 char).
            self.character_str = string.printable[:-6]
            dict_character = list(self.character_str)
        else:
            self.character_str = None
        assert self.character_str is not None, \
            "Nonsupport type of the character: {}".format(self.character_str)
        self.beg_str = "sos"
        self.end_str = "eos"
        if self.loss_type == "attention":
            dict_character = [self.beg_str, self.end_str] + dict_character
        elif self.loss_type == "srn":
            dict_character = dict_character + [self.beg_str, self.end_str]
        self.dict = {}
        for i, char in enumerate(dict_character):
            self.dict[char] = i
        self.character = dict_character

    def encode(self, text):
        """convert text-label into text-index.
        input:
            text: text labels of each image. [batch_size]
        output:
            text: concatenated text index for CTCLoss.
                [sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
            length: length of each text. [batch_size]
        """
        if self.character_type == "en":
            text = text.lower()

        text_list = []
        for char in text:
            if char not in self.dict:
                continue
            text_list.append(self.dict[char])
        text = np.array(text_list)
        return text

    def decode(self, text_index, is_remove_duplicate=False):
        """ convert text-index into text-label. """
        char_list = []
        char_num = self.get_char_num()

        if self.loss_type == "attention":
            beg_idx = self.get_beg_end_flag_idx("beg")
            end_idx = self.get_beg_end_flag_idx("end")
            ignored_tokens = [beg_idx, end_idx]
        else:
            ignored_tokens = [char_num]

        for idx in range(len(text_index)):
            if text_index[idx] in ignored_tokens:
                continue
            if is_remove_duplicate:
                if idx > 0 and text_index[idx - 1] == text_index[idx]:
                    continue
            char_list.append(self.character[int(text_index[idx])])
        text = ''.join(char_list)
        return text

    def get_char_num(self):
        return len(self.character)

    def get_beg_end_flag_idx(self, beg_or_end):
        if self.loss_type == "attention":
            if beg_or_end == "beg":
                idx = np.array(self.dict[self.beg_str])
            elif beg_or_end == "end":
                idx = np.array(self.dict[self.end_str])
            else:
                assert False, "Unsupport type %s in get_beg_end_flag_idx" \
                    % beg_or_end
            return idx
        else:
            err = "error in get_beg_end_flag_idx when using the loss %s" \
                % (self.loss_type)
            assert False, err


def cal_predicts_accuracy(char_ops,
                          preds,
                          preds_lod,
                          labels,
                          labels_lod,
                          is_remove_duplicate=False):
    acc_num = 0
    img_num = 0
    for ino in range(len(labels_lod) - 1):
        beg_no = preds_lod[ino]
        end_no = preds_lod[ino + 1]
        preds_text = preds[beg_no:end_no].reshape(-1)
        preds_text = char_ops.decode(preds_text, is_remove_duplicate)

        beg_no = labels_lod[ino]
        end_no = labels_lod[ino + 1]
        labels_text = labels[beg_no:end_no].reshape(-1)
        labels_text = char_ops.decode(labels_text, is_remove_duplicate)
        img_num += 1

        if preds_text == labels_text:
            acc_num += 1
    acc = acc_num * 1.0 / img_num
    return acc, acc_num, img_num


def cal_predicts_accuracy_srn(char_ops,
                              preds,
                              labels,
                              max_text_len,
                              is_debug=False):
    acc_num = 0
    img_num = 0

    char_num = char_ops.get_char_num()

    total_len = preds.shape[0]
    img_num = int(total_len / max_text_len)
    for i in range(img_num):
        cur_label = []
        cur_pred = []
        for j in range(max_text_len):
            if labels[j + i * max_text_len] != int(char_num - 1):  #0
                cur_label.append(labels[j + i * max_text_len][0])
            else:
                break

        for j in range(max_text_len + 1):
            if j < len(cur_label) and preds[j + i * max_text_len][0] != cur_label[j]:
                break
            elif j == len(cur_label) and j == max_text_len:
                acc_num += 1
                break
            elif j == len(cur_label) and preds[j + i * max_text_len][0] == int(char_num - 1):
                acc_num += 1
                break
    acc = acc_num * 1.0 / img_num
    return acc, acc_num, img_num


def convert_rec_attention_infer_res(preds):
    img_num = preds.shape[0]
    target_lod = [0]
    convert_ids = []
    for ino in range(img_num):
        end_pos = np.where(preds[ino, :] == 1)[0]
        if len(end_pos) <= 1:
            text_list = preds[ino, 1:]
        else:
            text_list = preds[ino, 1:end_pos[1]]
        target_lod.append(target_lod[ino] + len(text_list))
        convert_ids = convert_ids + list(text_list)
    convert_ids = np.array(convert_ids)
    convert_ids = convert_ids.reshape((-1, 1))
    return convert_ids, target_lod


def convert_rec_label_to_lod(ori_labels):
    img_num = len(ori_labels)
    target_lod = [0]
    convert_ids = []
    for ino in range(img_num):
        target_lod.append(target_lod[ino] + len(ori_labels[ino]))
        convert_ids = convert_ids + list(ori_labels[ino])
    convert_ids = np.array(convert_ids)
    convert_ids = convert_ids.reshape((-1, 1))
    return convert_ids, target_lod
```
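A short usage sketch of CharacterOps under an assumed CTC-style config (the dict path and loss_type here are illustrative; module.py wires up the real config):

```python
# Illustrative round trip through CharacterOps.
config = {
    'character_type': 'german',
    'loss_type': 'ctc',                  # anything other than "attention"/"srn" keeps the raw dict order
    'max_text_length': 25,
    'character_dict_path': 'assets/german_dict.txt',
    'use_space_char': True,
}
ops = CharacterOps(config)
ids = ops.encode("Straße")               # characters missing from the dict are silently skipped
print(ids)
print(ops.decode(ids))                   # -> "Straße"
```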
modules/image/text_recognition/german_ocr_db_crnn_mobile/module.py (new file, mode 100644; 591 lines, collapsed in the diff view)
modules/image/text_recognition/german_ocr_db_crnn_mobile/utils.py (new file, mode 100644)
```python
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import math

from PIL import Image, ImageDraw, ImageFont
import base64
import cv2
import numpy as np


def draw_ocr(image, boxes, txts, scores, font_file, draw_txt=True, drop_score=0.5):
    """
    Visualize the results of OCR detection and recognition
    args:
        image(Image|array): RGB image
        boxes(list): boxes with shape(N, 4, 2)
        txts(list): the texts
        scores(list): txxs corresponding scores
        draw_txt(bool): whether draw text or not
        drop_score(float): only scores greater than drop_threshold will be visualized
    return(array):
        the visualized img
    """
    if scores is None:
        scores = [1] * len(boxes)
    for (box, score) in zip(boxes, scores):
        if score < drop_score or math.isnan(score):
            continue
        box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
        image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)

    if draw_txt:
        img = np.array(resize_img(image, input_size=600))
        txt_img = text_visual(
            txts,
            scores,
            font_file,
            img_h=img.shape[0],
            img_w=600,
            threshold=drop_score)
        img = np.concatenate([np.array(img), np.array(txt_img)], axis=1)
        return img
    return image


def text_visual(texts, scores, font_file, img_h=400, img_w=600, threshold=0.):
    """
    create new blank img and draw txt on it
    args:
        texts(list): the text will be draw
        scores(list|None): corresponding score of each txt
        img_h(int): the height of blank img
        img_w(int): the width of blank img
    return(array):
    """
    if scores is not None:
        assert len(texts) == len(scores), "The number of txts and corresponding scores must match"

    def create_blank_img():
        blank_img = np.ones(shape=[img_h, img_w], dtype=np.int8) * 255
        blank_img[:, img_w - 1:] = 0
        blank_img = Image.fromarray(blank_img).convert("RGB")
        draw_txt = ImageDraw.Draw(blank_img)
        return blank_img, draw_txt

    blank_img, draw_txt = create_blank_img()

    font_size = 20
    txt_color = (0, 0, 0)
    font = ImageFont.truetype(font_file, font_size, encoding="utf-8")

    gap = font_size + 5
    txt_img_list = []
    count, index = 1, 0
    for idx, txt in enumerate(texts):
        index += 1
        if scores[idx] < threshold or math.isnan(scores[idx]):
            index -= 1
            continue
        first_line = True
        while str_count(txt) >= img_w // font_size - 4:
            tmp = txt
            txt = tmp[:img_w // font_size - 4]
            if first_line:
                new_txt = str(index) + ': ' + txt
                first_line = False
            else:
                new_txt = '    ' + txt
            draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
            txt = tmp[img_w // font_size - 4:]
            if count >= img_h // gap - 1:
                txt_img_list.append(np.array(blank_img))
                blank_img, draw_txt = create_blank_img()
                count = 0
            count += 1
        if first_line:
            new_txt = str(index) + ': ' + txt + '   ' + '%.3f' % (scores[idx])
        else:
            new_txt = "  " + txt + "  " + '%.3f' % (scores[idx])
        draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
        # whether add new blank img or not
        if count >= img_h // gap - 1 and idx + 1 < len(texts):
            txt_img_list.append(np.array(blank_img))
            blank_img, draw_txt = create_blank_img()
            count = 0
        count += 1
    txt_img_list.append(np.array(blank_img))
    if len(txt_img_list) == 1:
        blank_img = np.array(txt_img_list[0])
    else:
        blank_img = np.concatenate(txt_img_list, axis=1)
    return np.array(blank_img)


def str_count(s):
    """
    Count the number of Chinese characters,
    a single English character and a single number
    equal to half the length of Chinese characters.
    args:
        s(string): the input of string
    return(int):
        the number of Chinese characters
    """
    import string
    count_zh = count_pu = 0
    s_len = len(s)
    en_dg_count = 0
    for c in s:
        if c in string.ascii_letters or c.isdigit() or c.isspace():
            en_dg_count += 1
        elif c.isalpha():
            count_zh += 1
        else:
            count_pu += 1
    return s_len - math.ceil(en_dg_count / 2)


def resize_img(img, input_size=600):
    img = np.array(img)
    im_shape = img.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    im_scale = float(input_size) / float(im_size_max)
    im = cv2.resize(img, None, None, fx=im_scale, fy=im_scale)
    return im


def get_image_ext(image):
    if image.shape[2] == 4:
        return ".png"
    return ".jpg"


def sorted_boxes(dt_boxes):
    """
    Sort text boxes in order from top to bottom, left to right
    args:
        dt_boxes(array): detected text boxes with shape [4, 2]
    return:
        sorted boxes(array) with shape [4, 2]
    """
    num_boxes = dt_boxes.shape[0]
    sorted_boxes = sorted(dt_boxes, key=lambda x: (x[0][1], x[0][0]))
    _boxes = list(sorted_boxes)

    for i in range(num_boxes - 1):
        if abs(_boxes[i + 1][0][1] - _boxes[i][0][1]) < 10 and \
                (_boxes[i + 1][0][0] < _boxes[i][0][0]):
            tmp = _boxes[i]
            _boxes[i] = _boxes[i + 1]
            _boxes[i + 1] = tmp
    return _boxes


def base64_to_cv2(b64str):
    data = base64.b64decode(b64str.encode('utf8'))
    data = np.fromstring(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data
```
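A quick sketch exercising two of these helpers on synthetic data: base64_to_cv2 (the inverse of the cv2_to_base64 used by the serving examples) and sorted_boxes:

```python
import base64

import cv2
import numpy as np

# Round-trip an image through base64, as a serving request would.
img = np.zeros((32, 64, 3), dtype=np.uint8)
b64 = base64.b64encode(cv2.imencode('.jpg', img)[1].tostring()).decode('utf8')
print(base64_to_cv2(b64).shape)   # (32, 64, 3)

# Two boxes on the same row: the right-hand one sorts first by y, then the
# row rule (dy < 10, smaller x first) swaps them into reading order.
boxes = np.array([[[50, 5], [90, 5], [90, 20], [50, 20]],
                  [[10, 8], [40, 8], [40, 22], [10, 22]]])
print(sorted_boxes(boxes))
```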
modules/image/text_recognition/japan_ocr_db_crnn_mobile/README.md (new file, mode 100644)
# japan_ocr_db_crnn_mobile

|Model Name|japan_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|icdar2015|
|Fine-tuning Supported|No|
|Model Size|8MB|
|Latest Update|2021-04-15|
|Data Metrics|-|

## I. Basic Information

- ### Application Effect Display

  - Sample result:
    <p align="center">
    <img src="https://user-images.githubusercontent.com/22424850/133761650-91f24c1e-f437-47b1-8cfb-a074e7150ff5.jpg" width='80%' hspace='10'/> <br />
    </p>

- ### Module Introduction

  - japan_ocr_db_crnn_mobile recognizes Japanese text in images. Starting from the text boxes detected by chinese_text_detection_db_mobile, it recognizes the Japanese characters inside each box. The final recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed specifically for sequence-like objects in images. Used together with CTC loss, it learns text recognition directly from word- or line-level annotations, with no need for detailed character-level labels. This Module is a lightweight OCR model for Japanese that supports direct prediction.

## II. Installation

- ### 1. Environment Dependencies

  - paddlepaddle >= 1.8.0

  - paddlehub >= 1.8.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)

  - shapely

  - pyclipper

  - ```shell
    $ pip install shapely pyclipper
    ```
  - **This Module depends on the third-party libraries shapely and pyclipper; please install both before using this Module.**

- ### 2. Installation

  - ```shell
    $ hub install japan_ocr_db_crnn_mobile
    ```
  - If you run into trouble during installation, see: [Windows installation from scratch](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux installation from scratch](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS installation from scratch](../../../../docs/docs_ch/get_start/mac_quickstart.md)

## III. Module API Prediction

- ### 1. Command-Line Prediction

  - ```shell
    $ hub run japan_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
    ```
  - This invokes the text recognition model from the command line; for more, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)

- ### 2. Prediction Code Example

  - ```python
    import paddlehub as hub
    import cv2

    ocr = hub.Module(name="japan_ocr_db_crnn_mobile", enable_mkldnn=True)  # MKL-DNN acceleration only takes effect on CPU
    result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])

    # or
    # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
    ```

- ### 3. API

  - ```python
    def __init__(text_detector_module=None, enable_mkldnn=False)
    ```

    - Constructs a JapanOCRDBCRNNMobile object.

    - **Parameters**

      - text_detector_module(str): name of the PaddleHub text-detection Module; if set to None, the [chinese_text_detection_db_mobile Module](../chinese_text_detection_db_mobile/) is used by default. Its role is to detect the text regions in the image.<br/>
      - enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Defaults to False.

  - ```python
    def recognize_text(images=[],
                       paths=[],
                       use_gpu=False,
                       output_dir='ocr_result',
                       visualization=False,
                       box_thresh=0.5,
                       text_thresh=0.5,
                       angle_classification_thresh=0.9)
    ```

    - Prediction API that locates and recognizes all Japanese text in the input images.

    - **Parameters**

      - paths (list\[str\]): image paths; <br/>
      - images (list\[numpy.ndarray\]): image data with ndarray.shape \[H, W, C\], in BGR format; <br/>
      - use\_gpu (bool): whether to use GPU; **if you use GPU, set the CUDA_VISIBLE_DEVICES environment variable first** <br/>
      - output\_dir (str): directory for saved images, defaults to ocr\_result; <br/>
      - box\_thresh (float): confidence threshold for detected text boxes; <br/>
      - text\_thresh (float): confidence threshold for recognized Japanese text; <br/>
      - angle_classification_thresh(float): confidence threshold for text angle classification <br/>
      - visualization (bool): whether to save the recognition result as image files.

    - **Return**

      - res (list\[dict\]): list of recognition results, one dict per input image, with fields:
        - data (list\[dict\]): recognized texts, one dict per text box, with fields:
          - text(str): the recognized text
          - confidence(float): confidence of the recognized text
          - text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix listing the bottom-left, bottom-right, top-right and top-left vertices in order.
          If nothing is recognized, data is \[\].
        - save_path (str, optional): path of the saved result image; '' if no image is saved.
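For example, keeping only high-confidence lines from this return value could look like the sketch below (the path and cutoff are illustrative):

```python
import paddlehub as hub
import cv2

ocr = hub.Module(name="japan_ocr_db_crnn_mobile")
res = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])

confident = [item['text']
             for image_result in res
             for item in image_result['data']
             if item['confidence'] > 0.8]   # illustrative cutoff
print(confident)
```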
## IV. Server Deployment

- PaddleHub Serving can deploy an online text recognition service.

- ### Step 1: Start the PaddleHub Serving

  - Run the start command:

  - ```shell
    $ hub serving start -m japan_ocr_db_crnn_mobile
    ```

  - This deploys the text recognition service API, listening on port 8866 by default.

  - **NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise no setting is needed.

- ### Step 2: Send a Prediction Request

  - With the server configured, the few lines of code below send a prediction request and fetch the result

  - ```python
    import requests
    import json
    import cv2
    import base64

    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tostring()).decode('utf8')

    # send the HTTP request
    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/japan_ocr_db_crnn_mobile"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # print the prediction result
    print(r.json()["results"])
    ```

## V. Release Note

* 1.0.0

  First release

  - ```shell
    $ hub install japan_ocr_db_crnn_mobile==1.0.0
    ```
modules/image/text_recognition/japan_ocr_db_crnn_mobile/__init__.py (new empty file, mode 100644)
modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan.ttc (new binary file added, mode 100644)
modules/image/text_recognition/japan_ocr_db_crnn_mobile/assets/japan_dict.txt (new file, mode 100644; 4,399 lines, collapsed in the diff view)
modules/image/text_recognition/japan_ocr_db_crnn_mobile/character.py (new file, mode 100644)

Contents are identical to modules/image/text_recognition/german_ocr_db_crnn_mobile/character.py shown above.
modules/image/text_recognition/japan_ocr_db_crnn_mobile/module.py (new file, mode 100644; 591 lines, collapsed in the diff view)
modules/image/text_recognition/japan_ocr_db_crnn_mobile/utils.py (new file, mode 100644)

Contents are identical to modules/image/text_recognition/german_ocr_db_crnn_mobile/utils.py shown above.
modules/thirdparty/image/text_recognition/Vehicle_License_Plate_Recognition/README.md (modified, +96 −57; the updated file follows)
# Vehicle_License_Plate_Recognition

|Model Name|Vehicle_License_Plate_Recognition|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|-|
|Dataset|CCPD|
|Fine-tuning Supported|No|
|Model Size|111MB|
|Latest Update|2021-03-22|
|Data Metrics|-|

## I. Basic Information

- ### Application Effect Display

  - Sample result:
    <p align="center">
    <img src="https://ai-studio-static-online.cdn.bcebos.com/35a3dab32ac948549de41afba7b51a5770d3f872d60b437d891f359a5cef8052" width="450" height="300" hspace='10'/> <br />
    </p>

- ### Module Introduction

  - Vehicle_License_Plate_Recognition is a license plate recognition model trained on the CCPD dataset. It detects the position of license plates in an image and recognizes the plate text; the model is split into two modules, plate detection followed by text recognition. Source code: https://github.com/jm12138/License_plate_recognition

## II. Installation

- ### 1. Environment Dependencies

  - paddlepaddle >= 2.0.0

  - paddlehub >= 2.0.4

  - paddleocr >= 2.0.2

- ### 2. Installation

  - ```shell
    $ hub install Vehicle_License_Plate_Recognition
    ```

## III. Module API Prediction

- ### 1. Prediction Code Example

  - ```python
    import paddlehub as hub
    import cv2

    model = hub.Module(name="Vehicle_License_Plate_Recognition")
    result = model.plate_recognition(images=[cv2.imread('/PATH/TO/IMAGE')])
    ```

    Sample output:

    ```
    [{'license': '苏B92912', 'bbox': [[131.0, 251.0], [368.0, 253.0], [367.0, 338.0], [131.0, 336.0]]}]
    ```

- ### 2. API

  - ```python
    def plate_recognition(images)
    ```

    - License plate recognition API.

    - **Parameters**

      - images (list\[numpy.ndarray\]): image data, ndarray.shape \[H, W, C\];<br/>

    - **Return**

      - results(list(dict{'license', 'bbox'})): list of recognized plates, each containing the plate's position coordinates and plate number
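As a sketch, the returned bbox polygon can be drawn back onto the input image like this (paths and colors are illustrative):

```python
import paddlehub as hub
import cv2
import numpy as np

model = hub.Module(name="Vehicle_License_Plate_Recognition")
img = cv2.imread('/PATH/TO/IMAGE')

for plate in model.plate_recognition(images=[img]):
    # bbox is the 4-point polygon described above
    pts = np.array(plate['bbox'], dtype=np.int32).reshape((-1, 1, 2))
    cv2.polylines(img, [pts], isClosed=True, color=(0, 0, 255), thickness=2)
    print(plate['license'])

cv2.imwrite('plate_result.jpg', img)
```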
## IV. Server Deployment

- PaddleHub Serving can deploy an online license plate recognition service.

- ### Step 1: Start the PaddleHub Serving

  - Run the start command:

  - ```shell
    $ hub serving start -m Vehicle_License_Plate_Recognition
    ```

  - This deploys the license plate recognition service API, listening on port 8866 by default.

  - **NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise no setting is needed.

- ### Step 2: Send a Prediction Request

  - With the server configured, the few lines of code below send a prediction request and fetch the result

  - ```python
    import requests
    import json
    import cv2
    import base64

    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tostring()).decode('utf8')

    # send the HTTP request
    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/Vehicle_License_Plate_Recognition"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # print the prediction result
    print(r.json()["results"])
    ```

    Sample output:

    ```
    [{'bbox': [[260.0, 100.0], [546.0, 104.0], [544.0, 200.0], [259.0, 196.0]], 'license': '苏DS0000'}]
    ```

## V. Release Note

* 1.0.0

  First release

  - ```shell
    $ hub install Vehicle_License_Plate_Recognition==1.0.0
    ```