Commit aab69020 authored by Yuantao Feng, committed by GitHub

Add CRNN_CN for recognizing Chinese (#23)

* add charsets and CRNN_CN

* workable wrapper and demo for CRNN_CN

* update README with CRNN_CN; rename models with EN/CN suffix

* impl benchmarks for CRNN_EN and CRNN_CN

* update benchmark result on arm

* update benchmark results on x86_64 & cuda
Parent df036f69
@@ -19,7 +19,8 @@ Guidelines:
| [YuNet](./models/face_detection_yunet) | 160x120 | 1.45 | 6.22 | 12.18 |
| [DB-IC15](./models/text_detection_db) | 640x480 | 142.91 | 2835.91 | 208.41 |
| [DB-TD500](./models/text_detection_db) | 640x480 | 142.91 | 2841.71 | 210.51 |
| [CRNN](./models/text_recognition_crnn) | 100x32 | 50.21 | 234.32 | 196.15 |
| [CRNN-EN](./models/text_recognition_crnn) | 100x32 | 50.21 | 234.32 | 196.15 |
| [CRNN-CN](./models/text_recognition_crnn) | 100x32 | 73.52 | 322.16 | 239.76 |
| [SFace](./models/face_recognition_sface) | 112x112 | 8.65 | 99.20 | 24.88 |
| [PP-ResNet](./models/image_classification_ppresnet) | 224x224 | 56.05 | 602.58 | 98.64 |
| [PP-HumanSeg](./models/human_segmentation_pphumanseg) | 192x192 | 19.92 | 105.32 | 67.97 |
......
Benchmark:
  name: "Text Recognition Benchmark"
  type: "Recognition"
  data:
    path: "benchmark/data/text"
    files: ["1.jpg", "2.jpg", "3.jpg"]
  metric:  # 'sizes' is omitted since this model requires input of fixed size
    warmup: 30
    repeat: 10
    reduction: "median"
  backend: "default"
  target: "cpu"

Model:
  name: "CRNN"
  modelPath: "models/text_recognition_crnn/text_recognition_CRNN_CN_2021nov.onnx"
  charsetPath: "models/text_recognition_crnn/charset_3944_CN.txt"
\ No newline at end of file
@@ -13,4 +13,5 @@ Benchmark:
Model:
  name: "CRNN"
  modelPath: "models/text_recognition_crnn/text_recognition_CRNN_VGG_BiLSTM_CTC_2021sep.onnx"
\ No newline at end of file
  modelPath: "models/text_recognition_crnn/text_recognition_CRNN_EN_2021sep.onnx"
  charsetPath: "models/text_recognition_crnn/charset_36_EN.txt"
\ No newline at end of file
@@ -3,14 +3,22 @@
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
Note:
- Model source: https://docs.opencv.org/4.5.2/d9/d1e/tutorial_dnn_OCR.html.
- For details on training this model, which can only recognize English words, please visit https://github.com/zihaomu/deep-text-recognition-benchmark.
- Model source:
- `text_recognition_CRNN_EN_2021sep.onnx`: https://docs.opencv.org/4.5.2/d9/d1e/tutorial_dnn_OCR.html (CRNN_VGG_BiLSTM_CTC.onnx)
- `text_recognition_CRNN_CN_2021nov.onnx`: https://docs.opencv.org/4.5.2/d4/d43/tutorial_dnn_text_spotting.html (crnn_cs_CN.onnx)
- `text_recognition_CRNN_EN_2021sep.onnx` can recognize digits (0~9) and letters (returned as lowercase a~z); see `charset_36_EN.txt` for details.
- `text_recognition_CRNN_CN_2021nov.onnx` can recognize digits (0~9), upper- and lower-case letters (a~z and A~Z), some Chinese characters and some special characters; see `charset_3944_CN.txt` for details.
- For details on training this model series, please visit https://github.com/zihaomu/deep-text-recognition-benchmark.
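The charset files list one character per line, and the line order defines the model's class ids (id 0 is reserved for the CTC blank). A minimal sketch of loading such a file, mirroring the wrapper's `_load_charset`; the tiny four-line file written here is a stand-in for the real charsets:

```python
import os
import tempfile

def load_charset(path):
    # One character per line; concatenation order defines the class ids
    charset = ''
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            charset += line.strip()
    return charset

# Stand-in file for illustration; the real files ship with the model
path = os.path.join(tempfile.mkdtemp(), 'charset_demo.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('0\n1\na\nb\n')

charset = load_charset(path)
print(len(charset))  # -> 4; charset_36_EN.txt has 36 entries, charset_3944_CN.txt has 3944
```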
## Demo
***NOTE***: This demo uses [text_detection_db](../text_detection_db) as text detector.
***NOTE***:
- This demo uses [text_detection_db](../text_detection_db) as text detector.
- The selected model must match the charset:
  - Try `text_recognition_CRNN_EN_2021sep.onnx` with `charset_36_EN.txt`.
  - Try `text_recognition_CRNN_CN_2021nov.onnx` with `charset_3944_CN.txt`.
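Both models emit one class id per time step, and the wrapper maps id `c` to `charset[c - 1]` with id 0 as the CTC blank (rendered as `-`), so a mismatched charset silently yields wrong characters. A minimal greedy-decode sketch with a stand-in output tensor (not the real model output):

```python
import numpy as np

def greedy_decode(output_blob, charset):
    # output_blob: (T, 1, num_classes) scores; id 0 is the CTC blank ('-'),
    # id c > 0 maps to charset[c - 1]
    text = ''
    for t in range(output_blob.shape[0]):
        c = int(np.argmax(output_blob[t][0]))
        text += charset[c - 1] if c != 0 else '-'
    return text

charset = '0123456789abcdefghijklmnopqrstuvwxyz'  # contents of charset_36_EN.txt
dummy = np.zeros((3, 1, 37))  # 36 characters + 1 blank class
dummy[0, 0, 11] = 1.0  # class 11 -> charset[10] = 'a'
dummy[1, 0, 0] = 1.0   # class 0  -> blank
dummy[2, 0, 28] = 1.0  # class 28 -> charset[27] = 'r'
print(greedy_decode(dummy, charset))  # -> a-r
```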
Run the following command to try the demo:
Run the demo detecting English:
```shell
# detect on camera input
python demo.py
@@ -18,6 +26,14 @@ python demo.py
python demo.py --input /path/to/image
```
Run the demo detecting Chinese:
```shell
# detect on camera input
python demo.py --model text_recognition_CRNN_CN_2021nov.onnx --charset charset_3944_CN.txt
# detect on an image
python demo.py --input /path/to/image --model text_recognition_CRNN_CN_2021nov.onnx --charset charset_3944_CN.txt
```
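Under the hood, the wrapper's `_preprocess` asks `cv.dnn.blobFromImage` for `mean=127.5` and `scalefactor=1/127.5`, which maps 8-bit pixels into [-1, 1]. A numpy-only sketch of the same normalization (grayscale EN-model case, no OpenCV required):

```python
import numpy as np

# (pixel - 127.5) / 127.5 maps 0..255 into [-1, 1]
crop = np.random.randint(0, 256, (32, 100), dtype=np.uint8)  # H x W grayscale crop
blob = (crop.astype(np.float32) - 127.5) / 127.5
blob = blob[np.newaxis, np.newaxis, :, :]  # NCHW layout: (1, 1, 32, 100)
print(blob.shape)
```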
## License
All files in this directory are licensed under [Apache 2.0 License](./LICENSE).
......
0
1
2
3
4
5
6
7
8
9
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
H
O
K
I
T
E
A
J
N
G
Y
C
U
Q
1
7
3
9
8
线
便
4
0
-
6
5
2
西
l
i
k
n
g
c
o
f
e
w
h
t
L
:
.
B
/
绿
d
D
y
F
r
P
a
u
M
R
S
®
鱿
*
V
X
W
j
Z
·
x
v
s
m
广
p
&
b
'
寿
|
仿
é
[
]
z
怀
ǐ
à
¥
©
ä
°
_
饿
!
鹿
屿
粿
q
滿
(
耀
漿
,
槿
%
<
>
宿
+
#
~
=
)
\
×
ā
¦
@
椿
亿
鸿
Λ
访
綿
Á
ō
驿
­
ǒ
穿
使
殿
尿
廿
Ξ
φ
á
ǎ
а
伿
湿
祿
稿
û
í
ó
Θ
{
π
`
姿
ī
ò
"
ē
退
ε
ě
}
ǔ
è
´
Ē
?
ʌ
É
齿
鴿
ú
ˊ
$
;
^
@@ -8,8 +8,10 @@ import numpy as np
import cv2 as cv
class CRNN:
    def __init__(self, modelPath):
        self._model = cv.dnn.readNet(modelPath)
    def __init__(self, modelPath, charsetPath):
        self._model_path = modelPath
        self._model = cv.dnn.readNet(self._model_path)
        self._charset = self._load_charset(charsetPath)
        self._inputSize = [100, 32]  # Fixed
        self._targetVertices = np.array([
            [0, self._inputSize[1] - 1],
@@ -22,6 +24,14 @@ class CRNN:
    def name(self):
        return self.__class__.__name__

    def _load_charset(self, charsetPath):
        # One character per line; concatenation order defines the class ids
        charset = ''
        with open(charsetPath, 'r', encoding='utf-8') as f:
            for line in f:
                charset += line.strip()
        return charset
    def setBackend(self, backend_id):
        self._model.setPreferableBackend(backend_id)
@@ -35,7 +45,10 @@ class CRNN:
        rotationMatrix = cv.getPerspectiveTransform(vertices, self._targetVertices)
        cropped = cv.warpPerspective(image, rotationMatrix, self._inputSize)
        cropped = cv.cvtColor(cropped, cv.COLOR_BGR2GRAY)
        # The CN model takes a 3-channel input; only the EN model expects grayscale
        if 'CN' not in self._model_path:
            cropped = cv.cvtColor(cropped, cv.COLOR_BGR2GRAY)
        return cv.dnn.blobFromImage(cropped, size=self._inputSize, mean=127.5, scalefactor=1 / 127.5)
@@ -55,12 +68,11 @@ class CRNN:
    def _postprocess(self, outputBlob):
        '''Decode characters from outputBlob
        '''
        text = ""
        alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
        text = ''
        for i in range(outputBlob.shape[0]):
            c = np.argmax(outputBlob[i][0])
            if c != 0:
                text += alphabet[c - 1]
                text += self._charset[c - 1]
            else:
                text += '-'
......
@@ -26,11 +26,8 @@ def str2bool(v):
parser = argparse.ArgumentParser(
    description="An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition (https://arxiv.org/abs/1507.05717)")
parser.add_argument('--input', '-i', type=str, help='Path to the input image. Omit for using default camera.')
parser.add_argument('--model', '-m', type=str, default='text_recognition_CRNN_VGG_BiLSTM_CTC_2021sep.onnx', help='Path to the model.')
parser.add_argument('--width', type=int, default=736,
                    help='The width of input image being sent to the text detector.')
parser.add_argument('--height', type=int, default=736,
                    help='The height of input image being sent to the text detector.')
parser.add_argument('--model', '-m', type=str, default='text_recognition_CRNN_EN_2021sep.onnx', help='Path to the model.')
parser.add_argument('--charset', '-c', type=str, default='charset_36_EN.txt', help='Path to the charset file corresponding to the selected model.')
parser.add_argument('--save', '-s', type=str, default=False, help='Set true to save results. This flag is invalid when using camera.')
parser.add_argument('--vis', '-v', type=str2bool, default=True, help='Set true to open a window for result visualization. This flag is invalid when using camera.')
args = parser.parse_args()
@@ -46,10 +43,10 @@ def visualize(image, boxes, texts, color=(0, 255, 0), isClosed=True, thickness=2
if __name__ == '__main__':
    # Instantiate CRNN for text recognition
    recognizer = CRNN(modelPath=args.model)
    recognizer = CRNN(modelPath=args.model, charsetPath=args.charset)
    # Instantiate DB for text detection
    detector = DB(modelPath='../text_detection_db/text_detection_DB_IC15_resnet18_2021sep.onnx',
                  inputSize=[args.width, args.height],
                  inputSize=[736, 736],
                  binaryThreshold=0.3,
                  polygonThreshold=0.5,
                  maxCandidates=200,
@@ -93,32 +90,32 @@ if __name__ == '__main__':
            print('No frames grabbed!')
            break
        frame = cv.resize(frame, [args.width, args.height])
        frame = cv.resize(frame, [736, 736])
        # Inference of text detector
        tm.start()
        results = detector.infer(frame)
        tm.stop()
        latency_detector = tm.getFPS()
        cv.putText(frame, 'Latency - {}: {:.2f}'.format(detector.name, tm.getFPS()), (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))
        tm.reset()
        # Inference of text recognizer
        texts = []
        tm.start()
        for box, score in zip(results[0], results[1]):
            result = np.hstack(
                (box.reshape(8), score)
            )
            texts.append(
                recognizer.infer(frame, result)
            )
        tm.stop()
        latency_recognizer = tm.getFPS()
        tm.reset()
        # Draw results on the input image
        frame = visualize(frame, results, texts)
        cv.putText(frame, 'Latency - {}: {}'.format(detector.name, latency_detector), (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))
        cv.putText(frame, 'Latency - {}: {}'.format(recognizer.name, latency_recognizer), (0, 30), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))
        # Inference of text recognizer
        if len(results[0]) and len(results[1]):
            texts = []
            tm.start()
            for box, score in zip(results[0], results[1]):
                result = np.hstack(
                    (box.reshape(8), score)
                )
                texts.append(
                    recognizer.infer(frame, box.reshape(8))
                )
            tm.stop()
            cv.putText(frame, 'Latency - {}: {:.2f}'.format(recognizer.name, tm.getFPS()), (0, 30), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))
            tm.reset()
            # Draw results on the input image
            frame = visualize(frame, results, texts)
        print(texts)
        # Visualize results in a new Window
        cv.imshow('{} Demo'.format(recognizer.name), frame)
\ No newline at end of file