PaddlePaddle / PaddleOCR
Commit edeb12b1, authored Jan 26, 2021 by tink2123
rename en_sensitive EN_symbol
Parent: d9ae86f4
Showing 5 changed files with 18 additions and 18 deletions (+18 -18)
configs/rec/multi_language/rec_en_number_lite_train.yml  +8 -8
doc/doc_ch/recognition.md  +1 -1
doc/doc_en/recognition_en.md  +1 -1
ppocr/data/imaug/label_ops.py  +4 -4
ppocr/postprocess/rec_postprocess.py  +4 -4
configs/rec/multi_language/rec_en_number_lite_train.yml @ edeb12b1
 Global:
-  use_gpu: True
+  use_gpu: False
   epoch_num: 500
   log_smooth_window: 20
   print_batch_step: 10
@@ -16,7 +16,7 @@ Global:
   infer_img:
   # for data or label process
   character_dict_path: ppocr/utils/dict/en_dict.txt
-  character_type: En
+  character_type: EN
   max_text_length: 25
   infer_mode: False
   use_space_char: False
@@ -63,8 +63,8 @@ Metric:
 Train:
   dataset:
     name: SimpleDataSet
-    data_dir: ./train_data/
-    label_file_list: ["./train_data/train_list.txt"]
+    data_dir: ./train_data/ic15_data/
+    label_file_list: ["./train_data/ic15_data/rec_gt_test.txt"]
     transforms:
       - DecodeImage: # load image
           img_mode: BGR
@@ -77,15 +77,15 @@ Train:
           keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
   loader:
     shuffle: True
-    batch_size_per_card: 256
+    batch_size_per_card: 1
     drop_last: True
-    num_workers: 8
+    num_workers: 1
 Eval:
   dataset:
     name: SimpleDataSet
-    data_dir: ./train_data/
-    label_file_list: ["./train_data/eval_list.txt"]
+    data_dir: ./train_data/ic15_data/
+    label_file_list: ["./train_data/ic15_data/rec_gt_test.txt"]
     transforms:
       - DecodeImage: # load image
           img_mode: BGR
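The `character_type` value is matched as an exact, case-sensitive string, so a config that still uses a pre-rename spelling would presumably stop matching after this commit. A minimal sketch of a migration step follows, assuming a plain dict stands in for the parsed `Global` section of the YAML file. `RENAMED_CHARACTER_TYPES` and `migrate_character_type` are hypothetical names for illustration and are not part of PaddleOCR.

```python
# Hypothetical helper, not part of PaddleOCR: maps the pre-rename
# character_type spellings seen in this commit to the new ones.
RENAMED_CHARACTER_TYPES = {
    "en_sensitive": "EN_symbol",
    "En": "EN",
}


def migrate_character_type(global_cfg):
    """Return a copy of the Global section with character_type renamed if needed."""
    migrated = dict(global_cfg)
    if "character_type" in migrated:
        old_value = migrated["character_type"]
        # Exact, case-sensitive lookup: "en" is left alone, "En" becomes "EN".
        migrated["character_type"] = RENAMED_CHARACTER_TYPES.get(old_value, old_value)
    return migrated


old_cfg = {"character_type": "En", "max_text_length": 25}
new_cfg = migrate_character_type(old_cfg)
print(new_cfg["character_type"])
```

The helper returns a copy instead of mutating its argument, so the original config can still be logged or compared after migration.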
doc/doc_ch/recognition.md @ edeb12b1
@@ -348,7 +348,7 @@ PaddleOCR目前已支持26种(除中文外)语种识别,`configs/rec/multi
 | 配置文件 | 算法名称 | backbone | trans | seq | pred | language | character_type |
 | :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: |
 | rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 中文繁体 | chinese_cht|
-| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语 | En |
+| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 英语(区分大小写) | EN |
 | rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 法语 | french |
 | rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 德语 | german |
 | rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | 日语 | japan |
doc/doc_en/recognition_en.md @ edeb12b1
@@ -350,7 +350,7 @@ Currently, the multi-language algorithms supported by PaddleOCR are:
 | Configuration file | Algorithm name | backbone | trans | seq | pred | language | character_type |
 | :--------: | :-------: | :-------: | :-------: | :-----: | :-----: | :-----: | :-----: |
 | rec_chinese_cht_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | chinese traditional | chinese_cht|
-| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English | En |
+| rec_en_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | English (Case sensitive) | EN |
 | rec_french_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | French | french |
 | rec_ger_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | German | german |
 | rec_japan_lite_train.yml | CRNN | Mobilenet_v3 small 0.5 | None | BiLSTM | ctc | Japanese | japan |
ppocr/data/imaug/label_ops.py @ edeb12b1
@@ -18,6 +18,7 @@ from __future__ import print_function
 from __future__ import unicode_literals
 import numpy as np
+import string
 class ClsLabelEncode(object):
@@ -92,8 +93,8 @@ class BaseRecLabelEncode(object):
                  character_type='ch',
                  use_space_char=False):
         support_character_type = [
-            'ch', 'en', 'en_sensitive', 'french', 'german', 'japan', 'korean',
-            'En', 'it', 'xi', 'pu', 'ru', 'ar', 'ta', 'ug', 'fa', 'ur', 'rs',
+            'ch', 'en', 'EN_symbol', 'french', 'german', 'japan', 'korean',
+            'EN', 'it', 'xi', 'pu', 'ru', 'ar', 'ta', 'ug', 'fa', 'ur', 'rs',
             'oc', 'rsc', 'bg', 'uk', 'be', 'te', 'ka', 'chinese_cht', 'hi',
             'mr', 'ne'
         ]
@@ -104,9 +105,8 @@ class BaseRecLabelEncode(object):
         if character_type == "en":
             self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
             dict_character = list(self.character_str)
-        elif character_type == "en_sensitive":
+        elif character_type == "EN_symbol":
             # same with ASTER setting (use 94 char).
-            import string
             self.character_str = string.printable[:-6]
             dict_character = list(self.character_str)
         elif character_type in support_character_type:
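The `EN_symbol` branch builds its alphabet from `string.printable[:-6]`, and the comment in the diff says this yields 94 characters. The short check below, using only the Python standard library, illustrates why: the last six characters of `string.printable` are the whitespace characters, so slicing them off leaves digits, ASCII letters, and punctuation. The variable names mirror the diff but the snippet itself is an illustration, not PaddleOCR code.

```python
import string

# Same expression as the EN_symbol branch in the diff: drop the last six
# characters of string.printable, which are the whitespace characters.
character_str = string.printable[:-6]
dict_character = list(character_str)

# The lowercase-only alphabet used by the "en" branch, for comparison.
en_character_str = "0123456789abcdefghijklmnopqrstuvwxyz"

print(len(dict_character))                          # 94
print(len(en_character_str))                        # 36
print(string.printable[-6:] == string.whitespace)   # True
```

The 94 characters break down as 10 digits, 52 letters, and 32 punctuation marks, which is why this set distinguishes upper and lower case while the 36-character `en` set does not.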
ppocr/postprocess/rec_postprocess.py @ edeb12b1
@@ -12,6 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import numpy as np
+import string
 import paddle
 from paddle.nn import functional as F
@@ -24,10 +25,10 @@ class BaseRecLabelDecode(object):
                  character_type='ch',
                  use_space_char=False):
         support_character_type = [
-            'ch', 'en', 'en_sensitive', 'french', 'german', 'japan', 'korean',
+            'ch', 'en', 'EN_symbol', 'french', 'german', 'japan', 'korean',
             'it', 'xi', 'pu', 'ru', 'ar', 'ta', 'ug', 'fa', 'ur', 'rs',
             'oc', 'rsc', 'bg', 'uk', 'be', 'te', 'ka', 'chinese_cht', 'hi', 'mr',
-            'ne', 'En'
+            'ne', 'EN'
         ]
         assert character_type in support_character_type, "Only {} are supported now but get {}".format(
             support_character_type, character_type)
@@ -35,9 +36,8 @@ class BaseRecLabelDecode(object):
         if character_type == "en":
             self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
             dict_character = list(self.character_str)
-        elif character_type == "en_sensitive":
+        elif character_type == "EN_symbol":
             # same with ASTER setting (use 94 char).
-            import string
             self.character_str = string.printable[:-6]
             dict_character = list(self.character_str)
         elif character_type in support_character_type:
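`BaseRecLabelDecode` asserts that `character_type` is in `support_character_type`, so after this rename the old spellings `en_sensitive` and `En` would be rejected by that check. The sketch below reproduces the membership test in isolation, with the post-rename list copied from the `rec_postprocess.py` hunk. `check_character_type` and `is_supported` are hypothetical stand-alone functions for illustration, not PaddleOCR APIs.

```python
# Post-rename list copied from the rec_postprocess.py hunk in this commit.
SUPPORT_CHARACTER_TYPE = [
    'ch', 'en', 'EN_symbol', 'french', 'german', 'japan', 'korean',
    'it', 'xi', 'pu', 'ru', 'ar', 'ta', 'ug', 'fa', 'ur', 'rs',
    'oc', 'rsc', 'bg', 'uk', 'be', 'te', 'ka', 'chinese_cht', 'hi', 'mr',
    'ne', 'EN'
]


def check_character_type(character_type):
    """Hypothetical stand-alone version of the membership assert in the diff."""
    assert character_type in SUPPORT_CHARACTER_TYPE, \
        "Only {} are supported now but get {}".format(
            SUPPORT_CHARACTER_TYPE, character_type)
    return character_type


def is_supported(character_type):
    """Return True if the assert passes, False if it raises AssertionError."""
    try:
        check_character_type(character_type)
        return True
    except AssertionError:
        return False


for name in ("EN_symbol", "EN", "en_sensitive", "En"):
    print(name, is_supported(name))
```

Because the check is a plain `assert`, it is skipped when Python runs with the `-O` flag; callers that need the validation in optimized mode would have to raise an exception explicitly.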