Commit f9277281
Authored Oct 09, 2014 by Ray Smith

Fixed issue 1207
Parent: d0cb1071

Showing 4 changed files with 49 additions and 7 deletions.

ccmain/tesseractclass.cpp  +25 -4
ccmain/tesseractclass.h    +8 -1
ccutil/unicharset.cpp      +12 -1
ccutil/unicharset.h        +4 -1
ccmain/tesseractclass.cpp @ f9277281
 ///////////////////////////////////////////////////////////////////////
 // File:        tesseractclass.cpp
-// Description: An instance of Tesseract. For thread safety, *every*
-//              global variable goes in here, directly, or indirectly.
+// Description: The Tesseract class. It holds/owns everything needed
+//              to run Tesseract on a single language, and also a set of
+//              sub-Tesseracts to run sub-languages. For thread safety, *every*
+//              variable that was previously global or static (except for
+//              constant data, and some visual debugging flags) has been moved
+//              in here, directly, or indirectly.
+//              This makes it safe to run multiple Tesseracts in different
+//              threads in parallel, and keeps the different language
+//              instances separate.
+//              Some global functions remain, but they are isolated re-entrant
+//              functions that operate on their arguments. Functions that work
+//              on variable data have been moved to an appropriate class based
+//              mostly on the directory hierarchy. For more information see
+//              slide 6 of "2ArchitectureAndDataStructures" in
+// https://drive.google.com/file/d/0B7l10Bj_LprhbUlIUFlCdGtDYkE/edit?usp=sharing
+//              Some global data and related functions still exist in the
+//              training-related code, but they don't interfere with normal
+//              recognition operation.
 // Author:      Ray Smith
 // Created:     Fri Mar 07 08:17:01 PST 2008
 //
...
@@ -65,6 +81,9 @@ Tesseract::Tesseract()
                   "Blacklist of chars not to recognize", this->params()),
     STRING_MEMBER(tessedit_char_whitelist, "",
                   "Whitelist of chars to recognize", this->params()),
+    STRING_MEMBER(tessedit_char_unblacklist, "",
+                  "List of chars to override tessedit_char_blacklist",
+                  this->params()),
     BOOL_MEMBER(tessedit_ambigs_training, false,
                 "Perform training for ambiguities", this->params()),
     INT_MEMBER(pageseg_devanagari_split_strategy,
...
@@ -578,11 +597,13 @@ void Tesseract::ResetDocumentDictionary() {
 void Tesseract::SetBlackAndWhitelist() {
   // Set the white and blacklists (if any)
   unicharset.set_black_and_whitelist(tessedit_char_blacklist.string(),
-                                     tessedit_char_whitelist.string());
+                                     tessedit_char_whitelist.string(),
+                                     tessedit_char_unblacklist.string());
   // Black and white lists should apply to all loaded classifiers.
   for (int i = 0; i < sub_langs_.size(); ++i) {
     sub_langs_[i]->unicharset.set_black_and_whitelist(
-        tessedit_char_blacklist.string(), tessedit_char_whitelist.string());
+        tessedit_char_blacklist.string(), tessedit_char_whitelist.string(),
+        tessedit_char_unblacklist.string());
   }
 }
...
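The hunk above passes the main instance's three lists to its own unicharset and to the unicharset of every loaded sub-language. A minimal sketch of that propagation follows. `MockCharset` and `MockEngine` are hypothetical stand-ins invented for illustration and are not Tesseract types; they only record the strings they receive.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for UNICHARSET: it only stores the lists it is given.
struct MockCharset {
  std::string blacklist, whitelist, unblacklist;
  void SetLists(const std::string& black, const std::string& white,
                const std::string& unblack) {
    blacklist = black;
    whitelist = white;
    unblacklist = unblack;
  }
};

// Hypothetical stand-in for the Tesseract class with its sub-languages,
// reduced here to one charset per sub-language.
struct MockEngine {
  std::string char_blacklist, char_whitelist, char_unblacklist;
  MockCharset unicharset;
  std::vector<MockCharset> sub_lang_charsets;

  // Mirrors the shape of SetBlackAndWhitelist(): the same three strings go
  // to the main charset and to each sub-language charset.
  void SetBlackAndWhitelist() {
    unicharset.SetLists(char_blacklist, char_whitelist, char_unblacklist);
    for (MockCharset& sub : sub_lang_charsets) {
      sub.SetLists(char_blacklist, char_whitelist, char_unblacklist);
    }
  }
};
```

In this sketch, as in the diff, a sub-language never uses lists of its own: whatever is set on the main instance is what every classifier sees.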
ccmain/tesseractclass.h @ f9277281
 ///////////////////////////////////////////////////////////////////////
 // File:        tesseractclass.h
-// Description: An instance of Tesseract. For thread safety, *every*
+// Description: The Tesseract class. It holds/owns everything needed
+//              to run Tesseract on a single language, and also a set of
+//              sub-Tesseracts to run sub-languages. For thread safety, *every*
 //              global variable goes in here, directly, or indirectly.
+//              This makes it safe to run multiple Tesseracts in different
+//              threads in parallel, and keeps the different language
+//              instances separate.
 // Author:      Ray Smith
 // Created:     Fri Mar 07 08:17:01 PST 2008
 //
...
@@ -743,6 +748,8 @@ class Tesseract : public Wordrec {
                "Blacklist of chars not to recognize");
   STRING_VAR_H(tessedit_char_whitelist, "",
                "Whitelist of chars to recognize");
+  STRING_VAR_H(tessedit_char_unblacklist, "",
+               "List of chars to override tessedit_char_blacklist");
   BOOL_VAR_H(tessedit_ambigs_training, false,
              "Perform training for ambiguities");
   INT_VAR_H(pageseg_devanagari_split_strategy,
...
ccutil/unicharset.cpp @ f9277281
...
@@ -985,8 +985,10 @@ bool UNICHARSET::major_right_to_left() const {
 // Set a whitelist and/or blacklist of characters to recognize.
 // An empty or NULL whitelist enables everything (minus any blacklist).
 // An empty or NULL blacklist disables nothing.
+// An empty or NULL blacklist has no effect.
 void UNICHARSET::set_black_and_whitelist(const char* blacklist,
-                                         const char* whitelist) {
+                                         const char* whitelist,
+                                         const char* unblacklist) {
   bool def_enabled = whitelist == NULL || whitelist[0] == '\0';
   // Set everything to default
   for (int ch = 0; ch < size_used; ++ch)
...
@@ -1009,6 +1011,15 @@ void UNICHARSET::set_black_and_whitelist(const char* blacklist,
         unichars[encoding[i]].properties.enabled = false;
     }
   }
+  if (unblacklist != NULL && unblacklist[0] != '\0') {
+    // Re-enable the unblacklist.
+    GenericVector<UNICHAR_ID> encoding;
+    encode_string(unblacklist, false, &encoding, NULL, NULL);
+    for (int i = 0; i < encoding.size(); ++i) {
+      if (encoding[i] != INVALID_UNICHAR_ID)
+        unichars[encoding[i]].properties.enabled = true;
+    }
+  }
 }
 
 int UNICHARSET::add_script(const char* script) {
...
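The order of the passes in `set_black_and_whitelist` is what gives the three lists their precedence: defaults first, then the whitelist, then the blacklist, and finally the unblacklist. The sketch below models that order on plain one-byte characters. `EnabledChars` and its `charset` parameter are hypothetical names invented for illustration. The real function works on `UNICHAR_ID`s produced by `encode_string`, and its whitelist pass is elided from this diff, so that part of the model is an assumption.

```cpp
#include <set>
#include <string>

// Hypothetical model of the enable/disable passes, using single chars in
// place of unicharset entries. Returns the characters left enabled.
std::set<char> EnabledChars(const std::string& charset,
                            const std::string& blacklist,
                            const std::string& whitelist,
                            const std::string& unblacklist) {
  const std::set<char> known(charset.begin(), charset.end());
  std::set<char> enabled;
  // An empty whitelist enables everything by default.
  if (whitelist.empty()) enabled = known;
  // Otherwise only whitelisted characters known to the charset are enabled.
  for (char c : whitelist) {
    if (known.count(c)) enabled.insert(c);
  }
  // The blacklist overrides the whitelist.
  for (char c : blacklist) enabled.erase(c);
  // The unblacklist overrides the blacklist. Characters outside the charset
  // are skipped, like the INVALID_UNICHAR_ID check in the diff.
  for (char c : unblacklist) {
    if (known.count(c)) enabled.insert(c);
  }
  return enabled;
}
```

Because the unblacklist pass runs last and sets the flag unconditionally, a character listed there ends up enabled in this model even when a non-empty whitelist omits it, which matches the unconditional `enabled = true` in the added block.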
ccutil/unicharset.h @ f9277281
...
@@ -381,11 +381,14 @@ class UNICHARSET {
   // Set a whitelist and/or blacklist of characters to recognize.
   // An empty or NULL whitelist enables everything (minus any blacklist).
   // An empty or NULL blacklist disables nothing.
+  // An empty or NULL unblacklist has no effect.
   // The blacklist overrides the whitelist.
+  // The unblacklist overrides the blacklist.
   // Each list is a string of utf8 character strings. Boundaries between
   // unicharset units are worked out automatically, and characters not in
   // the unicharset are silently ignored.
-  void set_black_and_whitelist(const char* blacklist, const char* whitelist);
+  void set_black_and_whitelist(const char* blacklist, const char* whitelist,
+                               const char* unblacklist);
 
   // Set the isalpha property of the given unichar to the given value.
   void set_isalpha(UNICHAR_ID unichar_id, bool value) {
...