Commit 324dcf00 (unverified)
Authored by alexey-milovidov on May 05, 2019; committed by GitHub on May 05, 2019.
Merge pull request #5191 from yandex/regexp_extraction_fix
Regexp extraction fix for small prefixes
Parents: 173884c0, e531348e
Showing 1 changed file with 20 additions and 6 deletions.
dbms/src/Common/OptimizedRegularExpression.cpp
```diff
 #include <Common/Exception.h>
 #include <Common/OptimizedRegularExpression.h>
 #define MIN_LENGTH_FOR_STRSTR 3
 #define MAX_SUBPATTERNS 5
...
```
```diff
@@ -214,23 +213,38 @@ void OptimizedRegularExpressionImpl<thread_safe>::analyze(
         /** We choose the non-alternative substring of the maximum length, among the prefixes,
           * or a non-alternative substring of maximum length.
           */
+        /// Tuning for typical usage domain
+        auto tuning_strings_condition = [](const std::string & str)
+        {
+            return str != "://" && str != "http://" && str != "www" && str != "Windows ";
+        };
```
```diff
         size_t max_length = 0;
         Substrings::const_iterator candidate_it = trivial_substrings.begin();
         for (Substrings::const_iterator it = trivial_substrings.begin(); it != trivial_substrings.end(); ++it)
         {
             if (((it->second == 0 && candidate_it->second != 0)
                     || ((it->second == 0) == (candidate_it->second == 0) && it->first.size() > max_length))
-                /// Tuning for typical usage domain
-                && (it->first.size() > strlen("://") || strncmp(it->first.data(), "://", strlen("://")))
-                && (it->first.size() > strlen("http://") || strncmp(it->first.data(), "http", strlen("http")))
-                && (it->first.size() > strlen("www.") || strncmp(it->first.data(), "www", strlen("www")))
-                && (it->first.size() > strlen("Windows ") || strncmp(it->first.data(), "Windows ", strlen("Windows "))))
+                && tuning_strings_condition(it->first))
             {
                 max_length = it->first.size();
                 candidate_it = it;
             }
         }
```
```diff
+        /// If prefix is small, it won't be chosen
+        if (max_length < MIN_LENGTH_FOR_STRSTR)
+        {
+            for (Substrings::const_iterator it = trivial_substrings.begin(); it != trivial_substrings.end(); ++it)
+            {
+                if (it->first.size() > max_length && tuning_strings_condition(it->first))
+                {
+                    max_length = it->first.size();
+                    candidate_it = it;
+                }
+            }
+        }
```
```diff
         if (max_length >= MIN_LENGTH_FOR_STRSTR)
         {
             required_substring = candidate_it->first;
...
```