2dot5 / ClickHouse
Commit b7e53208
Authored Sep 04, 2020 by Nikolai Kochetov

Fix tests.

Parent: 92c937db

Showing 13 changed files with 136 additions and 91 deletions (+136 −91)
src/Interpreters/ActionsVisitor.cpp                   +2  -30
src/Interpreters/DatabaseCatalog.cpp                  +9  -2
src/Interpreters/DatabaseCatalog.h                    +2  -1
src/Interpreters/ExpressionAnalyzer.cpp               +1  -1
src/Interpreters/GlobalSubqueriesVisitor.h            +4  -2
src/Interpreters/InterpreterSelectQuery.cpp           +1  -1
src/Processors/IAccumulatingTransform.cpp             +23 -0
src/Processors/IAccumulatingTransform.h               +2  -0
src/Processors/QueryPipeline.cpp                      +6  -15
src/Processors/Transforms/CreatingSetsTransform.cpp   +6  -30
src/Processors/Transforms/CreatingSetsTransform.h     +2  -8
src/Storages/StorageMemory.cpp                        +39 -1
src/Storages/StorageMemory.h                          +39 -0
src/Interpreters/ActionsVisitor.cpp

@@ -887,38 +887,10 @@ SetPtr ActionsMatcher::makeSet(const ASTFunction & node, Data & data, bool no_su
       * in the subquery_for_set object, this subquery is set as source and the temporary table _data1 as the table.
       * - this function shows the expression IN_data1.
       */
-    if (!subquery_for_set.source && data.no_storage_or_local)
+    if (subquery_for_set.source.empty() && data.no_storage_or_local)
     {
         auto interpreter = interpretSubquery(right_in_operand, data.context, data.subquery_depth, {});
-        subquery_for_set.source = std::make_shared<LazyBlockInputStream>(
-            interpreter->getSampleBlock(), [interpreter]() mutable { return interpreter->execute().getInputStream(); });
-
-        /** Why is LazyBlockInputStream used?
-          *
-          * The fact is that when processing a query of the form
-          *  SELECT ... FROM remote_test WHERE column GLOBAL IN (subquery),
-          *  if the distributed remote_test table contains localhost as one of the servers,
-          *  the query will be interpreted locally again (and not sent over TCP, as in the case of a remote server).
-          *
-          * The query execution pipeline will be:
-          * CreatingSets
-          *  subquery execution, filling the temporary table with _data1 (1)
-          *  CreatingSets
-          *   reading from the table _data1, creating the set (2)
-          *   read from the table subordinate to remote_test.
-          *
-          * (The second part of the pipeline under CreateSets is a reinterpretation of the query inside StorageDistributed,
-          *  the query differs in that the database name and tables are replaced with subordinates, and the subquery is replaced with _data1.)
-          *
-          * But when creating the pipeline, when creating the source (2), it will be found that the _data1 table is empty
-          *  (because the query has not started yet), and empty source will be returned as the source.
-          * And then, when the query is executed, an empty set will be created in step (2).
-          *
-          * Therefore, we make the initialization of step (2) lazy
-          *  - so that it does not occur until step (1) is completed, on which the table will be populated.
-          *
-          * Note: this solution is not very good, you need to think better.
-          */
+        subquery_for_set.source = QueryPipeline::getPipe(interpreter->execute().pipeline);
     }

     subquery_for_set.set = set;
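The removed comment describes the core problem this commit works around: a source that snapshots its input table at pipeline-construction time sees an empty table, because the subquery that fills it has not run yet. The following toy sketch (hypothetical types `Table`, `EagerSource`, `LazySource`; not ClickHouse's actual API) contrasts eager snapshotting with the deferred reading that `LazyBlockInputStream` provided before and delayed reads provide now:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the pieces in the removed comment:
// a "table" that step (1) fills, and a "source" that step (2) reads.
using Table = std::vector<int>;

// Eager source: snapshots the table when the pipeline is *built*.
// If the table is filled later, the snapshot stays empty.
struct EagerSource
{
    explicit EagerSource(const Table & table) : snapshot(table) {}
    Table snapshot;
    Table read() const { return snapshot; }
};

// Lazy source: defers the snapshot until the first read, i.e. until after
// step (1) has populated the table.
struct LazySource
{
    explicit LazySource(const Table & table) : table_ptr(&table) {}
    const Table * table_ptr;
    Table read() const { return *table_ptr; }  // snapshot taken at read time
};

inline std::pair<std::size_t, std::size_t> demo()
{
    Table data1;               // temporary table _data1, empty at build time
    EagerSource eager(data1);  // pipeline construction happens here
    LazySource lazy(data1);

    data1 = {1, 2, 3};         // step (1): subquery fills the table

    // eager sees 0 rows, lazy sees 3
    return {eager.read().size(), lazy.read().size()};
}
```

The design choice in the commit is the same in spirit: instead of wrapping the subquery in a lazy stream, the pipe is obtained via `QueryPipeline::getPipe` and the *storage* delays its read (see StorageMemory below).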
src/Interpreters/DatabaseCatalog.cpp

@@ -64,13 +64,20 @@ TemporaryTableHolder::TemporaryTableHolder(
     const Context & context_,
     const ColumnsDescription & columns,
     const ConstraintsDescription & constraints,
-    const ASTPtr & query)
+    const ASTPtr & query,
+    bool create_for_global_subquery)
     : TemporaryTableHolder(
         context_,
         [&](const StorageID & table_id)
         {
-            return StorageMemory::create(table_id, ColumnsDescription{columns}, ConstraintsDescription{constraints});
+            auto storage = StorageMemory::create(table_id, ColumnsDescription{columns}, ConstraintsDescription{constraints});
+
+            if (create_for_global_subquery)
+                storage->delayReadForGlobalSubqueries();
+
+            return storage;
         },
         query)
src/Interpreters/DatabaseCatalog.h

@@ -78,7 +78,8 @@ struct TemporaryTableHolder : boost::noncopyable
         const Context & context,
         const ColumnsDescription & columns,
         const ConstraintsDescription & constraints,
-        const ASTPtr & query = {});
+        const ASTPtr & query = {},
+        bool create_for_global_subquery = false);

     TemporaryTableHolder(TemporaryTableHolder && rhs);
     TemporaryTableHolder & operator = (TemporaryTableHolder && rhs);
src/Interpreters/ExpressionAnalyzer.cpp

@@ -583,7 +583,7 @@ JoinPtr SelectQueryExpressionAnalyzer::makeTableJoin(const ASTTablesInSelectQuer
     ExpressionActionsPtr joined_block_actions = createJoinedBlockActions(context, analyzedJoin());

     Names original_right_columns;
-    if (!subquery_for_join.source)
+    if (subquery_for_join.source.empty())
     {
         NamesWithAliases required_columns_with_aliases = analyzedJoin().getRequiredColumns(
             joined_block_actions->getSampleBlock(), joined_block_actions->getRequiredColumns());
src/Interpreters/GlobalSubqueriesVisitor.h

@@ -103,7 +103,9 @@ public:
         Block sample = interpreter->getSampleBlock();
         NamesAndTypesList columns = sample.getNamesAndTypesList();

-        auto external_storage_holder = std::make_shared<TemporaryTableHolder>(context, ColumnsDescription{columns}, ConstraintsDescription{});
+        auto external_storage_holder = std::make_shared<TemporaryTableHolder>(
+            context, ColumnsDescription{columns}, ConstraintsDescription{},
+            nullptr, /*create_for_global_subquery*/ true);
         StoragePtr external_storage = external_storage_holder->getTable();

         /** We replace the subquery with the name of the temporary table.

@@ -134,7 +136,7 @@ public:
         ast = database_and_table_name;

         external_tables[external_table_name] = external_storage_holder;
-        subqueries_for_sets[external_table_name].source = interpreter->execute().getInputStream();
+        subqueries_for_sets[external_table_name].source = QueryPipeline::getPipe(interpreter->execute().pipeline);
         subqueries_for_sets[external_table_name].table = external_storage;

         /** NOTE If it was written IN tmp_table - the existing temporary (but not external) table,
src/Interpreters/InterpreterSelectQuery.cpp

@@ -1833,7 +1833,7 @@ void InterpreterSelectQuery::executeSubqueriesInSetsAndJoins(QueryPlan & query_p
     auto creating_sets = std::make_unique<CreatingSetsStep>(
         query_plan.getCurrentDataStream(),
-        subqueries_for_sets,
+        std::move(subqueries_for_sets),
         SizeLimits(settings.max_rows_to_transfer, settings.max_bytes_to_transfer, settings.transfer_overflow_mode),
         *context);
src/Processors/IAccumulatingTransform.cpp

@@ -14,6 +14,14 @@ IAccumulatingTransform::IAccumulatingTransform(Block input_header, Block output_
 {
 }

+InputPort * IAccumulatingTransform::addTotalsPort()
+{
+    if (inputs.size() > 1)
+        throw Exception("Totals port was already added to IAccumulatingTransform", ErrorCodes::LOGICAL_ERROR);
+
+    return &inputs.emplace_back(getInputPort().getHeader(), this);
+}
+
 IAccumulatingTransform::Status IAccumulatingTransform::prepare()
 {
     /// Check can output.

@@ -42,6 +50,21 @@ IAccumulatingTransform::Status IAccumulatingTransform::prepare()
     /// Generate output block.
     if (input.isFinished())
     {
+        /// Read from totals port if has it.
+        if (inputs.size() > 1)
+        {
+            auto & totals_input = inputs.back();
+            if (!totals_input.isFinished())
+            {
+                totals_input.setNeeded();
+                if (!totals_input.hasData())
+                    return Status::NeedData;
+
+                totals = totals_input.pull();
+                totals_input.close();
+            }
+        }
+
         finished_input = true;
         return Status::Ready;
     }
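The new `prepare()` logic drains the main input first, pulls at most one chunk from the optional totals port, and only then switches to the generate phase. A simplified model of that ordering (hypothetical `MiniAccumulator` type and string statuses; the real `IProcessor` protocol uses ports and a `Status` enum):

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <string>
#include <vector>

// Minimal model of the pull protocol added to IAccumulatingTransform:
// consume everything from the main input, then (if a totals port exists)
// pull its single chunk, then report readiness to generate output.
struct MiniAccumulator
{
    std::deque<std::string> main_input;
    std::optional<std::string> totals_input;  // the optional second port

    std::vector<std::string> consumed;
    std::optional<std::string> totals;

    // Returns "NeedData" or "Ready", a simplified stand-in for IProcessor::Status.
    std::string prepare()
    {
        if (!main_input.empty())
        {
            consumed.push_back(main_input.front());  // consume() phase
            main_input.pop_front();
            return "NeedData";
        }
        // Main input finished: read from the totals port if it has data.
        if (totals_input)
        {
            totals = *totals_input;
            totals_input.reset();  // close the port after one pull
        }
        return "Ready";  // switch to the generate() phase
    }
};
```

Calling `prepare()` repeatedly consumes `"x"` and `"y"` before the totals chunk is ever touched, mirroring how totals must arrive only after the regular stream is finished.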
src/Processors/IAccumulatingTransform.h

@@ -18,6 +18,7 @@ protected:
     Chunk current_input_chunk;
     Chunk current_output_chunk;
+    Chunk totals;
     bool has_input = false;
     bool finished_input = false;
     bool finished_generate = false;

@@ -34,6 +35,7 @@ public:
     Status prepare() override;
     void work() override;

+    InputPort * addTotalsPort();

     InputPort & getInputPort() { return input; }
     OutputPort & getOutputPort() { return output; }
src/Processors/QueryPipeline.cpp

@@ -240,16 +240,13 @@ void QueryPipeline::addCreatingSetsTransform(SubqueriesForSets subqueries_for_se
     source.collected_processors = nullptr;

     resize(1);

-    pipe = Pipe::unitePipes({std::move(pipe), std::move(source)}, collected_processors);
-
-    /// Order is important for concat. Connect manually.
-    pipe.transform([&](OutputPortRawPtrs ports) -> Processors
-    {
-        auto concat = std::make_shared<ConcatProcessor>(getHeader(), 2);
-        connect(*ports.front(), concat->getInputs().front());
-        connect(*ports.back(), concat->getInputs().back());
-        return { std::move(concat) };
-    });
+    Pipes pipes;
+    pipes.emplace_back(std::move(source));
+    pipes.emplace_back(std::move(pipe));
+    pipe = Pipe::unitePipes(std::move(pipes), collected_processors);
+
+    pipe.addTransform(std::make_shared<ConcatProcessor>(getHeader(), 2));
 }

 void QueryPipeline::setOutputFormat(ProcessorPtr output)

@@ -324,9 +321,6 @@ void QueryPipeline::setProgressCallback(const ProgressCallback & callback)
     {
         if (auto * source = dynamic_cast<ISourceWithProgress *>(processor.get()))
             source->setProgressCallback(callback);
-
-        if (auto * source = typeid_cast<CreatingSetsTransform *>(processor.get()))
-            source->setProgressCallback(callback);
     }
 }

@@ -338,9 +332,6 @@ void QueryPipeline::setProcessListElement(QueryStatus * elem)
    {
         if (auto * source = dynamic_cast<ISourceWithProgress *>(processor.get()))
             source->setProcessListElement(elem);
-
-        if (auto * source = typeid_cast<CreatingSetsTransform *>(processor.get()))
-            source->setProcessListElement(elem);
     }
 }
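In the rewritten `addCreatingSetsTransform`, the set-creating source is placed first in the united pipes, so the concat drains it completely before the main pipe produces any data. A toy model of that ordering guarantee (illustrative names, not the Processors API):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of unitePipes({source, pipe}) + ConcatProcessor: every chunk of
// the first pipe (the set-building source) is emitted before any chunk of
// the second pipe (the main data stream).
using ToyPipe = std::vector<std::string>;

inline ToyPipe unitePipesAndConcat(ToyPipe creating_sets_source, ToyPipe main_pipe)
{
    ToyPipe result = std::move(creating_sets_source);                 // first concat input
    result.insert(result.end(), main_pipe.begin(), main_pipe.end());  // second concat input
    return result;
}
```

Putting the pipes in the other order would let main-stream chunks flow before the sets exist, which is exactly the bug class the lazy/delayed machinery elsewhere in this commit avoids.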
src/Processors/Transforms/CreatingSetsTransform.cpp

@@ -33,35 +33,6 @@ CreatingSetsTransform::CreatingSetsTransform(
 {
 }

-InputPort * CreatingSetsTransform::addTotalsPort()
-{
-    if (inputs.size() > 1)
-        throw Exception("Totals port was already added to CreatingSetsTransform", ErrorCodes::LOGICAL_ERROR);
-
-    return &inputs.emplace_back(getInputPort().getHeader(), this);
-}
-
-IProcessor::Status CreatingSetsTransform::prepare()
-{
-    auto status = IAccumulatingTransform::prepare();
-    if (status == IProcessor::Status::Finished && inputs.size() > 1)
-    {
-        auto & totals_input = inputs.back();
-        if (totals_input.isFinished())
-            return IProcessor::Status::Finished;
-
-        totals_input.setNeeded();
-        if (!totals_input.hasData())
-            return IProcessor::Status::NeedData;
-
-        auto totals = totals_input.pull();
-        subquery.setTotals(getInputPort().getHeader().cloneWithColumns(totals.detachColumns()));
-        totals_input.close();
-    }
-
-    return status;
-}
-
 void CreatingSetsTransform::work()
 {
     if (!is_initialized)

@@ -110,6 +81,12 @@ void CreatingSetsTransform::finishSubquery()
     {
         LOG_DEBUG(log, "Subquery has empty result.");
     }
+
+    if (totals)
+        subquery.setTotals(getInputPort().getHeader().cloneWithColumns(totals.detachColumns()));
+    else
+        /// Set empty totals anyway, it is needed for MergeJoin.
+        subquery.setTotals({});
 }

 void CreatingSetsTransform::init()

@@ -166,7 +143,6 @@ Chunk CreatingSetsTransform::generate()
         table_out->writeSuffix();

     finishSubquery();
-    finished = true;
     return {};
 }
src/Processors/Transforms/CreatingSetsTransform.h

@@ -12,10 +12,10 @@ class QueryStatus;
 struct Progress;
 using ProgressCallback = std::function<void(const Progress & progress)>;

-/// This processor creates sets during execution.
+/// This processor creates set during execution.
 /// Don't return any data. Sets are created when Finish status is returned.
 /// In general, several work() methods need to be called to finish.
-/// TODO: several independent processors can be created for each subquery. Make subquery a piece of pipeline.
+/// Independent processors are created for each subquery.
 class CreatingSetsTransform : public IAccumulatingTransform
 {
 public:

@@ -28,16 +28,10 @@ public:
     String getName() const override { return "CreatingSetsTransform"; }

-    Status prepare() override;
     void work() override;
     void consume(Chunk chunk) override;
     Chunk generate() override;

-    InputPort * addTotalsPort();
-
-protected:
-    bool finished = false;
-
 private:
     SubqueryForSet subquery;
src/Storages/StorageMemory.cpp

@@ -38,11 +38,31 @@ public:
     {
     }

+    /// If called, will initialize the number of blocks at first read.
+    /// It allows to read data which was inserted into memory table AFTER Storage::read was called.
+    /// This hack is needed for global subqueries.
+    void delayInitialization(BlocksList * data_, std::mutex * mutex_)
+    {
+        data = data_;
+        mutex = mutex_;
+    }
+
     String getName() const override { return "Memory"; }

 protected:
     Chunk generate() override
     {
+        if (data)
+        {
+            std::lock_guard guard(*mutex);
+            current_it = data->begin();
+            num_blocks = data->size();
+            is_finished = num_blocks == 0;
+
+            data = nullptr;
+            mutex = nullptr;
+        }
+
         if (is_finished)
         {
             return {};

@@ -71,8 +91,11 @@ private:
     Names column_names;
     BlocksList::iterator current_it;
     size_t current_block_idx = 0;
-    const size_t num_blocks;
+    size_t num_blocks;
     bool is_finished = false;
+
+    BlocksList * data = nullptr;
+    std::mutex * mutex = nullptr;
 };

@@ -123,6 +146,21 @@ Pipe StorageMemory::read(
     std::lock_guard lock(mutex);

+    if (delay_read_for_global_subqueries)
+    {
+        /// Note: for global subquery we use single source.
+        /// Mainly, the reason is that at this point table is empty,
+        /// and we don't know the number of blocks are going to be inserted into it.
+        ///
+        /// It may seem to be not optimal, but actually data from such table is used to fill
+        /// set for IN or hash table for JOIN, which can't be done concurrently.
+        /// Since no other manipulation with data is done, multiple sources shouldn't give any profit.
+
+        auto source = std::make_shared<MemorySource>(column_names, data.begin(), data.size(), *this, metadata_snapshot);
+        source->delayInitialization(&data, &mutex);
+        return Pipe(std::move(source));
+    }
+
     size_t size = data.size();

     if (num_streams > size)
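The delayed `MemorySource` above is built while the table is still empty; `begin()` and `size()` are captured only on the first `generate()` call, under the table mutex, so rows inserted between `read()` and the first pull become visible. A simplified sketch of the same pattern (hypothetical `DelayedSource` over `std::list<int>`, with `-1` standing in for an empty Chunk):

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>

// Simplified model of the delayed MemorySource: the source is constructed
// while the "table" may still be empty; the iterator and block count are
// captured on the first pull, under the table's mutex.
struct DelayedSource
{
    std::list<int> * data = nullptr;   // set by delayInitialization()
    std::mutex * mtx = nullptr;
    std::list<int>::iterator it;
    std::size_t remaining = 0;
    bool initialized = false;

    void delayInitialization(std::list<int> * data_, std::mutex * mutex_)
    {
        data = data_;
        mtx = mutex_;
    }

    // Returns the next value, or -1 when exhausted (stand-in for an empty Chunk).
    int generate()
    {
        if (!initialized)
        {
            std::lock_guard guard(*mtx);   // snapshot under the table lock
            it = data->begin();
            remaining = data->size();
            initialized = true;
        }
        if (remaining == 0)
            return -1;
        --remaining;
        return *it++;
    }
};
```

`std::list` is a good fit here for the same reason the real `BlocksList` is: appending to the end never invalidates existing iterators, so a snapshot taken at first read stays valid while writers keep inserting.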
src/Storages/StorageMemory.h

@@ -48,12 +48,51 @@ public:
     std::optional<UInt64> totalRows() const override;
     std::optional<UInt64> totalBytes() const override;

+    /** Delays initialization of StorageMemory::read() until the first read actually happens.
+      * Usually, for code like this:
+      *
+      *     auto out = StorageMemory::write();
+      *     auto in = StorageMemory::read();
+      *     out->write(new_data);
+      *
+      * `new_data` won't appear in `in`.
+      * However, if delayReadForGlobalSubqueries is called, the first read from `in` will check for new_data and return it.
+      *
+      * Why is delayReadForGlobalSubqueries needed?
+      *
+      * The fact is that when processing a query of the form
+      *  SELECT ... FROM remote_test WHERE column GLOBAL IN (subquery),
+      *  if the distributed remote_test table contains localhost as one of the servers,
+      *  the query will be interpreted locally again (and not sent over TCP, as in the case of a remote server).
+      *
+      * The query execution pipeline will be:
+      * CreatingSets
+      *  subquery execution, filling the temporary table with _data1 (1)
+      *  CreatingSets
+      *   reading from the table _data1, creating the set (2)
+      *   read from the table subordinate to remote_test.
+      *
+      * (The second part of the pipeline under CreateSets is a reinterpretation of the query inside StorageDistributed,
+      *  the query differs in that the database name and tables are replaced with subordinates, and the subquery is replaced with _data1.)
+      *
+      * But when creating the pipeline, when creating the source (2), it will be found that the _data1 table is empty
+      *  (because the query has not started yet), and an empty source will be returned as the source.
+      * And then, when the query is executed, an empty set will be created in step (2).
+      *
+      * Therefore, we make the initialization of step (2) delayed
+      *  - so that it does not occur until step (1) is completed, on which the table will be populated.
+      */
+    void delayReadForGlobalSubqueries() { delay_read_for_global_subqueries = true; }
+
 private:
     /// The data itself. `list` - so that when inserted to the end, the existing iterators are not invalidated.
     BlocksList data;

     mutable std::mutex mutex;

+    bool delay_read_for_global_subqueries = false;
+
 protected:
     StorageMemory(
         const StorageID & table_id_,
         ColumnsDescription columns_description_,
         ConstraintsDescription constraints_);
 };