Commit 474fd913 (unverified)
Authored Nov 23, 2018 by Andreas Brandl

Move strategies in their own files

This improves readability quite a bit.

Parent: ff35cb45

Showing 8 changed files with 323 additions and 279 deletions (+323 -279)
lib/gitlab/database/count.rb  +0 -155
lib/gitlab/database/count/exact_count_strategy.rb  +31 -0
lib/gitlab/database/count/reltuples_count_strategy.rb  +79 -0
lib/gitlab/database/count/tablesample_count_strategy.rb  +66 -0
spec/lib/gitlab/database/count/exact_count_strategy_spec.rb  +34 -0
spec/lib/gitlab/database/count/reltuples_count_strategy_spec.rb  +48 -0
spec/lib/gitlab/database/count/tablesample_count_strategy_spec.rb  +65 -0
spec/lib/gitlab/database/count_spec.rb  +0 -124
lib/gitlab/database/count.rb

@@ -50,161 +50,6 @@ module Gitlab
          end
        end
      end

      # This strategy performs an exact count on the model.
      #
      # This is guaranteed to be accurate, however it also scans the
      # whole table. Hence, there are no guarantees with respect
      # to runtime.
      #
      # Note that for very large tables, this may even timeout.
      class ExactCountStrategy
        attr_reader :models
        def initialize(models)
          @models = models
        end

        def count
          models.each_with_object({}) do |model, data|
            data[model] = model.count
          end
        end

        def self.enabled?
          true
        end
      end

      class PgClass < ActiveRecord::Base
        self.table_name = 'pg_class'
      end

      # This strategy counts based on PostgreSQL's statistics in pg_stat_user_tables.
      #
      # Specifically, it relies on the column reltuples in said table. An additional
      # check is performed to make sure statistics were updated within the last hour.
      #
      # Otherwise, this strategy skips tables with outdated statistics.
      #
      # There are no guarantees with respect to the accuracy of this strategy. Runtime
      # however is guaranteed to be "fast", because it only looks up statistics.
      class ReltuplesCountStrategy
        attr_reader :models
        def initialize(models)
          @models = models
        end

        # Returns a hash of the table names that have recently updated tuples.
        #
        # @returns [Hash] Table name to count mapping (e.g. { 'projects' => 5, 'users' => 100 })
        def count
          size_estimates
        rescue *CONNECTION_ERRORS
          {}
        end

        def self.enabled?
          Gitlab::Database.postgresql?
        end

        private

        def table_names
          models.map(&:table_name)
        end

        def size_estimates(check_statistics: true)
          table_to_model = models.each_with_object({}) { |model, h| h[model.table_name] = model }

          # Querying tuple stats only works on the primary. Due to load balancing, the
          # easiest way to do this is to start a transaction.
          ActiveRecord::Base.transaction do
            get_statistics(table_names, check_statistics: check_statistics).each_with_object({}) do |row, data|
              model = table_to_model[row.table_name]
              data[model] = row.estimate
            end
          end
        end

        # Generates the PostgreSQL query to return the tuples for tables
        # that have been vacuumed or analyzed in the last hour.
        #
        # @param [Array] table names
        # @returns [Hash] Table name to count mapping (e.g. { 'projects' => 5, 'users' => 100 })
        def get_statistics(table_names, check_statistics: true)
          time = "to_timestamp(#{1.hour.ago.to_i})"

          query = PgClass.joins("LEFT JOIN pg_stat_user_tables USING (relname)")
            .where(relname: table_names)
            .select('pg_class.relname AS table_name, reltuples::bigint AS estimate')

          if check_statistics
            query = query.where('last_vacuum > ? OR last_autovacuum > ? OR last_analyze > ? OR last_autoanalyze > ?',
                                time, time, time, time)
          end

          query
        end
      end

      # A tablesample count executes in two phases:
      # * Estimate table sizes based on reltuples.
      # * Based on the estimate:
      #   * If the table is considered 'small', execute an exact relation count.
      #   * Otherwise, count on a sample of the table using TABLESAMPLE.
      #
      # The size of the sample is chosen in a way that we always roughly scan
      # the same amount of rows (see TABLESAMPLE_ROW_TARGET).
      #
      # There are no guarantees with respect to the accuracy of the result or runtime.
      class TablesampleCountStrategy < ReltuplesCountStrategy
        EXACT_COUNT_THRESHOLD = 100_000
        TABLESAMPLE_ROW_TARGET = 100_000

        def count
          estimates = size_estimates(check_statistics: false)

          models.each_with_object({}) do |model, count_by_model|
            count = perform_count(model, estimates[model])
            count_by_model[model] = count if count
          end
        rescue *CONNECTION_ERRORS
          {}
        end

        def self.enabled?
          Gitlab::Database.postgresql? && Feature.enabled?(:tablesample_counts)
        end

        private

        def perform_count(model, estimate)
          # If we estimate 0, we may not have statistics at all. Don't use them.
          return nil unless estimate && estimate > 0

          if estimate < EXACT_COUNT_THRESHOLD
            # The table is considered small, the assumption here is that
            # the exact count will be fast anyways.
            model.count
          else
            # The table is considered large, let's only count on a sample.
            tablesample_count(model, estimate)
          end
        end

        def tablesample_count(model, estimate)
          portion = (TABLESAMPLE_ROW_TARGET.to_f / estimate).round(4)
          inverse = 1 / portion
          query = <<~SQL
            SELECT (COUNT(*)*#{inverse})::integer AS count
            FROM #{model.table_name} TABLESAMPLE SYSTEM (#{portion * 100})
          SQL

          rows = ActiveRecord::Base.connection.select_all(query)

          Integer(rows.first['count'])
        end
      end
    end
  end
end

lib/gitlab/database/count/exact_count_strategy.rb
0 → 100644

# frozen_string_literal: true

module Gitlab
  module Database
    module Count
      # This strategy performs an exact count on the model.
      #
      # This is guaranteed to be accurate, however it also scans the
      # whole table. Hence, there are no guarantees with respect
      # to runtime.
      #
      # Note that for very large tables, this may even timeout.
      class ExactCountStrategy
        attr_reader :models
        def initialize(models)
          @models = models
        end

        def count
          models.each_with_object({}) do |model, data|
            data[model] = model.count
          end
        end

        def self.enabled?
          true
        end
      end
    end
  end
end
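
To make the return shape of `ExactCountStrategy#count` concrete, here is a minimal standalone sketch. `FakeModel` and `ExactCountSketch` are hypothetical names that do not appear in the commit; `FakeModel` stands in for an ActiveRecord model and only needs to respond to `count`. The sketch assumes nothing about Rails and simply mirrors the loop shown in the file above.

```ruby
# Hypothetical stand-in for an ActiveRecord model: it only responds to `count`.
FakeModel = Struct.new(:name, :rows) do
  def count
    rows.size
  end
end

# Same shape as ExactCountStrategy in the diff, copied so the sketch runs standalone.
class ExactCountSketch
  attr_reader :models

  def initialize(models)
    @models = models
  end

  # Builds a hash keyed by the model itself, with its exact count as the value.
  def count
    models.each_with_object({}) { |model, data| data[model] = model.count }
  end
end

projects = FakeModel.new('projects', [1, 2, 3])
identities = FakeModel.new('identities', [1])

counts = ExactCountSketch.new([projects, identities]).count
puts counts[projects]
puts counts[identities]
```

The result is keyed by the model object rather than by table name, which matches the `{ Project => 3, Identity => 1 }` expectation in the specs of this commit.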

lib/gitlab/database/count/reltuples_count_strategy.rb
0 → 100644

# frozen_string_literal: true

module Gitlab
  module Database
    module Count
      class PgClass < ActiveRecord::Base
        self.table_name = 'pg_class'
      end

      # This strategy counts based on PostgreSQL's statistics in pg_stat_user_tables.
      #
      # Specifically, it relies on the column reltuples in said table. An additional
      # check is performed to make sure statistics were updated within the last hour.
      #
      # Otherwise, this strategy skips tables with outdated statistics.
      #
      # There are no guarantees with respect to the accuracy of this strategy. Runtime
      # however is guaranteed to be "fast", because it only looks up statistics.
      class ReltuplesCountStrategy
        attr_reader :models
        def initialize(models)
          @models = models
        end

        # Returns a hash of the table names that have recently updated tuples.
        #
        # @returns [Hash] Table name to count mapping (e.g. { 'projects' => 5, 'users' => 100 })
        def count
          size_estimates
        rescue *CONNECTION_ERRORS
          {}
        end

        def self.enabled?
          Gitlab::Database.postgresql?
        end

        private

        def table_names
          models.map(&:table_name)
        end

        def size_estimates(check_statistics: true)
          table_to_model = models.each_with_object({}) { |model, h| h[model.table_name] = model }

          # Querying tuple stats only works on the primary. Due to load balancing, the
          # easiest way to do this is to start a transaction.
          ActiveRecord::Base.transaction do
            get_statistics(table_names, check_statistics: check_statistics).each_with_object({}) do |row, data|
              model = table_to_model[row.table_name]
              data[model] = row.estimate
            end
          end
        end

        # Generates the PostgreSQL query to return the tuples for tables
        # that have been vacuumed or analyzed in the last hour.
        #
        # @param [Array] table names
        # @returns [Hash] Table name to count mapping (e.g. { 'projects' => 5, 'users' => 100 })
        def get_statistics(table_names, check_statistics: true)
          time = 1.hour.ago

          query = PgClass.joins("LEFT JOIN pg_stat_user_tables USING (relname)")
            .where(relname: table_names)
            .select('pg_class.relname AS table_name, reltuples::bigint AS estimate')

          if check_statistics
            query = query.where('last_vacuum > ? OR last_autovacuum > ? OR last_analyze > ? OR last_autoanalyze > ?',
                                time, time, time, time)
          end

          query
        end
      end
    end
  end
end
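
The freshness filter in `get_statistics` can be illustrated without a database. The following is a rough sketch in plain Ruby, not the strategy itself: `StatRow` and `fresh_estimates` are hypothetical names, and the rows are made-up stand-ins for the result of joining `pg_class` with `pg_stat_user_tables`. It assumes that a `nil` timestamp should behave like SQL `NULL`, where a `>` comparison never matches.

```ruby
# Hypothetical row shape standing in for the pg_class / pg_stat_user_tables join.
StatRow = Struct.new(:table_name, :estimate, :last_vacuum, :last_autovacuum,
                     :last_analyze, :last_autoanalyze)

# Keeps only rows whose statistics were refreshed after `cutoff`.
# A nil timestamp behaves like SQL NULL here: the comparison never matches.
def fresh_estimates(rows, cutoff:)
  rows.each_with_object({}) do |row, data|
    timestamps = [row.last_vacuum, row.last_autovacuum, row.last_analyze, row.last_autoanalyze]
    next unless timestamps.compact.any? { |t| t > cutoff }

    data[row.table_name] = row.estimate
  end
end

now = Time.now
cutoff = now - 3600 # one hour ago, playing the role of 1.hour.ago in the strategy

rows = [
  StatRow.new('projects', 5, nil, nil, now - 60, nil),   # analyzed a minute ago
  StatRow.new('users', 100, now - 7200, nil, nil, nil),  # vacuumed two hours ago
  StatRow.new('identities', 1, nil, nil, nil, nil)       # never vacuumed or analyzed
]

estimates = fresh_estimates(rows, cutoff: cutoff)
p estimates
```

Only `projects` survives the filter in this sketch, which corresponds to the strategy skipping tables with outdated statistics and leaving them to another strategy.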

lib/gitlab/database/count/tablesample_count_strategy.rb
0 → 100644

# frozen_string_literal: true

module Gitlab
  module Database
    module Count
      # A tablesample count executes in two phases:
      # * Estimate table sizes based on reltuples.
      # * Based on the estimate:
      #   * If the table is considered 'small', execute an exact relation count.
      #   * Otherwise, count on a sample of the table using TABLESAMPLE.
      #
      # The size of the sample is chosen in a way that we always roughly scan
      # the same amount of rows (see TABLESAMPLE_ROW_TARGET).
      #
      # There are no guarantees with respect to the accuracy of the result or runtime.
      class TablesampleCountStrategy < ReltuplesCountStrategy
        EXACT_COUNT_THRESHOLD = 100_000
        TABLESAMPLE_ROW_TARGET = 100_000

        def count
          estimates = size_estimates(check_statistics: false)

          models.each_with_object({}) do |model, count_by_model|
            count = perform_count(model, estimates[model])
            count_by_model[model] = count if count
          end
        rescue *CONNECTION_ERRORS
          {}
        end

        def self.enabled?
          Gitlab::Database.postgresql? && Feature.enabled?(:tablesample_counts)
        end

        private

        def perform_count(model, estimate)
          # If we estimate 0, we may not have statistics at all. Don't use them.
          return nil unless estimate && estimate > 0

          if estimate < EXACT_COUNT_THRESHOLD
            # The table is considered small, the assumption here is that
            # the exact count will be fast anyways.
            model.count
          else
            # The table is considered large, let's only count on a sample.
            tablesample_count(model, estimate)
          end
        end

        def tablesample_count(model, estimate)
          portion = (TABLESAMPLE_ROW_TARGET.to_f / estimate).round(4)
          inverse = 1 / portion
          query = <<~SQL
            SELECT (COUNT(*)*#{inverse})::integer AS count
            FROM #{model.table_name} TABLESAMPLE SYSTEM (#{portion * 100})
          SQL

          rows = ActiveRecord::Base.connection.select_all(query)

          Integer(rows.first['count'])
        end
      end
    end
  end
end
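
The sampling arithmetic in `tablesample_count` is easier to follow with numbers plugged in. The sketch below reproduces only the `portion` and `inverse` computation from the file above; `sample_parameters` is a hypothetical helper name, and the row counts (a 2,000,000-row estimate, 99,000 sampled rows) are invented for illustration. No SQL is executed.

```ruby
TABLESAMPLE_ROW_TARGET = 100_000

# Mirrors the arithmetic in tablesample_count: returns the sampled fraction,
# the scale-up factor, and the percentage that would go into TABLESAMPLE SYSTEM.
def sample_parameters(estimate)
  portion = (TABLESAMPLE_ROW_TARGET.to_f / estimate).round(4)
  inverse = 1 / portion
  { portion: portion, inverse: inverse, percent: portion * 100 }
end

# With an estimate of two million rows, about 5% of the table is sampled
# and every sampled row stands for roughly 20 rows.
params = sample_parameters(2_000_000)
puts params

# If the sample happened to contain 99,000 rows (a made-up figure),
# the scaled-up count would be computed like this:
scaled = (99_000 * params[:inverse]).round
puts scaled

# Rounding to four decimals has a floor: a very large estimate yields 0.0.
huge = sample_parameters(3_000_000_000)
puts huge[:portion]
```

Because `portion` shrinks as the estimate grows, the number of rows scanned stays near `TABLESAMPLE_ROW_TARGET` regardless of table size. A side effect of `round(4)` worth keeping in mind: for estimates above roughly two billion rows, `portion` rounds to `0.0` and `inverse` becomes `Infinity` in Ruby float arithmetic, so the generated query would not be meaningful at that scale.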

spec/lib/gitlab/database/count/exact_count_strategy_spec.rb
0 → 100644

require 'spec_helper'

describe Gitlab::Database::Count::ExactCountStrategy do
  before do
    create_list(:project, 3)
    create(:identity)
  end

  let(:models) { [Project, Identity] }

  subject { described_class.new(models).count }

  describe '#count' do
    it 'counts all models' do
      expect(models).to all(receive(:count).and_call_original)

      expect(subject).to eq({ Project => 3, Identity => 1 })
    end
  end

  describe '.enabled?' do
    it 'is enabled for PostgreSQL' do
      allow(Gitlab::Database).to receive(:postgresql?).and_return(true)

      expect(described_class.enabled?).to be_truthy
    end

    it 'is enabled for MySQL' do
      allow(Gitlab::Database).to receive(:postgresql?).and_return(false)

      expect(described_class.enabled?).to be_truthy
    end
  end
end

spec/lib/gitlab/database/count/reltuples_count_strategy_spec.rb
0 → 100644

require 'spec_helper'

describe Gitlab::Database::Count::ReltuplesCountStrategy do
  before do
    create_list(:project, 3)
    create(:identity)
  end

  let(:models) { [Project, Identity] }
  subject { described_class.new(models).count }

  describe '#count', :postgresql do
    context 'when reltuples is up to date' do
      before do
        ActiveRecord::Base.connection.execute('ANALYZE projects')
        ActiveRecord::Base.connection.execute('ANALYZE identities')
      end

      it 'uses statistics to do the count' do
        models.each { |model| expect(model).not_to receive(:count) }

        expect(subject).to eq({ Project => 3, Identity => 1 })
      end
    end

    context 'insufficient permissions' do
      it 'returns an empty hash' do
        allow(ActiveRecord::Base).to receive(:transaction).and_raise(PG::InsufficientPrivilege)

        expect(subject).to eq({})
      end
    end
  end

  describe '.enabled?' do
    it 'is enabled for PostgreSQL' do
      allow(Gitlab::Database).to receive(:postgresql?).and_return(true)

      expect(described_class.enabled?).to be_truthy
    end

    it 'is disabled for MySQL' do
      allow(Gitlab::Database).to receive(:postgresql?).and_return(false)

      expect(described_class.enabled?).to be_falsey
    end
  end
end

spec/lib/gitlab/database/count/tablesample_count_strategy_spec.rb
0 → 100644

require 'spec_helper'

describe Gitlab::Database::Count::TablesampleCountStrategy do
  before do
    create_list(:project, 3)
    create(:identity)
  end

  let(:models) { [Project, Identity] }
  let(:strategy) { described_class.new(models) }

  subject { strategy.count }

  describe '#count', :postgresql do
    let(:estimates) { { Project => threshold + 1, Identity => threshold - 1 } }
    let(:threshold) { Gitlab::Database::Count::TablesampleCountStrategy::EXACT_COUNT_THRESHOLD }

    before do
      allow(strategy).to receive(:size_estimates).with(check_statistics: false).and_return(estimates)
    end

    context 'for tables with an estimated small size' do
      it 'performs an exact count' do
        expect(Identity).to receive(:count).and_call_original

        expect(subject).to include({ Identity => 1 })
      end
    end

    context 'for tables with an estimated large size' do
      it 'performs a tablesample count' do
        expect(Project).not_to receive(:count)

        result = subject
        expect(result[Project]).to eq(3)
      end
    end

    context 'insufficient permissions' do
      it 'returns an empty hash' do
        allow(strategy).to receive(:size_estimates).and_raise(PG::InsufficientPrivilege)

        expect(subject).to eq({})
      end
    end
  end

  describe '.enabled?' do
    before do
      stub_feature_flags(tablesample_counts: true)
    end

    it 'is enabled for PostgreSQL' do
      allow(Gitlab::Database).to receive(:postgresql?).and_return(true)

      expect(described_class.enabled?).to be_truthy
    end

    it 'is disabled for MySQL' do
      allow(Gitlab::Database).to receive(:postgresql?).and_return(false)

      expect(described_class.enabled?).to be_falsey
    end
  end
end

spec/lib/gitlab/database/count_spec.rb

@@ -56,128 +56,4 @@ describe Gitlab::Database::Count do
      end
    end
  end

  describe Gitlab::Database::Count::ExactCountStrategy do
    subject { described_class.new(models).count }

    describe '#count' do
      it 'counts all models' do
        expect(models).to all(receive(:count).and_call_original)

        expect(subject).to eq({ Project => 3, Identity => 1 })
      end
    end

    describe '.enabled?' do
      it 'is enabled for PostgreSQL' do
        allow(Gitlab::Database).to receive(:postgresql?).and_return(true)

        expect(described_class.enabled?).to be_truthy
      end

      it 'is enabled for MySQL' do
        allow(Gitlab::Database).to receive(:postgresql?).and_return(false)

        expect(described_class.enabled?).to be_truthy
      end
    end
  end

  describe Gitlab::Database::Count::ReltuplesCountStrategy do
    subject { described_class.new(models).count }

    describe '#count', :postgresql do
      context 'when reltuples is up to date' do
        before do
          ActiveRecord::Base.connection.execute('ANALYZE projects')
          ActiveRecord::Base.connection.execute('ANALYZE identities')
        end

        it 'uses statistics to do the count' do
          models.each { |model| expect(model).not_to receive(:count) }

          expect(subject).to eq({ Project => 3, Identity => 1 })
        end
      end

      context 'insufficient permissions' do
        it 'returns an empty hash' do
          allow(ActiveRecord::Base).to receive(:transaction).and_raise(PG::InsufficientPrivilege)

          expect(subject).to eq({})
        end
      end
    end

    describe '.enabled?' do
      it 'is enabled for PostgreSQL' do
        allow(Gitlab::Database).to receive(:postgresql?).and_return(true)

        expect(described_class.enabled?).to be_truthy
      end

      it 'is disabled for MySQL' do
        allow(Gitlab::Database).to receive(:postgresql?).and_return(false)

        expect(described_class.enabled?).to be_falsey
      end
    end
  end

  describe Gitlab::Database::Count::TablesampleCountStrategy do
    subject { strategy.count }
    let(:strategy) { described_class.new(models) }

    describe '#count', :postgresql do
      let(:estimates) { { Project => threshold + 1, Identity => threshold - 1 } }
      let(:threshold) { Gitlab::Database::Count::TablesampleCountStrategy::EXACT_COUNT_THRESHOLD }

      before do
        allow(strategy).to receive(:size_estimates).with(check_statistics: false).and_return(estimates)
      end

      context 'for tables with an estimated small size' do
        it 'performs an exact count' do
          expect(Identity).to receive(:count).and_call_original

          expect(subject).to include({ Identity => 1 })
        end
      end

      context 'for tables with an estimated large size' do
        it 'performs a tablesample count' do
          expect(Project).not_to receive(:count)

          result = subject
          expect(result[Project]).to eq(3)
        end
      end

      context 'insufficient permissions' do
        it 'returns an empty hash' do
          allow(strategy).to receive(:size_estimates).and_raise(PG::InsufficientPrivilege)

          expect(subject).to eq({})
        end
      end
    end

    describe '.enabled?' do
      before do
        stub_feature_flags(tablesample_counts: true)
      end

      it 'is enabled for PostgreSQL' do
        allow(Gitlab::Database).to receive(:postgresql?).and_return(true)

        expect(described_class.enabled?).to be_truthy
      end

      it 'is disabled for MySQL' do
        allow(Gitlab::Database).to receive(:postgresql?).and_return(false)

        expect(described_class.enabled?).to be_falsey
      end
    end
  end
end