2dot5 / ClickHouse
Commit af1129a9

DOCAPI-6423: simpleLinearRegression function docs (#5484)

Authored by BayoNet on Jun 17, 2019; committed by Ivan Blinkov on Jun 17, 2019.
Parent: 4e7230f8

Showing 4 changed files with 187 additions and 57 deletions (+187 -57).
- docs/en/getting_started/example_datasets/ontime.md (+91 -26)
- docs/en/query_language/agg_functions/reference.md (+3 -3)
- docs/ru/getting_started/example_datasets/ontime.md (+91 -26)
- docs/ru/query_language/agg_functions/reference.md (+2 -2)
docs/en/getting_started/example_datasets/ontime.md @ af1129a9
````diff
@@ -163,31 +163,54 @@ clickhouse-client --query "select count(*) from datasets.ontime"
 Q0.
 
 ```sql
-select avg(c1) from (select Year, Month, count(*) as c1 from ontime group by Year, Month);
+SELECT avg(c1)
+FROM
+(
+    SELECT Year, Month, count(*) AS c1
+    FROM ontime
+    GROUP BY Year, Month
+);
 ```
 
 Q1. The number of flights per day from the year 2000 to 2008
 
 ```sql
-SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;
+SELECT DayOfWeek, count(*) AS c
+FROM ontime
+WHERE Year >= 2000 AND Year <= 2008
+GROUP BY DayOfWeek
+ORDER BY c DESC;
 ```
 
 Q2. The number of flights delayed by more than 10 minutes, grouped by the day of the week, for 2000-2008
 
 ```sql
-SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC
+SELECT DayOfWeek, count(*) AS c
+FROM ontime
+WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008
+GROUP BY DayOfWeek
+ORDER BY c DESC;
 ```
 
 Q3. The number of delays by airport for 2000-2008
 
 ```sql
-SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10
+SELECT Origin, count(*) AS c
+FROM ontime
+WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008
+GROUP BY Origin
+ORDER BY c DESC
+LIMIT 10;
 ```
 
 Q4. The number of delays by carrier for 2007
 
 ```sql
-SELECT Carrier, count(*) FROM ontime WHERE DepDelay > 10 AND Year = 2007 GROUP BY Carrier ORDER BY count(*) DESC
+SELECT Carrier, count(*)
+FROM ontime
+WHERE DepDelay > 10 AND Year = 2007
+GROUP BY Carrier
+ORDER BY count(*) DESC;
 ```
 
 Q5. The percentage of delays by carrier for 2007
````
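Q0 in this hunk is a nested aggregate. The inner query counts flights per (Year, Month), and the outer `avg()` averages those counts, which gives the mean number of flights per month. The following minimal sketch runs the same SQL through Python's built-in `sqlite3` module against a tiny in-memory table, so you can check the arithmetic by hand. The rows, `ROWS`, and `avg_flights_per_month` are hypothetical illustrations, not part of the ontime dataset or of ClickHouse.

```python
import sqlite3

# Hypothetical toy rows (Year, Month), one per flight; not real ontime data.
ROWS = [
    (2007, 1), (2007, 1), (2007, 1),  # 3 flights in January 2007
    (2007, 2),                        # 1 flight in February 2007
    (2008, 1), (2008, 1),             # 2 flights in January 2008
]


def avg_flights_per_month(rows):
    """Run a Q0-style query: count flights per (Year, Month), then average the counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ontime (Year INTEGER, Month INTEGER)")
    conn.executemany("INSERT INTO ontime VALUES (?, ?)", rows)
    (result,) = conn.execute(
        """
        SELECT avg(c1)
        FROM
        (
            SELECT Year, Month, count(*) AS c1
            FROM ontime
            GROUP BY Year, Month
        )
        """
    ).fetchone()
    conn.close()
    return result


print(avg_flights_per_month(ROWS))  # (3 + 1 + 2) / 3 months = 2.0
```

The subquery is deliberately left unaliased, as in Q0. SQLite accepts that form, which keeps the sketch close to the documented query.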
````diff
@@ -219,7 +242,11 @@ ORDER BY c3 DESC;
 Better version of the same query:
 
 ```sql
-SELECT Carrier, avg(DepDelay > 10) * 100 AS c3 FROM ontime WHERE Year = 2007 GROUP BY Carrier ORDER BY Carrier
+SELECT Carrier, avg(DepDelay > 10) * 100 AS c3
+FROM ontime
+WHERE Year = 2007
+GROUP BY Carrier
+ORDER BY Carrier
 ```
 
 Q6. The previous request for a broader range of years, 2000-2008
````
````diff
@@ -233,7 +260,7 @@ FROM
         count(*) AS c
     FROM ontime
     WHERE DepDelay > 10
-        AND Year >= 2000 AND Year <= 2008
+        AND Year >= 2000 AND Year <= 2008
     GROUP BY Carrier
 )
 ANY INNER JOIN
````
````diff
@@ -242,7 +269,7 @@ ANY INNER JOIN
         Carrier,
         count(*) AS c2
     FROM ontime
-    WHERE Year >= 2000 AND Year <= 2008
+    WHERE Year >= 2000 AND Year <= 2008
     GROUP BY Carrier
 ) USING Carrier
 ORDER BY c3 DESC;
````
````diff
@@ -251,7 +278,11 @@ ORDER BY c3 DESC;
 Better version of the same query:
 
 ```sql
-SELECT Carrier, avg(DepDelay > 10) * 100 AS c3 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier ORDER BY Carrier
+SELECT Carrier, avg(DepDelay > 10) * 100 AS c3
+FROM ontime
+WHERE Year >= 2000 AND Year <= 2008
+GROUP BY Carrier
+ORDER BY Carrier;
 ```
 
 Q6. The previous request for a broader range of years, 2000-2008
````
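The "better version" of Q6 replaces the join of two aggregated subqueries with a single `avg(DepDelay > 10) * 100`. The comparison evaluates to 1 or 0 per row, so its mean is the share of delayed flights. The following sketch uses Python's `sqlite3` module and invented rows to run both forms side by side.

A few assumptions apply:
- SQLite has no `ANY INNER JOIN`, so a plain `INNER JOIN` stands in for it. After `GROUP BY Carrier`, each side has at most one row per carrier, so the two join kinds should behave alike here.
- Both queries are trimmed to the percentage column and ordered by `Carrier` so the results are directly comparable.
- `ROWS`, `JOIN_QUERY`, `AVG_QUERY`, and `run` are hypothetical names.

```python
import sqlite3

# Hypothetical toy rows (Carrier, Year, DepDelay in minutes); not real ontime data.
ROWS = [
    ("AA", 2001, 20), ("AA", 2003, 5), ("AA", 2004, 8), ("AA", 2008, 3),
    ("AA", 2009, 90),                  # outside 2000-2008, filtered out
    ("UA", 2000, 1), ("UA", 2005, 45),
    ("DL", 2002, 0), ("DL", 2006, 7),  # no departures delayed by more than 10 minutes
]

# Join form: delayed count per carrier joined to total count per carrier.
# SQLite's "/" truncates integers, so 100.0 forces floating-point division
# (ClickHouse's "/" always returns a floating-point result).
JOIN_QUERY = """
SELECT Carrier, c * 100.0 / c2 AS c3
FROM (
    SELECT Carrier, count(*) AS c FROM ontime
    WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008
    GROUP BY Carrier
) AS d
INNER JOIN (
    SELECT Carrier, count(*) AS c2 FROM ontime
    WHERE Year >= 2000 AND Year <= 2008
    GROUP BY Carrier
) AS t USING (Carrier)
ORDER BY Carrier
"""

# avg() form: the comparison yields 1 or 0, so its mean is the delayed fraction.
AVG_QUERY = """
SELECT Carrier, avg(DepDelay > 10) * 100 AS c3
FROM ontime
WHERE Year >= 2000 AND Year <= 2008
GROUP BY Carrier
ORDER BY Carrier
"""


def run(query, rows):
    """Load the rows into an in-memory table and return all result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ontime (Carrier TEXT, Year INTEGER, DepDelay INTEGER)")
    conn.executemany("INSERT INTO ontime VALUES (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


join_rows = run(JOIN_QUERY, ROWS)
avg_rows = run(AVG_QUERY, ROWS)
print(join_rows)  # [('AA', 25.0), ('UA', 50.0)]
print(avg_rows)   # [('AA', 25.0), ('DL', 0.0), ('UA', 50.0)]
```

The two forms agree wherever both return a row. The inner join, however, drops any carrier that had no delayed flights in the range (DL above), while the `avg()` form reports that carrier as 0.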
````diff
@@ -275,41 +306,51 @@ ANY INNER JOIN
     from ontime
     GROUP BY Year
 ) USING (Year)
-ORDER BY Year
+ORDER BY Year;
 ```
 
 Better version of the same query:
 
 ```sql
-SELECT Year, avg(DepDelay > 10) FROM ontime GROUP BY Year ORDER BY Year
+SELECT Year, avg(DepDelay > 10)
+FROM ontime
+GROUP BY Year
+ORDER BY Year;
 ```
 
 Q8. The most popular destinations by the number of directly connected cities for various year ranges
 
 ```sql
-SELECT DestCityName, uniqExact(OriginCityName) AS u FROM ontime WHERE Year >= 2000 and Year <= 2010 GROUP BY DestCityName ORDER BY u DESC LIMIT 10;
+SELECT DestCityName, uniqExact(OriginCityName) AS u
+FROM ontime
+WHERE Year >= 2000 and Year <= 2010
+GROUP BY DestCityName
+ORDER BY u DESC
+LIMIT 10;
 ```
 
 Q9.
 
 ```sql
-select Year, count(*) as c1 from ontime group by Year;
+SELECT Year, count(*) AS c1
+FROM ontime
+GROUP BY Year;
 ```
 
 Q10.
 
 ```sql
-select
-   min(Year), max(Year), Carrier, count(*) as cnt,
-   sum(ArrDelayMinutes > 30) as flights_delayed,
-   round(sum(ArrDelayMinutes > 30) / count(*), 2) as rate
+SELECT
+   min(Year), max(Year), Carrier, count(*) AS cnt,
+   sum(ArrDelayMinutes > 30) AS flights_delayed,
+   round(sum(ArrDelayMinutes > 30) / count(*), 2) AS rate
 FROM ontime
 WHERE
-   DayOfWeek not in (6, 7) and OriginState not in ('AK', 'HI', 'PR', 'VI')
-   and DestState not in ('AK', 'HI', 'PR', 'VI')
-   and FlightDate < '2010-01-01'
+   DayOfWeek NOT IN (6, 7) AND OriginState NOT IN ('AK', 'HI', 'PR', 'VI')
+   AND DestState NOT IN ('AK', 'HI', 'PR', 'VI')
+   AND FlightDate < '2010-01-01'
 GROUP by Carrier
-HAVING cnt > 100000 and max(Year) > 1990
+HAVING cnt > 100000 and max(Year) > 1990
 ORDER by rate DESC
 LIMIT 1000;
 ```
````
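Q10 filters whole carrier groups with `HAVING`, keeping only carriers with enough flights and recent data. It then ranks them by `rate`, the share of flights that arrived more than 30 minutes late, rounded to two decimals. The following sketch runs a simplified version with Python's `sqlite3` module on invented rows:
- The weekday, state, and date filters are omitted.
- The 100000-flight threshold becomes a hypothetical `min_flights` parameter.
- `HAVING` repeats `count(*)` rather than relying on the `cnt` alias.

`ROWS`, `QUERY`, and `delay_rates` are illustrative names only.

```python
import sqlite3

# Hypothetical toy rows (Carrier, Year, ArrDelayMinutes); not real ontime data.
ROWS = [
    ("AA", 2005, 40), ("AA", 2006, 10), ("AA", 2007, 31), ("AA", 2008, 0),
    ("DL", 2006, 50), ("DL", 2007, 5), ("DL", 2008, 100),
    ("UA", 2008, 45),  # single flight: removed by HAVING when min_flights >= 1
]

# Simplified Q10. "* 1.0" forces floating-point division in SQLite,
# where "/" on two integers would truncate (ClickHouse's "/" does not).
QUERY = """
SELECT
    min(Year), max(Year), Carrier, count(*) AS cnt,
    sum(ArrDelayMinutes > 30) AS flights_delayed,
    round(sum(ArrDelayMinutes > 30) * 1.0 / count(*), 2) AS rate
FROM ontime
GROUP BY Carrier
HAVING count(*) > ? AND max(Year) > 1990
ORDER BY rate DESC
"""


def delay_rates(rows, min_flights):
    """Return (min_year, max_year, carrier, cnt, flights_delayed, rate) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ontime (Carrier TEXT, Year INTEGER, ArrDelayMinutes INTEGER)"
    )
    conn.executemany("INSERT INTO ontime VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, (min_flights,)).fetchall()
    conn.close()
    return result


for row in delay_rates(ROWS, min_flights=1):
    print(row)
```

With `min_flights=1`, DL ranks first: 2 of its 3 flights were late, for a rate of about 0.67. AA follows with 2 of 4 (0.5). UA has a single flight and is removed by the `HAVING` clause.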
````diff
@@ -317,15 +358,39 @@ LIMIT 1000;
 Bonus:
 
 ```sql
-SELECT avg(cnt) FROM (SELECT Year, Month, count(*) AS cnt FROM ontime WHERE DepDel15 = 1 GROUP BY Year, Month)
+SELECT avg(cnt)
+FROM
+(
+    SELECT Year, Month, count(*) AS cnt
+    FROM ontime
+    WHERE DepDel15 = 1
+    GROUP BY Year, Month
+);
 
-select avg(c1) from (select Year, Month, count(*) as c1 from ontime group by Year, Month)
+SELECT avg(c1) FROM
+(
+    SELECT Year, Month, count(*) AS c1
+    FROM ontime
+    GROUP BY Year, Month
+);
 
-SELECT DestCityName, uniqExact(OriginCityName) AS u FROM ontime GROUP BY DestCityName ORDER BY u DESC LIMIT 10;
+SELECT DestCityName, uniqExact(OriginCityName) AS u
+FROM ontime
+GROUP BY DestCityName
+ORDER BY u DESC
+LIMIT 10;
 
-SELECT OriginCityName, DestCityName, count() AS c FROM ontime GROUP BY OriginCityName, DestCityName ORDER BY c DESC LIMIT 10;
+SELECT OriginCityName, DestCityName, count() AS c
+FROM ontime
+GROUP BY OriginCityName, DestCityName
+ORDER BY c DESC
+LIMIT 10;
 
-SELECT OriginCityName, count() AS c FROM ontime GROUP BY OriginCityName ORDER BY c DESC LIMIT 10;
+SELECT OriginCityName, count() AS c
+FROM ontime
+GROUP BY OriginCityName
+ORDER BY c DESC
+LIMIT 10;
 ```
 
 This performance test was created by Vadim Tkachenko. See:
````
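Q8 and the third Bonus query rank destinations by `uniqExact(OriginCityName)`, the exact number of distinct origin cities with flights into each destination. `uniqExact` is ClickHouse-specific, so the following sketch swaps in the standard SQL `count(DISTINCT ...)` and runs it with Python's `sqlite3` module on invented city pairs. Two further assumptions apply:
- The secondary sort on `DestCityName` is an addition that makes tied rows come out in a deterministic order.
- `ROWS`, `QUERY`, and `top_destinations` are hypothetical names.

```python
import sqlite3

# Hypothetical toy rows (OriginCityName, DestCityName); not real ontime data.
ROWS = [
    ("Chicago", "New York"), ("Boston", "New York"),
    ("Chicago", "New York"),  # repeated origin: counted once for New York
    ("Denver", "New York"),
    ("Chicago", "Boston"), ("Chicago", "Boston"),
    ("Denver", "Seattle"),
]

# count(DISTINCT ...) stands in for ClickHouse's uniqExact(); both are exact.
QUERY = """
SELECT DestCityName, count(DISTINCT OriginCityName) AS u
FROM ontime
GROUP BY DestCityName
ORDER BY u DESC, DestCityName
LIMIT 10
"""


def top_destinations(rows):
    """Return (destination, number of distinct origin cities) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ontime (OriginCityName TEXT, DestCityName TEXT)")
    conn.executemany("INSERT INTO ontime VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(top_destinations(ROWS))
# [('New York', 3), ('Boston', 1), ('Seattle', 1)]
```

Counting distinct origins, rather than rows, is what keeps the two Chicago to New York flights from inflating New York's score.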
docs/en/query_language/agg_functions/reference.md @ af1129a9
````diff
@@ -839,12 +839,12 @@ simpleLinearRegression(x, y)
 Parameters:
 
-- `x` — Column with values of dependent variable.
-- `y` — Column with explanatory variable.
+- `x` — Column with dependent variable values.
+- `y` — Column with explanatory variable values.
 
 Returned values:
 
-Parameters `(a, b)` of the resulting line `y = a*x + b`.
+Constants `(a, b)` of the resulting line `y = a*x + b`.
 
 **Examples**
````
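The reference entry says `simpleLinearRegression(x, y)` returns the constants `(a, b)` of the line `y = a*x + b`. Simple linear regression conventionally means an ordinary least-squares fit. Assuming the function does the same, the constants follow from the means of the two columns:
- `a = cov(x, y) / var(x)`
- `b = mean(y) - a * mean(x)`

The following Python function, `simple_linear_regression`, is a hypothetical illustration of that formula, not ClickHouse's implementation. It treats `x` as the input and `y` as the fitted value, as in `y = a*x + b`.

```python
from statistics import fmean


def simple_linear_regression(xs, ys):
    """Return (a, b) minimizing sum((y - (a*x + b))**2) by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Sum of squared deviations of x (proportional to the variance of x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Sum of cross-deviations (proportional to the covariance of x and y).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


print(simple_linear_regression([0, 1, 2, 3], [0, 1, 2, 3]))  # (1.0, 0.0)
print(simple_linear_regression([0, 1, 2, 3], [3, 4, 5, 6]))  # (1.0, 3.0)
```

Shifting every `y` by 3 leaves the slope `a` unchanged and moves only the intercept `b`, which is what the second call shows.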
docs/ru/getting_started/example_datasets/ontime.md @ af1129a9
````diff
@@ -163,31 +163,54 @@ clickhouse-client --query "SELECT COUNT(*) FROM datasets.ontime"
 Q0.
 
 ```sql
-select avg(c1) from (select Year, Month, count(*) as c1 from ontime group by Year, Month);
+SELECT avg(c1)
+FROM
+(
+    SELECT Year, Month, count(*) AS c1
+    FROM ontime
+    GROUP BY Year, Month
+);
 ```
 
 Q1. Количество полетов в день с 2000 по 2008 года
 
 ```sql
-SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;
+SELECT DayOfWeek, count(*) AS c
+FROM ontime
+WHERE Year >= 2000 AND Year <= 2008
+GROUP BY DayOfWeek
+ORDER BY c DESC;
 ```
 
 Q2. Количество полетов, задержанных более чем на 10 минут, с группировкой по дням неделе, за 2000-2008 года
 
 ```sql
-SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC
+SELECT DayOfWeek, count(*) AS c
+FROM ontime
+WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008
+GROUP BY DayOfWeek
+ORDER BY c DESC;
 ```
 
 Q3. Количество задержек по аэропортам за 2000-2008
 
 ```sql
-SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10
+SELECT Origin, count(*) AS c
+FROM ontime
+WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008
+GROUP BY Origin
+ORDER BY c DESC
+LIMIT 10;
 ```
 
 Q4. Количество задержек по перевозчикам за 2007 год
 
 ```sql
-SELECT Carrier, count(*) FROM ontime WHERE DepDelay > 10 AND Year = 2007 GROUP BY Carrier ORDER BY count(*) DESC
+SELECT Carrier, count(*)
+FROM ontime
+WHERE DepDelay > 10 AND Year = 2007
+GROUP BY Carrier
+ORDER BY count(*) DESC;
 ```
 
 Q5. Процент задержек по перевозчикам за 2007 год
````
````diff
@@ -219,7 +242,11 @@ ORDER BY c3 DESC;
 Более оптимальная версия того же запроса:
 
 ```sql
-SELECT Carrier, avg(DepDelay > 10) * 100 AS c3 FROM ontime WHERE Year = 2007 GROUP BY Carrier ORDER BY Carrier
+SELECT Carrier, avg(DepDelay > 10) * 100 AS c3
+FROM ontime
+WHERE Year = 2007
+GROUP BY Carrier
+ORDER BY Carrier
 ```
 
 Q6. Предыдущий запрос за более широкий диапазон лет, 2000-2008
````
````diff
@@ -233,7 +260,7 @@ FROM
         count(*) AS c
     FROM ontime
     WHERE DepDelay > 10
-        AND Year >= 2000 AND Year <= 2008
+        AND Year >= 2000 AND Year <= 2008
     GROUP BY Carrier
 )
 ANY INNER JOIN
````
````diff
@@ -242,7 +269,7 @@ ANY INNER JOIN
         Carrier,
         count(*) AS c2
     FROM ontime
-    WHERE Year >= 2000 AND Year <= 2008
+    WHERE Year >= 2000 AND Year <= 2008
     GROUP BY Carrier
 ) USING Carrier
 ORDER BY c3 DESC;
````
````diff
@@ -251,7 +278,11 @@ ORDER BY c3 DESC;
 Более оптимальная версия того же запроса:
 
 ```sql
-SELECT Carrier, avg(DepDelay > 10) * 100 AS c3 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier ORDER BY Carrier
+SELECT Carrier, avg(DepDelay > 10) * 100 AS c3
+FROM ontime
+WHERE Year >= 2000 AND Year <= 2008
+GROUP BY Carrier
+ORDER BY Carrier;
 ```
 
 Q7. Процент полетов, задержанных на более 10 минут, в разбивке по годам
````
````diff
@@ -275,41 +306,51 @@ ANY INNER JOIN
     from ontime
     GROUP BY Year
 ) USING (Year)
-ORDER BY Year
+ORDER BY Year;
 ```
 
 Более оптимальная версия того же запроса:
 
 ```sql
-SELECT Year, avg(DepDelay > 10) FROM ontime GROUP BY Year ORDER BY Year
+SELECT Year, avg(DepDelay > 10)
+FROM ontime
+GROUP BY Year
+ORDER BY Year;
 ```
 
 Q8. Самые популярные направления по количеству напрямую соединенных городов для различных диапазонов лет
 
 ```sql
-SELECT DestCityName, uniqExact(OriginCityName) AS u FROM ontime WHERE Year >= 2000 and Year <= 2010 GROUP BY DestCityName ORDER BY u DESC LIMIT 10;
+SELECT DestCityName, uniqExact(OriginCityName) AS u
+FROM ontime
+WHERE Year >= 2000 and Year <= 2010
+GROUP BY DestCityName
+ORDER BY u DESC
+LIMIT 10;
 ```
 
 Q9.
 
 ```sql
-select Year, count(*) as c1 from ontime group by Year;
+SELECT Year, count(*) AS c1
+FROM ontime
+GROUP BY Year;
 ```
 
 Q10.
 
 ```sql
-select
-   min(Year), max(Year), Carrier, count(*) as cnt,
-   sum(ArrDelayMinutes > 30) as flights_delayed,
-   round(sum(ArrDelayMinutes > 30) / count(*), 2) as rate
+SELECT
+   min(Year), max(Year), Carrier, count(*) AS cnt,
+   sum(ArrDelayMinutes > 30) AS flights_delayed,
+   round(sum(ArrDelayMinutes > 30) / count(*), 2) AS rate
 FROM ontime
 WHERE
-   DayOfWeek not in (6, 7) and OriginState not in ('AK', 'HI', 'PR', 'VI')
-   and DestState not in ('AK', 'HI', 'PR', 'VI')
-   and FlightDate < '2010-01-01'
+   DayOfWeek NOT IN (6, 7) AND OriginState NOT IN ('AK', 'HI', 'PR', 'VI')
+   AND DestState NOT IN ('AK', 'HI', 'PR', 'VI')
+   AND FlightDate < '2010-01-01'
 GROUP by Carrier
-HAVING cnt > 100000 and max(Year) > 1990
+HAVING cnt > 100000 and max(Year) > 1990
 ORDER by rate DESC
 LIMIT 1000;
 ```
````
````diff
@@ -317,15 +358,39 @@ LIMIT 1000;
 Бонус:
 
 ```sql
-SELECT avg(cnt) FROM (SELECT Year, Month, count(*) AS cnt FROM ontime WHERE DepDel15 = 1 GROUP BY Year, Month)
+SELECT avg(cnt)
+FROM
+(
+    SELECT Year, Month, count(*) AS cnt
+    FROM ontime
+    WHERE DepDel15 = 1
+    GROUP BY Year, Month
+);
 
-select avg(c1) from (select Year, Month, count(*) as c1 from ontime group by Year, Month)
+SELECT avg(c1) FROM
+(
+    SELECT Year, Month, count(*) AS c1
+    FROM ontime
+    GROUP BY Year, Month
+);
 
-SELECT DestCityName, uniqExact(OriginCityName) AS u FROM ontime GROUP BY DestCityName ORDER BY u DESC LIMIT 10;
+SELECT DestCityName, uniqExact(OriginCityName) AS u
+FROM ontime
+GROUP BY DestCityName
+ORDER BY u DESC
+LIMIT 10;
 
-SELECT OriginCityName, DestCityName, count() AS c FROM ontime GROUP BY OriginCityName, DestCityName ORDER BY c DESC LIMIT 10;
+SELECT OriginCityName, DestCityName, count() AS c
+FROM ontime
+GROUP BY OriginCityName, DestCityName
+ORDER BY c DESC
+LIMIT 10;
 
-SELECT OriginCityName, count() AS c FROM ontime GROUP BY OriginCityName ORDER BY c DESC LIMIT 10;
+SELECT OriginCityName, count() AS c
+FROM ontime
+GROUP BY OriginCityName
+ORDER BY c DESC
+LIMIT 10;
 ```
 
 Данный тест производительности был создан Вадимом Ткаченко, статьи по теме:
````
docs/ru/query_language/agg_functions/reference.md @ af1129a9
````diff
@@ -597,12 +597,12 @@ simpleLinearRegression(x, y)
 Параметры:
 
-- `x` — Столбец со значениями зависимой переменной.
+- `x` — столбец со значениями зависимой переменной.
 - `y` — столбец со значениями наблюдаемой переменной.
 
 Возвращаемые значения:
 
-Параметры `(a, b)` результирующей прямой `x = a*y + b`.
+Константы `(a, b)` результирующей прямой `y = a*x + b`.
 
 **Примеры**
````