diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md
index cb37dc4cf6a7526ed15685b0f7589fb899c48d5d..fe88072ec42d97af32ab50031b4510e0bb1cae96 100644
--- a/docs/en/operations/settings/settings.md
+++ b/docs/en/operations/settings/settings.md
@@ -1067,7 +1067,7 @@ Default value: 0.
 
 ## query_profiler_real_time_period_ns {#query_profiler_real_time_period_ns}
 
-Sets the period for a real clock timer of the [query profiler](../performance/sampling_query_profiler.md). Real clock timer counts wall-clock time.
+Sets the period for a real clock timer of the [query profiler](../../operations/performance/sampling_query_profiler.md). Real clock timer counts wall-clock time.
 
 Possible values:
 
diff --git a/docs/en/query_language/functions/date_time_functions.md b/docs/en/query_language/functions/date_time_functions.md
index 408b58db6ca3166e8e897adb3461a297da74a243..3a3adba38a0c2ef56ceb4b9d40052e941d724c35 100644
--- a/docs/en/query_language/functions/date_time_functions.md
+++ b/docs/en/query_language/functions/date_time_functions.md
@@ -371,7 +371,7 @@ dateDiff('unit', startdate, enddate, [timezone])
 
 - `startdate` — The first time value to compare. [Date](../../data_types/date.md) or [DateTime](../../data_types/datetime.md).
 - `enddate` — The second time value to compare. [Date](../../data_types/date.md) or [DateTime](../../data_types/datetime.md).
-- `timezone` — Optional parameter. If specified, it is applied to both `startdate` and `enddate`. If not specified, timezones of `startdate` and `enddate` are used. If they are not the same, the result is unspecified. [Time Zone](../../data_types/datetime.md#time-zones).
+- `timezone` — Optional parameter. If specified, it is applied to both `startdate` and `enddate`. If not specified, timezones of `startdate` and `enddate` are used. If they are not the same, the result is unspecified.
 
 **Returned value**
 
diff --git a/docs/en/query_language/functions/string_search_functions.md b/docs/en/query_language/functions/string_search_functions.md
index fecf21ec6b2a34f681ee3808f7286e55ce4ce16f..e68bd83b5bcafb06b508d705dd16ad023d95dd59 100644
--- a/docs/en/query_language/functions/string_search_functions.md
+++ b/docs/en/query_language/functions/string_search_functions.md
@@ -17,7 +17,7 @@ For a case-insensitive search, use the function `positionCaseInsensitiveUTF8`.
 
 ## multiSearchAllPositions {#multiSearchAllPositions}
 
-The same as [position](#position) but returns `Array` of positions (in bytes) of the found corresponding substrings in the string. Positions are indexed starting from 1.
+The same as [position](string_search_functions.md#position) but returns `Array` of positions (in bytes) of the found corresponding substrings in the string. Positions are indexed starting from 1.
 
 The search is performed on sequences of bytes without respect to string encoding and collation.
 
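As a quick illustration of the `multiSearchAllPositions` semantics documented in the hunk above (1-based byte positions, `0` for needles that are not found, case-sensitive byte-wise search), here is a minimal Python sketch. It only mimics the documented behaviour for plain input; it is not ClickHouse code.

```python
def multi_search_all_positions(haystack, needles):
    """Return, for each needle, the 1-based byte position of its first
    occurrence in haystack, or 0 if it does not occur (case-sensitive)."""
    data = haystack.encode('utf-8')  # the search works on bytes, ignoring collation
    return [data.find(needle.encode('utf-8')) + 1 for needle in needles]

# 'hello' (lower case) is absent, '!' is the 13th byte, 'World' starts at byte 8.
print(multi_search_all_positions('Hello, World!', ['hello', '!', 'World']))  # [0, 13, 8]
```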
diff --git a/docs/fa/operations/performance/sampling_query_profiler.md b/docs/fa/operations/performance/sampling_query_profiler.md
new file mode 120000
index 0000000000000000000000000000000000000000..c55c58684ba104701e065b4b4d65940b0cc30066
--- /dev/null
+++ b/docs/fa/operations/performance/sampling_query_profiler.md
@@ -0,0 +1 @@
+../../../en/operations/performance/sampling_query_profiler.md
\ No newline at end of file
diff --git a/docs/ja/operations/performance/sampling_query_profiler.md b/docs/ja/operations/performance/sampling_query_profiler.md
new file mode 120000
index 0000000000000000000000000000000000000000..c55c58684ba104701e065b4b4d65940b0cc30066
--- /dev/null
+++ b/docs/ja/operations/performance/sampling_query_profiler.md
@@ -0,0 +1 @@
+../../../en/operations/performance/sampling_query_profiler.md
\ No newline at end of file
diff --git a/docs/ru/operations/performance/sampling_query_profiler.md b/docs/ru/operations/performance/sampling_query_profiler.md
new file mode 120000
index 0000000000000000000000000000000000000000..c55c58684ba104701e065b4b4d65940b0cc30066
--- /dev/null
+++ b/docs/ru/operations/performance/sampling_query_profiler.md
@@ -0,0 +1 @@
+../../../en/operations/performance/sampling_query_profiler.md
\ No newline at end of file
diff --git a/docs/ru/query_language/functions/introspection.md b/docs/ru/query_language/functions/introspection.md
new file mode 120000
index 0000000000000000000000000000000000000000..b1a487e9c772ab7c9ecc3f28e164b46487717174
--- /dev/null
+++ b/docs/ru/query_language/functions/introspection.md
@@ -0,0 +1 @@
+../../../en/query_language/functions/introspection.md
\ No newline at end of file
diff --git a/docs/ru/query_language/table_functions/mysql.md b/docs/ru/query_language/table_functions/mysql.md
index 2527b158d782cd7eea06692ae3e4a5a8c390a508..c5bebbbf740191953402a3012b8fe113c9ab4ae9 100644
--- a/docs/ru/query_language/table_functions/mysql.md
+++ b/docs/ru/query_language/table_functions/mysql.md
@@ -68,6 +68,6 @@ SELECT * FROM mysql('localhost:3306', 'test', 'test', 'bayonet', '123')
 ## Смотрите также
 
 - [Движок таблиц 'MySQL'](../../operations/table_engines/mysql.md)
-- [Использование MySQL как источника данных для внешнего словаря](../dicts/external_dicts_dict_sources.md#dicts-external_dicts_dict_sources-mysql)
+- [Использование MySQL как источника данных для внешнего словаря](../../query_language/dicts/external_dicts_dict_sources.md#dicts-external_dicts_dict_sources-mysql)
 
 [Оригинальная статья](https://clickhouse.tech/docs/ru/query_language/table_functions/mysql/)
diff --git a/docs/toc_en.yml b/docs/toc_en.yml
index 79f6224f9c295a2eee2afda556fa564bbb9a0a7b..8558216b15b545fff7504bbc93977e76b7d29094 100644
--- a/docs/toc_en.yml
+++ b/docs/toc_en.yml
@@ -199,7 +199,7 @@ nav:
     - 'Quotas': 'operations/quotas.md'
     - 'System Tables': 'operations/system_tables.md'
     - 'Optimizing Performance':
-        - 'Query Profiling': operations/performance/sampling_query_profiler.md
+        - 'Query Profiling': 'operations/performance/sampling_query_profiler.md'
     - 'Testing Hardware': 'operations/performance_test.md'
     - 'Server Configuration Parameters':
         - 'Introduction': 'operations/server_settings/index.md'
diff --git a/docs/toc_fa.yml b/docs/toc_fa.yml
index 9da4346dbbcf1905aaab84da4d15281f86e11a5a..bd1e84d590e58ac05bf2ac744a83da26d596106a 100644
--- a/docs/toc_fa.yml
+++ b/docs/toc_fa.yml
@@ -193,6 +193,8 @@ nav:
     - 'Configuration Files': 'operations/configuration_files.md'
     - 'Quotas': 'operations/quotas.md'
     - 'System Tables': 'operations/system_tables.md'
+    - 'Optimizing Performance':
+        - 'Query Profiling': 'operations/performance/sampling_query_profiler.md'
     - 'Testing Hardware': 'operations/performance_test.md'
     - 'Server Configuration Parameters':
         - 'Introduction': 'operations/server_settings/index.md'
diff --git a/docs/toc_ja.yml b/docs/toc_ja.yml
index 786559124f5deb97ea8d84f6362ec36753a604c4..f47bc06589044ad84e9c36880ad5162174a37196 100644
--- a/docs/toc_ja.yml
+++ b/docs/toc_ja.yml
@@ -197,6 +197,8 @@ nav:
     - 'Configuration Files': 'operations/configuration_files.md'
     - 'Quotas': 'operations/quotas.md'
     - 'System Tables': 'operations/system_tables.md'
+    - 'Optimizing Performance':
+        - 'Query Profiling': 'operations/performance/sampling_query_profiler.md'
     - 'Testing Hardware': 'operations/performance_test.md'
     - 'Server Configuration Parameters':
         - 'Introduction': 'operations/server_settings/index.md'
diff --git a/docs/toc_ru.yml b/docs/toc_ru.yml
index c2b7ab6961ab3989d8dab8bb89c84c24daf4d2b8..5999ac74b5641667b74a6208d384abfa60c04332 100644
--- a/docs/toc_ru.yml
+++ b/docs/toc_ru.yml
@@ -122,6 +122,7 @@ nav:
     - 'Функции для работы с географическими координатами': 'query_language/functions/geo.md'
     - 'Функции c Nullable аргументами': 'query_language/functions/functions_for_nulls.md'
     - 'Функции машинного обучения': 'query_language/functions/machine_learning_functions.md'
+    - 'Функции для интроспекции': 'query_language/functions/introspection.md'
     - 'Прочие функции': 'query_language/functions/other_functions.md'
     - 'Агрегатные функции':
         - 'Введение': 'query_language/agg_functions/index.md'
@@ -197,6 +198,8 @@ nav:
     - 'Конфигурационные файлы': 'operations/configuration_files.md'
     - 'Квоты': 'operations/quotas.md'
     - 'Системные таблицы': 'operations/system_tables.md'
+    - 'Оптимизация производительности':
+        - 'Профилирование запросов': 'operations/performance/sampling_query_profiler.md'
     - 'Тестирование оборудования': 'operations/performance_test.md'
     - 'Конфигурационные параметры сервера':
         - 'Введение': 'operations/server_settings/index.md'
diff --git a/docs/toc_zh.yml b/docs/toc_zh.yml
index 8f33e5579d21cc1191986a2357e45e77581851d0..e85c6b50f27593cbdb3599d9273d0b788cc37adf 100644
--- a/docs/toc_zh.yml
+++ b/docs/toc_zh.yml
@@ -192,7 +192,9 @@ nav:
     - '配置文件': 'operations/configuration_files.md'
     - '配额': 'operations/quotas.md'
     - '系统表': 'operations/system_tables.md'
-    - 'Testing Hardware': 'operations/performance_test.md'
+    - '优化性能':
+        - '查询分析': 'operations/performance/sampling_query_profiler.md'
+    - '测试硬件': 'operations/performance_test.md'
     - 'Server参数配置':
         - '介绍': 'operations/server_settings/index.md'
         - 'Server参数说明': 'operations/server_settings/settings.md'
diff --git a/docs/tools/output.md b/docs/tools/output.md
new file mode 100644
index 0000000000000000000000000000000000000000..ec5674004cf68c2aba4211efe7a7e3a0afe7f255
--- /dev/null
+++ b/docs/tools/output.md
@@ -0,0 +1,207 @@
+What is ClickHouse?
+===================
+
+ClickHouse is a column-oriented database management system (DBMS) for
+online analytical processing of queries (OLAP).
+
+In a "normal" row-oriented DBMS, data is stored in this order:
+
+  Row   WatchID       JavaEnable   Title                GoodEvent   EventTime
+  ----- ------------- ------------ -------------------- ----------- ---------------------
+  \#0   89354350662   1            Investor Relations   1           2016-05-18 05:19:20
+  \#1   90329509958   0            Contact us           1           2016-05-18 08:10:20
+  \#2   89953706054   1            Mission              1           2016-05-18 07:38:00
+  \#N   ...           ...          ...                  ...         ...
+
+In other words, all the values related to a row are physically stored
+next to each other.
+
+Examples of a row-oriented DBMS are MySQL, Postgres, and MS SQL Server.
+{: .grey }
+
+In a column-oriented DBMS, data is stored like this:
+
+  Row:          \#0                    \#1                    \#2                    \#N
+  ------------- --------------------- --------------------- --------------------- -----
+  WatchID:      89354350662           90329509958           89953706054           ...
+  JavaEnable:   1                     0                     1                     ...
+  Title:        Investor Relations    Contact us            Mission               ...
+  GoodEvent:    1                     1                     1                     ...
+  EventTime:    2016-05-18 05:19:20   2016-05-18 08:10:20   2016-05-18 07:38:00   ...
+
+These examples only show the order that data is arranged in. The values
+from different columns are stored separately, and data from the same
+column is stored together.
+
+Examples of a column-oriented DBMS: Vertica, Paraccel (Actian Matrix and
+Amazon Redshift), Sybase IQ, Exasol, Infobright, InfiniDB, MonetDB
+(VectorWise and Actian Vector), LucidDB, SAP HANA, Google Dremel, Google
+PowerDrill, Druid, and kdb+. {: .grey }
+
+Different orders for storing data are better suited to different
+scenarios. The data access scenario refers to what queries are made, how
+often, and in what proportion; how much data is read for each type of
+query -- rows, columns, and bytes; the relationship between reading and
+updating data; the working size of the data and how locally it is used;
+whether transactions are used, and how isolated they are; requirements
+for data replication and logical integrity; requirements for latency and
+throughput for each type of query, and so on.
+
+The higher the load on the system, the more important it is to customize
+the system setup to match the requirements of the usage scenario, and
+the more fine-grained this customization becomes. There is no system
+that is equally well-suited to significantly different scenarios. If a
+system is adaptable to a wide set of scenarios, then under a high load the
+system will handle all the scenarios equally poorly, or will work well
+for just one or a few of the possible scenarios.
+
+Key Properties of the OLAP scenario
+-----------------------------------
+
+- The vast majority of requests are for read access.
+- Data is updated in fairly large batches (\> 1000 rows), not by
+  single rows; or it is not updated at all.
+- Data is added to the DB but is not modified.
+- For reads, quite a large number of rows are extracted from the DB,
+  but only a small subset of columns.
+- Tables are "wide," meaning they contain a large number of columns.
+- Queries are relatively rare (usually hundreds of queries per server
+  or less per second).
+- For simple queries, latencies around 50 ms are allowed.
+- Column values are fairly small: numbers and short strings (for
+  example, 60 bytes per URL).
+- Requires high throughput when processing a single query (up to
+  billions of rows per second per server).
+- Transactions are not necessary.
+- Low requirements for data consistency.
+- There is one large table per query. All tables are small, except for
+  one.
+- A query result is significantly smaller than the source data. In
+  other words, data is filtered or aggregated, so the result fits in a
+  single server's RAM.
+
+It is easy to see that the OLAP scenario is very different from other
+popular scenarios (such as OLTP or Key-Value access). So it doesn't make
+sense to try to use OLTP or a Key-Value DB for processing analytical
+queries if you want to get decent performance. For example, if you try
+to use MongoDB or Redis for analytics, you will get very poor
+performance compared to OLAP databases.
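The storage layouts sketched in the two tables above can be modelled in a few lines of Python. This is only a toy illustration of the idea, not ClickHouse's actual on-disk format: answering a single-column question in the row layout touches every field of every row, while the column layout keeps the needed values in one contiguous list.

```python
# Toy model of the two layouts for the table shown above.
rows = [
    {"WatchID": 89354350662, "JavaEnable": 1, "Title": "Investor Relations", "GoodEvent": 1},
    {"WatchID": 90329509958, "JavaEnable": 0, "Title": "Contact us", "GoodEvent": 1},
    {"WatchID": 89953706054, "JavaEnable": 1, "Title": "Mission", "GoodEvent": 1},
]

# Column-oriented: one list per column, so values of a column sit together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row layout: every row (all of its columns) has to be visited to answer the query.
java_enabled_rows = sum(row["JavaEnable"] for row in rows)

# Column layout: only the one column the query needs is read.
java_enabled_cols = sum(columns["JavaEnable"])

assert java_enabled_rows == java_enabled_cols == 2
```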
+
+Why Column-Oriented Databases Work Better in the OLAP Scenario
+--------------------------------------------------------------
+
+Column-oriented databases are better suited to OLAP scenarios: they are
+at least 100 times faster in processing most queries. The reasons are
+explained in detail below, but the fact is easier to demonstrate
+visually:
+
+**Row-oriented DBMS**
+
+![Row-oriented](images/row_oriented.gif#)
+
+**Column-oriented DBMS**
+
+![Column-oriented](images/column_oriented.gif#)
+
+See the difference?
+
+### Input/output
+
+1. For an analytical query, only a small number of table columns need
+   to be read. In a column-oriented database, you can read just the
+   data you need. For example, if you need 5 columns out of 100, you
+   can expect a 20-fold reduction in I/O.
+2. Since data is read in packets, it is easier to compress. Data in
+   columns is also easier to compress. This further reduces the I/O
+   volume.
+3. Due to the reduced I/O, more data fits in the system cache.
+
+For example, the query "count the number of records for each advertising
+platform" requires reading one "advertising platform ID" column, which
+takes up 1 byte uncompressed. If most of the traffic was not from
+advertising platforms, you can expect at least 10-fold compression of
+this column. When using a quick compression algorithm, data
+decompression is possible at a speed of at least several gigabytes of
+uncompressed data per second. In other words, this query can be
+processed at a speed of approximately several billion rows per second on
+a single server. This speed is actually achieved in practice.
+
+Example
+
+    $ clickhouse-client
+    ClickHouse client version 0.0.52053.
+    Connecting to localhost:9000.
+    Connected to ClickHouse server version 0.0.52053.
+
+    :) SELECT CounterID, count() FROM hits GROUP BY CounterID ORDER BY count() DESC LIMIT 20
+
+    SELECT
+        CounterID,
+        count()
+    FROM hits
+    GROUP BY CounterID
+    ORDER BY count() DESC
+    LIMIT 20
+
+    ┌─CounterID─┬──count()─┐
+    │    114208 │ 56057344 │
+    │    115080 │ 51619590 │
+    │      3228 │ 44658301 │
+    │     38230 │ 42045932 │
+    │    145263 │ 42042158 │
+    │     91244 │ 38297270 │
+    │    154139 │ 26647572 │
+    │    150748 │ 24112755 │
+    │    242232 │ 21302571 │
+    │    338158 │ 13507087 │
+    │     62180 │ 12229491 │
+    │     82264 │ 12187441 │
+    │    232261 │ 12148031 │
+    │    146272 │ 11438516 │
+    │    168777 │ 11403636 │
+    │   4120072 │ 11227824 │
+    │  10938808 │ 10519739 │
+    │     74088 │  9047015 │
+    │    115079 │  8837972 │
+    │    337234 │  8205961 │
+    └───────────┴──────────┘
+
+    20 rows in set. Elapsed: 0.153 sec. Processed 1.00 billion rows, 4.00 GB (6.53 billion rows/s., 26.10 GB/s.)
+
+    :)
+
+### CPU
+
+Since executing a query requires processing a large number of rows, it
+helps to dispatch all operations for entire vectors instead of for
+separate rows, or to implement the query engine so that there is almost
+no dispatching cost. If you don't do this, with any half-decent disk
+subsystem, the query interpreter inevitably stalls the CPU. It makes
+sense to both store data in columns and process it, when possible, by
+columns.
+
+There are two ways to do this:
+
+1. A vector engine. All operations are written for vectors, instead of
+   for separate values. This means you don't need to call operations
+   very often, and dispatching costs are negligible. Operation code
+   contains an optimized internal loop.
+
+2. Code generation. The code generated for the query has all the
+   indirect calls inlined.
+
+This is not done in "normal" databases, because it doesn't make sense
+when running simple queries. However, there are exceptions. For example,
+MemSQL uses code generation to reduce latency when processing SQL
+queries. (For comparison, analytical DBMSs require optimization of
+throughput, not latency.)
+
+Note that for CPU efficiency, the query language must be declarative
+(SQL or MDX), or at least a vector language (J, K). The query should only
+contain implicit loops, allowing for optimization.
+
+[Original article](https://clickhouse.tech/docs/en/)
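The vector-engine point from the `### CPU` section above is easy to feel even in plain Python: one batch operation over a whole column avoids paying a dispatch cost per value. A rough sketch follows (an analogy only, not how ClickHouse's engine is implemented):

```python
import array
import timeit

column = array.array('q', range(1_000_000))  # one column of 64-bit integers

def row_at_a_time(col):
    # One interpreted "operation call" per value: the dispatch cost dominates.
    total = 0
    for value in col:
        total += value
    return total

def vectorized(col):
    # A single operation over the whole vector; the loop runs in optimized native code.
    return sum(col)

print('per-row :', timeit.timeit(lambda: row_at_a_time(column), number=10))
print('vector  :', timeit.timeit(lambda: vectorized(column), number=10))
```

On most interpreters the second version is several times faster, which is the same effect, on a much smaller scale, that a vectorized query engine relies on.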
diff --git a/docs/tools/translate.py b/docs/tools/translate.py
new file mode 100755
index 0000000000000000000000000000000000000000..621fc37af19e82e2a8504106712a4a9545993617
--- /dev/null
+++ b/docs/tools/translate.py
@@ -0,0 +1,21 @@
+#!/usr/bin/env python
+
+from __future__ import print_function
+import sys
+import pprint
+
+import googletrans
+import pandocfilters
+
+translator = googletrans.Translator()
+
+def translate(key, value, format, _):
+    if key == 'Str':
+        print(value.encode('utf8'), file=sys.stderr)
+        return
+    [meta, contents] = value
+    cls = getattr(pandocfilters, key)
+    return cls(meta, translator.translate(contents, dest='es'))
+
+if __name__ == "__main__":
+    pandocfilters.toJSONFilter(translate)
diff --git a/docs/zh/operations/performance/sampling_query_profiler.md b/docs/zh/operations/performance/sampling_query_profiler.md
new file mode 120000
index 0000000000000000000000000000000000000000..c55c58684ba104701e065b4b4d65940b0cc30066
--- /dev/null
+++ b/docs/zh/operations/performance/sampling_query_profiler.md
@@ -0,0 +1 @@
+../../../en/operations/performance/sampling_query_profiler.md
\ No newline at end of file
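A side note on `docs/tools/translate.py` above: `googletrans.Translator.translate()` returns a `Translated` object rather than a plain string (the text lives in its `.text` attribute), and for most pandoc AST node types `value` is not a two-element `[meta, contents]` list, so a filter of this shape usually ends up translating only the `Str` leaves. A minimal sketch of that narrower variant, assuming the same `googletrans` and `pandocfilters` packages; this is an illustration, not the code committed in this patch:

```python
#!/usr/bin/env python3
"""Sketch: translate only the plain-text (Str) nodes of a pandoc document.

Assumes the same third-party packages used by docs/tools/translate.py
(googletrans, pandocfilters); not the code committed in this patch.
"""
import googletrans
from pandocfilters import toJSONFilter, Str

translator = googletrans.Translator()

def translate_str(key, value, fmt, meta):
    # For 'Str' nodes, value is the text itself; every other node is left untouched.
    if key == 'Str' and value.strip():
        # Translator.translate() returns a Translated object; .text is the translation.
        return Str(translator.translate(value, dest='es').text)
    return None

if __name__ == '__main__':
    toJSONFilter(translate_str)
```

A filter like this is run through pandoc itself, e.g. `pandoc index.md --filter ./translate_str.py -o index.es.md`, which is presumably how the committed filter is meant to be invoked as well.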