Commit d6c43775 authored by Ivan Blinkov

revert some more harmful patches

Parent 702885eb
......@@ -2,11 +2,11 @@
# External Dictionaries
You can add your own dictionaries from various data sources. The data source for a dictionary can be a local text or executable file, an HTTP(s) resource, or another DBMS. For more information, see "[Sources of external dictionaries](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources)".
You can add your own dictionaries from various data sources. The data source for a dictionary can be a local text or executable file, an HTTP(s) resource, or another DBMS. For more information, see "[Sources for external dictionaries](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources)".
ClickHouse:
- Fully or partially stores dictionaries in RAM.
> - Fully or partially stores dictionaries in RAM.
- Periodically updates dictionaries and dynamically loads missing values. In other words, dictionaries can be loaded dynamically.
The configuration of external dictionaries is located in one or more files. The path to the configuration is specified in the [dictionaries_config](../../operations/server_settings/settings.md#server_settings-dictionaries_config) parameter.
......@@ -37,14 +37,7 @@ The dictionary config file has the following format:
You can [configure](external_dicts_dict.md#dicts-external_dicts_dict) any number of dictionaries in the same file. The file format is preserved even if there is only one dictionary (i.e. `<yandex><dictionary> <!--configuration -> </dictionary></yandex>` ).
> You can convert values ​​for a small dictionary by describing it in a `SELECT` query (see the [transform](../functions/other_functions.md#other_functions-transform) function). This functionality is not related to external dictionaries.
See also:
- [Configuring an external dictionary](external_dicts_dict.md#dicts-external_dicts_dict)
- [Storing dictionaries in memory](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout)
- [Updating dictionaries](external_dicts_dict_lifetime#dicts-external_dicts_dict_lifetime)
- [Sources of external dictionaries](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources)
- [Dictionary key and fields](external_dicts_dict_structure.md#dicts-external_dicts_dict_dict_structure)
- [Functions for working with external dictionaries](../functions/ext_dict_functions.md#ext_dict_functions)
See also "[Functions for working with external dictionaries](../functions/ext_dict_functions.md#ext_dict_functions)".
!!! attention
You can convert values for a small dictionary by describing it in a `SELECT` query (see the [transform](../functions/other_functions.md#other_functions-transform) function). This functionality is not related to external dictionaries.
......@@ -4,9 +4,9 @@
There are a [variety of ways](#dicts-external_dicts_dict_layout-manner) to store dictionaries in memory.
We recommend [flat](#dicts-external_dicts_dict_layout-flat), [hashed](#dicts-external_dicts_dict_layout-hashed) and [complex_key_hashed](#dicts-external_dicts_dict_layout-complex_key_hashed). which provide optimal processing speed.
We recommend [flat](#dicts-external_dicts_dict_layout-flat), [hashed](#dicts-external_dicts_dict_layout-hashed)and[complex_key_hashed](#dicts-external_dicts_dict_layout-complex_key_hashed). which provide optimal processing speed.
Caching is not recommended because of potentially poor performance and difficulties in selecting optimal parameters. Read more in the section " [cache](#dicts-external_dicts_dict_layout-cache)".
Caching is not recommended because of potentially poor performance and difficulties in selecting optimal parameters. Read more in the section "[cache](#dicts-external_dicts_dict_layout-cache)".
There are several ways to improve dictionary performance:
......@@ -54,7 +54,7 @@ The configuration looks like this:
The dictionary is completely stored in memory in the form of flat arrays. How much memory does the dictionary use? The amount is proportional to the size of the largest key (in space used).
The dictionary key has the `UInt64` type and the value is limited to 500,000. If a larger key is discovered when creating the dictionary, ClickHouse throws an exception and does not create the dictionary.
The dictionary key has the ` UInt64` type and the value is limited to 500,000. If a larger key is discovered when creating the dictionary, ClickHouse throws an exception and does not create the dictionary.
All types of sources are supported. When updating, data (from a file or from a table) is read in its entirety.
......@@ -72,7 +72,7 @@ Configuration example:
### hashed
The dictionary is completely stored in memory in the form of a hash table. The dictionary can contain any number of elements with any identifiers. In practice, the number of keys can reach tens of millions of items.
The dictionary is completely stored in memory in the form of a hash table. The dictionary can contain any number of elements with any identifiers In practice, the number of keys can reach tens of millions of items.
All types of sources are supported. When updating, data (from a file or from a table) is read in its entirety.
......@@ -88,7 +88,7 @@ Configuration example:
### complex_key_hashed
This type of storage is for use with complex [keys](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure). Similar to `hashed`.
This type of storage is for use with composite [keys](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure). Similar to `hashed`.
Configuration example:
......@@ -140,13 +140,15 @@ Example:
To work with these dictionaries, you need to pass an additional date argument to the `dictGetT` function:
dictGetT('dict_name', 'attr_name', id, date)
```
dictGetT('dict_name', 'attr_name', id, date)
```
This function returns the value for the specified `id`s and the date range that includes the passed date.
Details of the algorithm:
- If the `id` is not found or a range is not found for the `id`, it returns the default value for the dictionary.
- If the ` id` is not found or a range is not found for the ` id`, it returns the default value for the dictionary.
- If there are overlapping ranges, you can use any.
- If the range delimiter is `NULL` or an invalid date (such as 1900-01-01 or 2039-01-01), the range is left open. The range can be open on both sides.
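
The lookup rules in the list above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not ClickHouse's implementation: `RangeEntry`, `range_lookup`, and the in-memory `table` layout are hypothetical names, range bounds are assumed to be inclusive, and an open bound is modeled as `None`.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Optional


class RangeEntry(NamedTuple):
    """One row of a range dictionary (hypothetical in-memory layout)."""
    start: Optional[date]  # None models an open left bound
    end: Optional[date]    # None models an open right bound
    value: str


def range_lookup(table: Dict[int, List[RangeEntry]], key: int, day: date,
                 default: str = "") -> str:
    """Return the value whose range contains `day`, or `default`."""
    for entry in table.get(key, []):
        after_start = entry.start is None or entry.start <= day
        before_end = entry.end is None or day <= entry.end
        if after_start and before_end:
            # With overlapping ranges any match is acceptable; take the first.
            return entry.value
    # Unknown id, or no range for this id covers the date.
    return default


table = {
    1: [
        RangeEntry(date(2015, 1, 1), date(2015, 1, 31), "january"),
        RangeEntry(date(2015, 2, 1), None, "later"),  # open on the right
    ]
}
print(range_lookup(table, 1, date(2015, 1, 15)))
print(range_lookup(table, 2, date(2015, 1, 15), default="n/a"))
```
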
......@@ -191,11 +193,11 @@ The dictionary is stored in a cache that has a fixed number of cells. These cell
When searching for a dictionary, the cache is searched first. For each block of data, all keys that are not found in the cache or are outdated are requested from the source using ` SELECT attrs... FROM db.table WHERE id IN (k1, k2, ...)`. The received data is then written to the cache.
For cache dictionaries, the expiration ([lifetime](external_dicts_dict_lifetime.md#dicts-external_dicts_dict_lifetime)) of data in the cache can be set. If more time than `lifetime` has passed since loading the data in a cell, the cell's value is not used, and it is re-requested the next time it needs to be used.
For cache dictionaries, the expiration [lifetime](external_dicts_dict_lifetime.md#dicts-external_dicts_dict_lifetime) of data in the cache can be set. If more time than `lifetime` has passed since loading the data in a cell, the cell's value is not used, and it is re-requested the next time it needs to be used.
This is the least effective of all the ways to store dictionaries. The speed of the cache depends strongly on correct settings and the usage scenario. A cache type dictionary performs well only when the hit rates are high enough (recommended 99% and higher). You can view the average hit rate in the `system.dictionaries` table.
To improve cache performance, use a subquery with `LIMIT`, and call the function with the dictionary externally.
To improve cache performance, use a subquery with ` LIMIT`, and call the function with the dictionary externally.
Supported [sources](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources): MySQL, ClickHouse, executable, HTTP.
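
As a rough model of the cache behavior described above, the following Python sketch checks each key of a block against the cache, requests missing or outdated keys from the source in one batch, and tracks the hit rate. All names (`CacheDictionary`, `fetch_many`, `get_block`) are hypothetical, and the sketch deliberately ignores the fixed number of cells, so it says nothing about memory use or eviction.

```python
import time


class CacheDictionary:
    """Toy model of a cache layout: values expire after `lifetime_sec`."""

    def __init__(self, fetch_many, lifetime_sec, clock=time.monotonic):
        self.fetch_many = fetch_many    # stand-in for the batched source query
        self.lifetime_sec = lifetime_sec
        self.clock = clock
        self.cells = {}                 # key -> (value, loaded_at)
        self.hits = 0
        self.queries = 0

    def get_block(self, keys):
        now = self.clock()
        to_fetch = []
        for key in keys:
            self.queries += 1
            cell = self.cells.get(key)
            if cell is not None and now - cell[1] <= self.lifetime_sec:
                self.hits += 1
            else:
                to_fetch.append(key)    # missing or outdated
        if to_fetch:
            # One request per block for all keys that could not be served.
            for key, value in self.fetch_many(to_fetch).items():
                self.cells[key] = (value, now)
        return [self.cells[k][0] if k in self.cells else None for k in keys]

    def hit_rate(self):
        return self.hits / self.queries if self.queries else 0.0
```

With a model like this it is easy to see why a low hit rate is costly: every miss in a block turns into extra work against the source.
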
......@@ -217,14 +219,14 @@ Set a large enough cache size. You need to experiment to select the number of ce
3. Assess memory consumption using the `system.dictionaries` table.
4. Increase or decrease the number of cells until the required memory consumption is reached.
!!! Warning:
Do not use ClickHouse as a source, because it is slow to process queries with random reads.
!!! warning
Do not use ClickHouse as a source, because it is slow to process queries with random reads.
<a name="dicts-external_dicts_dict_layout-complex_key_cache"></a>
### complex_key_cache
This type of storage is for use with complex [keys](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure). Similar to `cache`.
This type of storage is for use with composite [keys](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure). Similar to `cache`.
<a name="dicts-external_dicts_dict_layout-ip_trie"></a>
......@@ -273,7 +275,7 @@ Example:
...
```
The key must have only one `String` type attribute that contains an allowed IP prefix. Other types are not supported yet.
The key must have only one String type attribute that contains an allowed IP prefix. Other types are not supported yet.
For queries, you must use the same functions (`dictGetT` with a tuple) as for dictionaries with composite keys:
......@@ -281,7 +283,7 @@ For queries, you must use the same functions (`dictGetT` with a tuple) as for di
dictGetT('dict_name', 'attr_name', tuple(ip))
```
The function accepts either `UInt32` for IPv4, or `FixedString(16)` for IPv6:
The function takes either `UInt32` for IPv4, or `FixedString(16)` for IPv6:
```
dictGetString('prefix', 'asn', tuple(IPv6StringToNum('2001:db8::1')))
......@@ -289,5 +291,4 @@ dictGetString('prefix', 'asn', tuple(IPv6StringToNum('2001:db8::1')))
Other types are not supported yet. The function returns the attribute for the prefix that corresponds to this IP address. If there are overlapping prefixes, the most specific one is returned.
Data is stored in a `trie` . It must completely fit into RAM.
Data is stored in a `trie`. It must completely fit into RAM.
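
The "most specific prefix wins" rule can be illustrated with Python's standard `ipaddress` module. This is a sketch only: it scans a plain dict linearly rather than walking a trie, `longest_prefix_match` and the sample prefixes are made up for the example, and returning a default on a miss is an assumption.

```python
import ipaddress


def longest_prefix_match(prefixes, ip, default=""):
    """Return the attribute of the most specific prefix containing `ip`."""
    addr = ipaddress.ip_address(ip)
    best_len, best_value = -1, default
    for prefix, value in prefixes.items():
        net = ipaddress.ip_network(prefix)
        # Only compare addresses and prefixes of the same IP version.
        if net.version == addr.version and addr in net and net.prefixlen > best_len:
            best_len, best_value = net.prefixlen, value
    return best_value


prefixes = {
    "2001:db8::/32": "AS-wide",
    "2001:db8:1::/48": "AS-narrow",   # overlaps the /32 above, more specific
    "202.79.32.0/20": "AS-v4",
}
print(longest_prefix_match(prefixes, "2001:db8:1::5"))
```
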
......@@ -39,7 +39,7 @@ ClickHouse supports the following types of keys:
A structure can contain either `<id>` or `<key>` .
!!! note
!!! warning
The key doesn't need to be defined separately in attributes.
### Numeric Key
......@@ -112,7 +112,6 @@ Configuration fields:
- `type` – The column type. Sets the method for interpreting data in the source. For example, for MySQL, the field might be `TEXT`, `VARCHAR`, or `BLOB` in the source table, but it can be uploaded as `String`.
- `null_value` – The default value for a non-existing element. In the example, it is an empty string.
- `expression` – The attribute can be an expression. The tag is not required.
- `hierarchical` – Hierarchical support. Mirrored to the parent identifier. By default, `false`.
- `injective` – Whether the `id -> attribute` image is injective. If `true`, then you can optimize the ` GROUP BY` clause. By default, `false`.
- `hierarchical` – Hierarchical support. Mirrored to the parent identifier. By default, ` false`.
- `injective` – Whether the `id -> attribute` image is injective. If ` true`, then you can optimize the ` GROUP BY` clause. By default, `false`.
- `is_object_id` – Whether the query is executed for a MongoDB document by `ObjectID`.
<a name="higher_order_functions"></a>
# Higher-order functions
## `->` operator, lambda(params, expr) function
......@@ -91,9 +89,9 @@ SELECT arrayCumSum([1, 1, 1, 1]) AS res
### arraySort(\[func,\] arr1, ...)
Returns the `arr1` array sorted in ascending order. If `func` is set, the sort order is determined by the result of the `func` function on the elements of the array or arrays.
Returns an array as result of sorting the elements of `arr1` in ascending order. If the `func` function is specified, sorting order is determined by the result of the function `func` applied to the elements of array (arrays)
To improve sorting efficiency, we use the [Schwartzian Transform](https://en.wikipedia.org/wiki/%D0%9F%D1%80%D0%B5%D0%BE%D0%B1%D1%80%D0%B0%D0%B7%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5_%D0%A8%D0%B2%D0%B0%D1%80%D1%86%D0%B0).
The [Schwartzian transform](https://en.wikipedia.org/wiki/Schwartzian_transform) is used to impove sorting efficiency.
Example:
......@@ -109,5 +107,9 @@ SELECT arraySort((x, y) -> y, ['hello', 'world'], [2, 1]);
### arrayReverseSort(\[func,\] arr1, ...)
Returns the `arr1` array sorted in descending order. If `func` is set, the sort order is determined by the result of the `func` function on the elements of the array or arrays.
Returns an array as result of sorting the elements of `arr1` in descending order. If the `func` function is specified, sorting order is determined by the result of the function `func` applied to the elements of array (arrays)
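
For readers unfamiliar with the Schwartzian transform mentioned for `arraySort`, here is a minimal Python sketch of the decorate, sort, undecorate pattern. `array_sort` is a hypothetical helper that mimics the documented semantics; it is not the ClickHouse implementation, and its handling of equal keys is not meant to match ClickHouse.

```python
def array_sort(func, *arrays, reverse=False):
    """Sort arrays[0]; if func is given, order by func applied across all arrays."""
    if func is None:
        return sorted(arrays[0], reverse=reverse)
    # Decorate: compute each sort key exactly once.
    decorated = [(func(*args), args[0]) for args in zip(*arrays)]
    # Sort by the precomputed key only.
    decorated.sort(key=lambda pair: pair[0], reverse=reverse)
    # Undecorate: drop the keys and keep the elements of the first array.
    return [value for _, value in decorated]


# Mirrors arraySort((x, y) -> y, ['hello', 'world'], [2, 1])
print(array_sort(lambda x, y: y, ["hello", "world"], [2, 1]))
```

Passing `reverse=True` gives the descending order described for `arrayReverseSort`.
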
<a name="ym_dict_functions"></a>
# Functions for working with Yandex.Metrica dictionaries
In order for the functions below to work, the server config must specify the paths and addresses for getting all the Yandex.Metrica dictionaries. The dictionaries are loaded at the first call of any of these functions. If the reference lists can't be loaded, an exception is thrown.
......@@ -23,9 +21,9 @@ All functions for working with regions have an optional argument at the end –
Example:
```text
regionToCountry (RegionID) — Uses the default dictionary: /opt/geo/regions_hierarchy.txt.
regionToCountry (RegionID, '') — Uses the default dictionary: /opt/geo/regions_hierarchy.txt.
regionToCountry (RegionID, 'ua') — Uses the dictionary for the ua key: /opt/geo/regions_hierarchy_ua.txt.
regionToCountry(RegionID) – Uses the default dictionary: /opt/geo/regions_hierarchy.txt
regionToCountry(RegionID, '') – Uses the default dictionary: /opt/geo/regions_hierarchy.txt
regionToCountry(RegionID, 'ua') – Uses the dictionary for the 'ua' key: /opt/geo/regions_hierarchy_ua.txt
```
### regionToCity(id[, geobase])
......@@ -45,20 +43,20 @@ LIMIT 15
```text
┌─regionToName(regionToArea(toUInt32(number), \'ua\'))─┐
│ │
│ Moscow and Moscow region │
│ St. Petersburg and Leningrad region │
│ Belogorod region
│ Ivanovo region
│ Kaluga region │
│ Kostroma region │
│ Kursk region │
│ Lipetsk region │
│ Oryol region
│ Ryazan region │
│ Smolensk region │
│ Tambov region │
│ Tver region │
│ Tula region │
│ Moscow and Moscow region
│ St. Petersburg and Leningrad region
│ Belgorod region
│ Ivanovsk region
│ Kaluga region
│ Kostroma region
│ Kursk region
│ Lipetsk region
│ Orlov region
│ Ryazan region
│ Smolensk region
│ Tambov region
│ Tver region
│ Tula region
└──────────────────────────────────────────────────────┘
```
......@@ -75,20 +73,20 @@ LIMIT 15
```text
┌─regionToName(regionToDistrict(toUInt32(number), \'ua\'))─┐
│ │
│ Central Federal District
│ Northwest Federal District
│ Southern Federal District
│ North Caucasian Federal District
│ Privolzhsky Federal District
│ Ural Federal District
│ Siberian Federal District
│ Far East Federal District
│ Scotland │
│ Faroe Islands │
│ Flemish region │
│ Brussels capital region │
│ Walloon
│ Federation of Bosnia and Herzegovina
│ Central federal district
│ Northwest federal district
│ South federal district
│ North Caucases federal district
│ Privolga federal district
│ Ural federal district
│ Siberian federal district
│ Far East federal district
│ Scotland
│ Faroe Islands
│ Flemish region
│ Brussels capital region
│ Wallonia
│ Federation of Bosnia and Herzegovina │
└──────────────────────────────────────────────────────────┘
```
......
......@@ -6,10 +6,9 @@ This query is exactly the same as `CREATE`, but
- instead of the word `CREATE` it uses the word `ATTACH`.
- The query doesn't create data on the disk, but assumes that data is already in the appropriate places, and just adds information about the table to the server.
After executing an ATTACH query, the server will know about the existence of the table.
After executing an `ATTACH` query, the server will know about the existence of the table.
If the table was previously detached (`DETACH`), meaning that its structure is known, you can use shorthand without defining the structure.
If the table was previously detached (``DETACH``), meaning that its structure is known, you can use shorthand without defining the structure.
```sql
ATTACH TABLE [IF NOT EXISTS] [db.]name [ON CLUSTER cluster]
......@@ -176,8 +175,8 @@ Supported only by `*MergeTree` engines, in which this query initializes a non-sc
If you specify a `PARTITION`, only the specified partition will be optimized.
If you specify `FINAL`, optimization will be performed even when all the data is already in one part.
!!! Important:
The OPTIMIZE query can't fix the cause of the "Too many parts" error.
!!! warning
OPTIMIZE can't fix the "Too many parts" error.
## KILL QUERY
......@@ -194,7 +193,7 @@ The queries to terminate are selected from the system.processes table using the
Examples:
```sql
-- Terminates all queries with the specified query_id:
-- Forcibly terminates all queries with the specified query_id:
KILL QUERY WHERE query_id='2-857d-4a57-9ee0-327da5d60a90'
-- Synchronously terminates all queries run by 'username':
......
This diff is collapsed.