Unverified commit 42f72d52, authored by BayoNet, committed by GitHub

DOCS-323: LowCardinality (#11060)

* CLICKHOUSEDOCS-323: The first version of the description.

* CLICKHOUSEDOCS-323: Update.

* CLICKHOUSEDOCS-323: toLowCardinality, low_cardinality_max_dictionary_size

* CLICKHOUSEDOCS-323: Added descriptions of low_cardinality_use_single_dictionary_for_part, low_cardinality_allow_in_native_format, allow_suspicious_low_cardinality_types.

* Update docs/en/sql-reference/data-types/lowcardinality.md
Co-authored-by: Ivan Blinkov <github@blinkov.ru>

* Update docs/en/sql-reference/data-types/lowcardinality.md
Co-authored-by: Ivan Blinkov <github@blinkov.ru>

* Update docs/en/sql-reference/functions/type-conversion-functions.md
Co-authored-by: Ivan Blinkov <github@blinkov.ru>

* CLICKHOUSEDOCS-323: Updated by comments.

* CLICKHOUSEDOCS-323: Updated by comments.

* CLICKHOUSEDOCS-323: Fixed some grammar.
Co-authored-by: Sergei Shtykov <bayonet@yandex-team.ru>
Co-authored-by: Ivan Blinkov <github@blinkov.ru>
Parent 6281dd68
......@@ -1265,4 +1265,63 @@ Possible values:
Default value: 16.
## low_cardinality_max_dictionary_size {#low_cardinality_max_dictionary_size}
Sets the maximum size, in rows, of a shared global dictionary for the [LowCardinality](../../sql-reference/data-types/lowcardinality.md) data type that can be written to a storage file system. This setting prevents RAM issues in case of unlimited dictionary growth. ClickHouse writes all data that can't be encoded because of the maximum dictionary size limit in the ordinary way.
Possible values:
- Any positive integer.
Default value: 8192.
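As a minimal sketch of how the limit might be changed for one session, assuming the setting can be set at session level like other query settings (the value `16384` is purely illustrative, not a recommendation):

```sql
-- Raise the per-dictionary row limit for the current session.
-- 16384 is an illustrative value chosen for this sketch.
SET low_cardinality_max_dictionary_size = 16384;

-- Check the value currently in effect and whether it differs from the default.
SELECT name, value, changed
FROM system.settings
WHERE name = 'low_cardinality_max_dictionary_size';
```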
## low_cardinality_use_single_dictionary_for_part {#low_cardinality_use_single_dictionary_for_part}
Turns on or off the use of a single dictionary for the data part.
By default, the ClickHouse server monitors the size of dictionaries, and if a dictionary overflows, the server starts writing the next one. To prohibit creating several dictionaries, set `low_cardinality_use_single_dictionary_for_part = 1`.
Possible values:
- 1 — Creating several dictionaries for the data part is prohibited.
- 0 — Creating several dictionaries for the data part is not prohibited.
Default value: 0.
## low_cardinality_allow_in_native_format {#low_cardinality_allow_in_native_format}
Allows or restricts using the [LowCardinality](../../sql-reference/data-types/lowcardinality.md) data type with the [Native](../../interfaces/formats.md#native) format.
If usage of `LowCardinality` is restricted, the ClickHouse server converts `LowCardinality` columns to ordinary ones for `SELECT` queries, and converts ordinary columns to `LowCardinality` columns for `INSERT` queries.
This setting is required mainly for third-party clients that don't support the `LowCardinality` data type.
Possible values:
- 1 — Usage of `LowCardinality` is not restricted.
- 0 — Usage of `LowCardinality` is restricted.
Default value: 1.
## allow_suspicious_low_cardinality_types {#allow_suspicious_low_cardinality_types}
Allows or restricts using [LowCardinality](../../sql-reference/data-types/lowcardinality.md) with data types that have a fixed size of 8 bytes or less: numeric data types and `FixedString(8_bytes_or_less)`.
For small fixed-size values, using `LowCardinality` is usually inefficient, because ClickHouse stores a numeric index for each row. As a result:
- Disk space usage can rise.
- RAM consumption can be higher, depending on a dictionary size.
- Some functions can work slower due to extra encoding/decoding operations.
Merge times in [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md)-engine tables can grow for all the reasons described above.
Possible values:
- 1 — Usage of `LowCardinality` is not restricted.
- 0 — Usage of `LowCardinality` is restricted.
Default value: 0.
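The following sketch shows how the restriction might be lifted for a single session. The table name `lc_suspicious` and its columns are hypothetical, introduced only for illustration. With the default value of 0, the `CREATE TABLE` statement is expected to be rejected.

```sql
-- Explicitly allow LowCardinality over small fixed-size types for this session.
SET allow_suspicious_low_cardinality_types = 1;

-- Hypothetical table: LowCardinality wraps a 1-byte numeric type,
-- which falls under the "suspicious" category described above.
CREATE TABLE lc_suspicious
(
    `id` UInt32,
    `flag` LowCardinality(UInt8)
)
ENGINE = MergeTree()
ORDER BY id;
```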
[Original article](https://clickhouse.tech/docs/en/operations/settings/settings/) <!-- hide -->
---
toc_priority: 52
toc_priority: 53
toc_title: AggregateFunction
---
......
---
toc_priority: 51
toc_priority: 52
toc_title: Array(T)
---
......
---
toc_priority: 51
toc_title: LowCardinality
---
# LowCardinality Data Type {#lowcardinality-data-type}
Changes the internal representation of other data types to be dictionary-encoded.
## Syntax {#lowcardinality-syntax}
```sql
LowCardinality(data_type)
```
**Parameters**
- `data_type` — [String](string.md), [FixedString](fixedstring.md), [Date](date.md), [DateTime](datetime.md), and numbers except [Decimal](decimal.md). `LowCardinality` is not efficient for some data types, see the [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types) setting description.
## Description {#lowcardinality-dscr}
`LowCardinality` is a superstructure that changes the data storage method and the rules of data processing. ClickHouse applies [dictionary coding](https://en.wikipedia.org/wiki/Dictionary_coder) to `LowCardinality` columns. Operating on dictionary-encoded data significantly increases the performance of [SELECT](../statements/select/index.md) queries for many applications.
The efficiency of the `LowCardinality` data type depends on data diversity. If a dictionary contains fewer than 10,000 distinct values, ClickHouse mostly reads and stores data more efficiently. If a dictionary contains more than 100,000 distinct values, ClickHouse can perform worse than with ordinary data types.
Consider using `LowCardinality` instead of [Enum](enum.md) when working with strings. `LowCardinality` is more flexible to use and often shows the same or higher efficiency.
## Example
Create a table with a `LowCardinality`-column:
```sql
CREATE TABLE lc_t
(
`id` UInt16,
`strings` LowCardinality(String)
)
ENGINE = MergeTree()
ORDER BY id
```
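As a short usage sketch for the `lc_t` table above (the inserted values are made up for illustration), data is written and read exactly as with an ordinary `String` column, and `toTypeName` can be used to confirm the column type:

```sql
-- Insert a few rows with repeating string values (illustrative data).
INSERT INTO lc_t VALUES (1, 'apple'), (2, 'banana'), (3, 'apple');

-- Read the data back; toTypeName reports the type of the `strings` column.
SELECT
    id,
    strings,
    toTypeName(strings) AS type
FROM lc_t
ORDER BY id;
```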
## Related Settings and Functions
Settings:
- [low_cardinality_max_dictionary_size](../../operations/settings/settings.md#low_cardinality_max_dictionary_size)
- [low_cardinality_use_single_dictionary_for_part](../../operations/settings/settings.md#low_cardinality_use_single_dictionary_for_part)
- [low_cardinality_allow_in_native_format](../../operations/settings/settings.md#low_cardinality_allow_in_native_format)
- [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types)
Functions:
- [toLowCardinality](../functions/type-conversion-functions.md#tolowcardinality)
## See Also
- [A Magical Mystery Tour of the LowCardinality Data Type](https://www.altinity.com/blog/2019/3/27/low-cardinality).
- [Reducing Clickhouse Storage Cost with the Low Cardinality Type – Lessons from an Instana Engineer](https://www.instana.com/blog/reducing-clickhouse-storage-cost-with-the-low-cardinality-type-lessons-from-an-instana-engineer/).
- [String Optimization (video presentation in Russian)](https://youtu.be/rqf-ILRgBdY?list=PL0Z2YDlm0b3iwXCpEFiOOYmwXzVmjJfEt). [Slides in English](https://github.com/yandex/clickhouse-presentations/raw/master/meetup19/string_optimization.pdf).
\ No newline at end of file
---
toc_priority: 54
toc_priority: 55
toc_title: Nullable
---
......
---
toc_priority: 53
toc_priority: 54
toc_title: Tuple(T1, T2, ...)
---
......
......@@ -516,7 +516,7 @@ Result:
**See Also**
- \[ISO 8601 announcement by @xkcd\](https://xkcd.com/1179/)
- [ISO 8601 announcement by @xkcd](https://xkcd.com/1179/)
- [RFC 1123](https://tools.ietf.org/html/rfc1123)
- [toDate](#todate)
- [toDateTime](#todatetime)
......@@ -529,4 +529,43 @@ Same as for [parseDateTimeBestEffort](#parsedatetimebesteffort) except that it r
Same as for [parseDateTimeBestEffort](#parsedatetimebesteffort) except that it returns zero date or zero date time when it encounters a date format that cannot be processed.
## toLowCardinality {#tolowcardinality}
Converts the input parameter to the [LowCardinality](../data-types/lowcardinality.md) version of the same data type.
To convert data from the `LowCardinality` data type, use the [CAST](#type_conversion_function-cast) function. For example, `CAST(x AS String)`.
**Syntax**
```sql
toLowCardinality(expr)
```
**Parameters**
- `expr` — [Expression](../syntax.md#syntax-expressions) resulting in one of the [supported data types](../data-types/index.md#data_types).
**Returned values**
- Result of `expr`.
Type: `LowCardinality(expr_result_type)`
**Example**
Query:
```sql
SELECT toLowCardinality('1')
```
Result:
```text
┌─toLowCardinality('1')─┐
│ 1                     │
└───────────────────────┘
```
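The round trip described above (wrapping with `toLowCardinality`, then unwrapping with `CAST`) can be sketched as follows. The aliases `lc` and `plain` are hypothetical names used only in this example. The two `toTypeName` columns should report the `LowCardinality(String)` and `String` types respectively.

```sql
SELECT
    toLowCardinality('1') AS lc,          -- wrap the value in LowCardinality
    toTypeName(lc) AS lc_type,            -- type after wrapping
    CAST(lc AS String) AS plain,          -- convert back to the ordinary type
    toTypeName(plain) AS plain_type       -- type after unwrapping
```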
[Original article](https://clickhouse.tech/docs/en/query_language/functions/type_conversion_functions/) <!--hide-->