Commit b4ed48a4 authored by Ivan Blinkov

Capitalize most words in titles of docs/en/

Parent 7d5fb17c
# Boolean values
# Boolean Values
There isn't a separate type for boolean values. They use the UInt8 type, restricted to the values 0 or 1.
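For illustration, a minimal sketch (not from the original page): comparison operators already return UInt8 values that act as booleans.

```sql
SELECT 1 < 2 AS flag, toTypeName(flag) AS type
-- flag = 1, type = 'UInt8'
```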
......@@ -3,7 +3,7 @@
Date with time. Stored in four bytes as a Unix timestamp (unsigned). Allows storing values in the same range as for the Date type. The minimal value is output as 0000-00-00 00:00:00.
The time is stored with accuracy up to one second (without leap seconds).
## Time zones
## Time Zones
The date with time is converted from text (divided into component parts) to binary and back, using the system's time zone at the time the client or server starts. In text format, information about daylight saving time is lost.
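A minimal sketch of the storage model described above: applying `toUInt32` to a DateTime value exposes the underlying Unix timestamp.

```sql
SELECT toDateTime('2018-07-01 12:00:00') AS dt, toUInt32(dt) AS unix_ts
-- unix_ts is the raw four-byte value; the text form depends on the server's time zone
```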
......
......@@ -9,7 +9,7 @@ Types are equivalent to types of C:
We recommend that you store data in integer form whenever possible. For example, convert fixed precision numbers to integer values, such as monetary amounts or page load times in milliseconds.
## Using floating-point numbers
## Using Floating-point Numbers
- Computations with floating-point numbers might produce a rounding error.
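A classic illustration of such a rounding error, easy to verify in any client:

```sql
SELECT 1 - 0.9
-- returns 0.09999999999999998 rather than 0.1
```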
......
<a name="data_types"></a>
# Data types
# Data Types
ClickHouse can store various types of data in table cells.
......
......@@ -2,14 +2,14 @@
Fixed-length integers, with or without a sign.
## Int ranges
## Int Ranges
- Int8 - [-128 : 127]
- Int16 - [-32768 : 32767]
- Int32 - [-2147483648 : 2147483647]
- Int64 - [-9223372036854775808 : 9223372036854775807]
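As a hedged illustration of these ranges: casting an out-of-range value wraps around, C-style (behavior worth verifying on your build).

```sql
SELECT CAST(128 AS Int8) AS wrapped
-- 128 does not fit into Int8 [-128 : 127] and wraps to -128
```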
## Uint ranges
## UInt Ranges
- UInt8 - [0 : 255]
- UInt16 - [0 : 65535]
......
# Special data types
# Special Data Types
Special data type values can't be saved to a table or output in results, but are used as the intermediate result of running a query.
# Overview of ClickHouse architecture
# Overview of ClickHouse Architecture
ClickHouse is a true column-oriented DBMS. Data is stored by columns, and during query execution it is processed by arrays (vectors or chunks of columns). Whenever possible, operations are dispatched on arrays rather than on individual values. This is called "vectorized query execution," and it helps lower the cost of actual data processing.
......@@ -18,13 +18,13 @@ Nevertheless, it is possible to work with individual values as well. To represen
`Field` doesn't have enough information about a specific data type for a table. For example, `UInt8`, `UInt16`, `UInt32`, and `UInt64` are all represented as `UInt64` in a `Field`.
## Leaky abstractions
## Leaky Abstractions
`IColumn` has methods for common relational transformations of data, but they don't meet all needs. For example, `ColumnUInt64` doesn't have a method to calculate the sum of two columns, and `ColumnString` doesn't have a method to run a substring search. These countless routines are implemented outside of `IColumn`.
Various functions on columns can be implemented in a generic, non-efficient way using `IColumn` methods to extract `Field` values, or in a specialized way using knowledge of inner memory layout of data in a specific `IColumn` implementation. To do this, functions are cast to a specific `IColumn` type and deal with internal representation directly. For example, `ColumnUInt64` has the `getData` method that returns a reference to an internal array, then a separate routine reads or fills that array directly. In fact, we have "leaky abstractions" to allow efficient specializations of various routines.
## Data types
## Data Types
`IDataType` is responsible for serialization and deserialization: for reading and writing chunks of columns or individual values in binary or text form.
`IDataType` directly corresponds to data types in tables. For example, there are `DataTypeUInt32`, `DataTypeDateTime`, `DataTypeString` and so on.
......@@ -153,7 +153,7 @@ We maintain full backward and forward compatibility for the server TCP protocol:
> For all external applications, we recommend using the HTTP interface because it is simple and easy to use. The TCP protocol is more tightly linked to internal data structures: it uses an internal format for passing blocks of data and it uses custom framing for compressed data. We haven't released a C library for that protocol because it requires linking most of the ClickHouse codebase, which is not practical.
## Distributed query execution
## Distributed Query Execution
Servers in a cluster setup are mostly independent. You can create a `Distributed` table on one or all servers in a cluster. The `Distributed` table does not store data itself – it only provides a "view" to all local tables on multiple nodes of a cluster. When you SELECT from a `Distributed` table, it rewrites that query, chooses remote nodes according to load balancing settings, and sends the query to them. The `Distributed` table requests remote servers to process a query just up to a stage where intermediate results from different servers can be merged. Then it receives the intermediate results and merges them. The distributed table tries to distribute as much work as possible to remote servers, and does not send much intermediate data over the network.
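A sketch of what such a "view" might look like; the cluster name `example_cluster` and the table `hits_local` are hypothetical.

```sql
CREATE TABLE hits_all AS hits_local
ENGINE = Distributed(example_cluster, default, hits_local, rand())
```

A SELECT from `hits_all` would then fan out to each shard's `hits_local` and merge the intermediate results.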
......
# How to build ClickHouse release package
# How to Build ClickHouse Release Package
## Install Git and pbuilder
## Install Git and pbuilder
```bash
sudo apt-get update
sudo apt-get install git pbuilder debhelper fakeroot
```
## Checkout ClickHouse sources
## Checkout ClickHouse Sources
```bash
git clone --recursive --branch stable https://github.com/yandex/ClickHouse.git
cd ClickHouse
```
## Run release script
## Run Release Script
```bash
pbuilder create
./release
```
# How to build ClickHouse for development
# How to Build ClickHouse for Development
Build should work on Ubuntu Linux.
With appropriate changes, it should also work on any other Linux distribution.
......@@ -46,7 +46,7 @@ Or cmake3 instead of cmake on older systems.
There are several ways to do this.
### Install from a PPA package
### Install from a PPA Package
```bash
sudo apt-get install software-properties-common
......@@ -55,24 +55,24 @@ sudo apt-get update
sudo apt-get install gcc-7 g++-7
```
### Install from sources
### Install from Sources
Look at [ci/build-gcc-from-sources.sh](https://github.com/yandex/ClickHouse/blob/master/ci/build-gcc-from-sources.sh)
## Use GCC 7 for builds
## Use GCC 7 for Builds
```bash
export CC=gcc-7
export CXX=g++-7
```
## Install required libraries from packages
## Install Required Libraries from Packages
```bash
sudo apt-get install libicu-dev libreadline-dev
```
## Checkout ClickHouse sources
## Checkout ClickHouse Sources
```bash
git clone --recursive git@github.com:yandex/ClickHouse.git
......
# How to build ClickHouse on Mac OS X
# How to Build ClickHouse on Mac OS X
Build should work on Mac OS X 10.12. If you're using an earlier version, you can try to build ClickHouse using Gentoo Prefix and clang, following this instruction.
With appropriate changes, the build should also work on other macOS versions.
......@@ -9,13 +9,13 @@ With appropriate changes, it should also work on any other Linux distribution.
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
```
## Install required compilers, tools, and libraries
## Install Required Compilers, Tools, and Libraries
```bash
brew install cmake ninja gcc icu4c mariadb-connector-c openssl libtool gettext readline
```
## Checkout ClickHouse sources
## Checkout ClickHouse Sources
```bash
git clone --recursive --depth=10 git@github.com:yandex/ClickHouse.git
......
# How to write C++ code
# How to Write C++ Code
## General recommendations
## General Recommendations
**1.** The following are recommendations, not requirements.
......@@ -415,7 +415,7 @@ You can also use an abbreviation if the full name is included next to it in the
**17.** File names with C++ source code must have the `.cpp` extension. Header files must have the `.h` extension.
## How to write code
## How to Write Code
**1.** Memory management.
......@@ -680,7 +680,7 @@ std::string s{"Hello"};
**26.** For virtual functions, write `virtual` in the base class, but write `override` in descendent classes.
## Unused features of C++
## Unused Features of C++
**1.** Virtual inheritance is not used.
......@@ -754,7 +754,7 @@ If there is a good solution already available, then use it, even if it means you
**5.** Preference is always given to libraries that are already used.
## General recommendations
## General Recommendations
**1.** Write as little code as possible.
......@@ -768,7 +768,7 @@ If there is a good solution already available, then use it, even if it means you
**6.** Code simplification is encouraged. Reduce the size of your code where possible.
## Additional recommendations
## Additional Recommendations
**1.** Explicit `std::` for types from `stddef.h` is not recommended.
......
# General questions
# General Questions
## Why not use something like MapReduce?
## Why Not Use Something Like MapReduce?
We can refer to systems like MapReduce as distributed computing systems in which the reduce operation is based on a distributed sort. The most common open-source solution of this kind is [Apache Hadoop](http://hadoop.apache.org), while Yandex internally uses its own MapReduce implementation, YT.
......
# Terabyte of click logs from Criteo
# Terabyte of Click Logs from Criteo
Download the data from <http://labs.criteo.com/downloads/download-terabyte-click-logs/>
......
# New York Taxi data
# New York Taxi Data
## How to import the raw data
## How to Import the Raw Data
See <https://github.com/toddwschneider/nyc-taxi-data> and <http://tech.marksblogg.com/billion-nyc-taxi-rides-redshift.html> for the description of the dataset and instructions for downloading.
......@@ -26,7 +26,7 @@ You can check the number of downloaded rows as follows:
```text
time psql nyc-taxi-data -c "SELECT count(*) FROM trips;"
## count
## Count
1298979494
(1 row)
......@@ -272,7 +272,7 @@ WHERE (table = 'trips_mergetree') AND active
Among other things, you can run the OPTIMIZE query on MergeTree. But it's not required, since everything will be fine without it.
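For reference, that optional step looks like this; it forces an unscheduled merge of data parts:

```sql
OPTIMIZE TABLE trips_mergetree
```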
## Results on single server
## Results on Single Server
Q1:
......
# Getting started
# Getting Started
## System requirements
## System Requirements
This is not a cross-platform system. It requires Linux Ubuntu Precise (12.04) or newer, with x86_64 architecture and support for the SSE 4.2 instruction set.
To check for SSE 4.2:
......@@ -16,7 +16,7 @@ The terminal must use UTF-8 encoding (the default in Ubuntu).
For testing and development, the system can be installed on a single server or on a desktop computer.
### Installing from packages for Debian/Ubuntu
### Installing from Packages for Debian/Ubuntu
In `/etc/apt/sources.list` (or in a separate `/etc/apt/sources.list.d/clickhouse.list` file), add the repository:
......@@ -40,7 +40,7 @@ ClickHouse contains access restriction settings. They are located in the 'users.
By default, access is allowed from anywhere for the 'default' user, without a password. See 'user/default/networks'.
For more information, see the section "Configuration files".
### Installing from sources
### Installing from Sources
To compile, follow the instructions: build.md
......@@ -64,7 +64,7 @@ Run 'chown' for the desired user.
Note the path to logs in the server config (src/dbms/programs/config.xml).
### Other installation methods
### Other Installation Methods
Docker image: <https://hub.docker.com/r/yandex/clickhouse-server/>
......
......@@ -38,7 +38,7 @@ Different orders for storing data are better suited to different scenarios. The
The higher the load on the system, the more important it is to customize the system setup to match the requirements of the usage scenario, and the more fine-grained this customization becomes. There is no system that is equally well-suited to significantly different scenarios. If a system is adaptable to a wide set of scenarios, then under a high load it will handle all of them equally poorly, or will work well for just one or a few of the possible scenarios.
## Key properties of OLAP scenario
## Key Properties of OLAP Scenario
- The vast majority of requests are for read access.
- Data is ingested in fairly large batches (> 1000 rows), not by single rows; or it is not updated at all.
......@@ -56,7 +56,7 @@ The higher the load on the system, the more important it is to customize the sys
It is easy to see that the OLAP scenario is very different from other popular scenarios (such as OLTP or Key-Value access). So it doesn't make sense to try to use OLTP or a Key-Value DB for processing analytical queries if you want to get decent performance. For example, if you try to use MongoDB or Redis for analytics, you will get very poor performance compared to OLAP databases.
## Reasons why columnar databases are better suited for OLAP scenario
## Reasons Why Columnar Databases Are Better Suited for OLAP Scenario
Column-oriented databases are better suited to OLAP scenarios (at least 100 times faster in processing speed for most queries). The reasons are explained below in detail, but the difference is easier to demonstrate visually:
......
# Command-line client
# Command-line Client
To work from the command line, you can use `clickhouse-client`:
......@@ -78,7 +78,7 @@ You can pass parameters to `clickhouse-client` (all parameters have a default va
Settings in the configuration files override the default values.
### Command line options
### Command Line Options
- `--host, -h` – The server name, 'localhost' by default. You can use either the name or the IPv4 or IPv6 address.
- `--port` – The port to connect to. Default value: 9000. Note that the HTTP interface and the native interface use different ports.
......@@ -94,7 +94,7 @@ You can pass parameters to `clickhouse-client` (all parameters have a default va
- `--stacktrace` – If specified, also print the stack trace if an exception occurs.
- `--config-file` – The name of the configuration file.
### Configuration files
### Configuration Files
`clickhouse-client` uses the first existing file of the following:
......
<a name="formats"></a>
# Input and output formats
# Input and Output Formats
The format determines how data is returned to you after SELECTs (how it is written and formatted by the server), and how it is accepted for INSERTs (how it is read and parsed by the server).
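As a quick sketch, the same query can be rendered differently just by changing the format name:

```sql
SELECT number FROM system.numbers LIMIT 3 FORMAT JSONEachRow
-- one JSON object per row instead of the default tab-separated output
```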
......
# HTTP interface
# HTTP Interface
The HTTP interface lets you use ClickHouse on any platform from any programming language. We use it for working from Java and Perl, as well as shell scripts. In other departments, the HTTP interface is used from Perl, Python, and Go. The HTTP interface is more limited than the native interface, but it has better compatibility.
......@@ -202,7 +202,7 @@ The optional 'quota_key' parameter can be passed as the quota key (any string).
The HTTP interface allows passing external data (external temporary tables) for querying. For more information, see the section "External data for query processing".
## Response buffering
## Response Buffering
You can enable response buffering on the server side. The `buffer_size` and `wait_end_of_query` URL parameters are provided for this purpose.
......
# JDBC driver
# JDBC Driver
There is an official JDBC driver for ClickHouse. See [here](https://github.com/yandex/clickhouse-jdbc).
......
# Native interface (TCP)
# Native Interface (TCP)
The native interface is used in the "clickhouse-client" command-line client, for interaction between servers during distributed query processing, and also in C++ programs. We will only cover the command-line client.
# Libraries from third-party developers
# Libraries from Third-party Developers
There are libraries for working with ClickHouse for:
......
# Visual interfaces from third-party developers
# Visual Interfaces from Third-party Developers
## Tabix
......
# Distinctive features of ClickHouse
# Distinctive Features of ClickHouse
## True column-oriented DBMS
## True Column-oriented DBMS
In a true column-oriented DBMS, there is no excessive data stored with the values. For example, this means that constant-length values must be supported, to avoid storing their length as an additional integer next to the values. In this case, a billion UInt8 values should consume around 1 GB uncompressed; otherwise this strongly affects CPU use. It is very important to store data compactly even when uncompressed, since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.
......@@ -8,24 +8,24 @@ This is worth noting because there are systems that can store values of differen
Also note that ClickHouse is a database management system, not a single database. ClickHouse allows creating tables and databases in runtime, loading data, and running queries without reconfiguring and restarting the server.
## Data compression
## Data Compression
Some column-oriented DBMSs (InfiniDB CE and MonetDB) do not use data compression. However, data compression is crucial to achieve excellent performance.
## Disk storage of data
## Disk Storage of Data
Many column-oriented DBMSs (such as SAP HANA and Google PowerDrill) can only work in RAM. This approach stimulates the allocation of a larger hardware budget than is actually necessary for real-time analysis. ClickHouse is designed to work on regular hard drives, which ensures low cost of ownership per gigabyte of data, but SSD and additional RAM are also utilized fully if available.
## Parallel processing on multiple cores
## Parallel Processing on Multiple Cores
Large queries are parallelized in a natural way, utilizing all necessary resources that are available on the current server.
## Distributed processing on multiple servers
## Distributed Processing on Multiple Servers
Almost none of the columnar DBMSs mentioned above have support for distributed query processing.
In ClickHouse, data can reside on different shards. Each shard can be a group of replicas that are used for fault tolerance. The query is processed on all the shards in parallel. This is transparent for the user.
## SQL support
## SQL Support
If you are familiar with standard SQL, we can't really talk about SQL support.
All the functions have different names.
......@@ -37,11 +37,11 @@ ClickHouse supports declarative query language that is based on SQL and complies
GROUP BY, ORDER BY, scalar subqueries and subqueries in FROM, IN and JOIN clauses are supported.
Correlated subqueries and window functions are not supported.
## Vector engine
## Vector Engine
Data is not only stored by columns, but is also processed by vectors (parts of columns). This makes it possible to achieve high CPU efficiency.
## Real-time data updates
## Real-time Data Updates
ClickHouse supports tables with a primary key. In order to quickly perform queries on the range of the primary key, the data is sorted incrementally using the merge tree. Due to this, data can continually be added to the table. No locks are taken when new data is ingested.
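A minimal sketch in the classic MergeTree syntax (the table and columns are hypothetical): the engine takes a date column, a primary key tuple, and an index granularity.

```sql
CREATE TABLE hits (EventDate Date, UserID UInt64, URL String)
ENGINE = MergeTree(EventDate, (UserID, EventDate), 8192)
```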
......@@ -49,11 +49,11 @@ ClickHouse supports tables with a primary key. In order to quickly perform queri
Having data physically sorted by primary key makes it possible to extract data for specific values or value ranges with low latency, less than a few dozen milliseconds.
## Suitable for online queries
## Suitable for Online Queries
Low latency means that queries can be processed without delay and without trying to prepare an answer in advance, right at the moment the user interface page is loading. In other words, online.
## Support for approximated calculations
## Support for Approximated Calculations
ClickHouse provides various ways to trade accuracy for performance:
......@@ -61,7 +61,7 @@ ClickHouse provides various ways to trade accuracy for performance:
2. Running a query based on a part (sample) of data and getting an approximated result. In this case, proportionally less data is retrieved from the disk.
3. Running an aggregation for a limited number of random keys, instead of for all keys. Under certain conditions for key distribution in the data, this provides a reasonably accurate result while using fewer resources.
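A hedged sketch of point 2, assuming a hypothetical table `hits` created with a sampling expression:

```sql
SELECT count() FROM hits SAMPLE 1 / 10
-- reads roughly a tenth of the data; scale the result by 10 for an estimate
```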
## Data replication and integrity
## Data Replication and Integrity
ClickHouse uses asynchronous multimaster replication. After being written to any available replica, data is distributed to all the other replicas in background. The system maintains identical data on different replicas. Data is restored automatically after most failures, or semiautomatically in complicated cases.
......
# ClickHouse features that can be considered disadvantages
# ClickHouse Features That Can Be Considered Disadvantages
1. No full-fledged transactions.
2. Lack of ability to modify or delete already inserted data at a high rate and with low latency. There are batch deletes available to clean up data that is not needed anymore or to comply with [GDPR](https://gdpr-info.eu). Batch updates are currently in development as of July 2018.
......
......@@ -4,21 +4,21 @@ According to internal testing results by Yandex, ClickHouse shows the best perfo
There are a lot of independent benchmarks that confirm this as well. You can look them up on your own, or see this small [collection of independent benchmark links](https://clickhouse.yandex/#independent-benchmarks).
## Throughput for a single large query
## Throughput for a Single Large Query
Throughput can be measured in rows per second or in megabytes per second. If the data is placed in the page cache, a query that is not too complex is processed on modern hardware at a speed of approximately 2-10 GB/s of uncompressed data on a single server (for the simplest cases, the speed may reach 30 GB/s). If data is not placed in the page cache, the speed is bound by the disk subsystem and how well the data has been compressed. For example, if the disk subsystem allows reading data at 400 MB/s, and the data compression rate is 3, the speed will be around 1.2 GB/s. To get the speed in rows per second, divide the speed in bytes per second by the total size of the columns used in the query. For example, if 10 bytes of columns are extracted, the speed will be around 100-200 million rows per second.
The processing speed increases almost linearly for distributed processing, but only if the number of rows resulting from aggregation or sorting is not too large.
## Latency when processing short queries
## Latency When Processing Short Queries
If a query uses a primary key and does not select too many rows to process (hundreds of thousands), and does not use too many columns, we can expect less than 50 milliseconds of latency (single digits of milliseconds in the best case) if data is placed in the page cache. Otherwise, latency is calculated from the number of seeks. If you use rotating drives, for a system that is not overloaded, the approximate latency can be calculated by this formula: seek time (10 ms) \* number of columns queried \* number of data parts. For example, a query that reads 5 columns from 4 data parts would cost roughly 10 ms \* 5 \* 4 = 200 ms in seeks alone.
## Throughput when processing a large quantity of short queries
## Throughput When Processing a Large Quantity of Short Queries
Under the same circumstances, ClickHouse can handle several hundred queries per second on a single server (up to several thousands in the best case). Since this scenario is not typical for analytical DBMSs, it is better to expect a maximum of hundreds of queries per second.
## Performance when inserting data
## Performance When Inserting Data
It is recommended to insert data in batches of at least 1000 rows, or no more than a single request per second. When inserting to a MergeTree table from a tab-separated dump, the insertion speed will be from 50 to 200 MB/s. If the inserted rows are around 1 KB in size, the speed will be from 50,000 to 200,000 rows per second. If the rows are small, the performance will be higher in rows per second (on Banner System data: > 500,000 rows per second; on Graphite data: > 1,000,000 rows per second). To improve performance, you can make multiple INSERT queries in parallel, and performance will increase linearly.
# Yandex.Metrica use case
# Yandex.Metrica Use Case
ClickHouse was initially developed to power [Yandex.Metrica](https://metrica.yandex.com/), [the second largest web analytics platform in the world](http://w3techs.com/technologies/overview/traffic_analysis/all), and continues to be its core component. With more than 13 trillion records in the database and more than 20 billion events daily, ClickHouse allows generating custom reports on the fly directly from non-aggregated data. This article gives a historical background on what was the main goal of ClickHouse before it became an open-source product.
......@@ -6,7 +6,7 @@ Yandex.Metrica generates custom reports based on hits and sessions on the fly, w
As of April 2014, Yandex.Metrica received approximately 12 billion events (page views and clicks) daily. All these events must be stored in order to build those custom reports. A single query may require scanning millions of rows in no more than a few hundred milliseconds, or hundreds of millions of rows over a few seconds.
## Usage in Yandex.Metrica and other Yandex services
## Usage in Yandex.Metrica and Other Yandex Services
ClickHouse is used for multiple purposes in Yandex.Metrica.
Its main task is to build reports in online mode using non-aggregated data. It uses a cluster of 374 servers, which store over 20.3 trillion rows in the database. The volume of compressed data, without counting duplication and replication, is about 2 PB. The volume of uncompressed data (in TSV format) would be approximately 17 PB.
......@@ -21,7 +21,7 @@ ClickHouse is also used for:
ClickHouse has at least a dozen installations in other Yandex services: in search verticals, Market, Direct, business analytics, mobile development, AdFox, personal services, and others.
## Aggregated and non-aggregated data
## Aggregated and Non-aggregated Data
There is a popular opinion that in order to effectively calculate statistics, you must aggregate data, since this reduces the volume of data.
......
# Access rights
# Access Rights
Users and access rights are set up in the user config. This is usually `users.xml`.
......
<a name="configuration_files"></a>
# Configuration files
# Configuration Files
The main server config file is `config.xml`. It resides in the `/etc/clickhouse-server/` directory.
......
<a name="table_engines-custom_partitioning_key"></a>
# Custom partitioning key
# Custom Partitioning Key
Starting with version 1.1.54310, you can create tables in the MergeTree family with any partitioning expression (not only partitioning by month).
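A sketch of the newer syntax with an arbitrary partitioning expression (table and columns hypothetical; here partitioning by week rather than by month):

```sql
CREATE TABLE visits (VisitDate Date, CounterID UInt32, Hits UInt32)
ENGINE = MergeTree PARTITION BY toMonday(VisitDate) ORDER BY (CounterID, VisitDate)
```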
......
# External data for query processing
# External Data for Query Processing
ClickHouse allows sending a server the data that is needed for processing a query, together with a SELECT query. This data is put in a temporary table (see the section "Temporary tables") and can be used in the query (for example, in IN operators).
......
......@@ -10,7 +10,7 @@ Usage examples:
- Convert data from one format to another.
- Updating data in ClickHouse via editing a file on a disk.
## Usage in ClickHouse server
## Usage in ClickHouse Server
```
File(Format)
......@@ -58,7 +58,7 @@ SELECT * FROM file_engine_table
└──────┴───────┘
```
## Usage in clickhouse-local
## Usage in clickhouse-local
In [clickhouse-local](../utils/clickhouse-local.md#utils-clickhouse-local), the File engine accepts a file path in addition to `Format`. Default input/output streams can be specified using numeric or human-readable names like `0` or `stdin`, `1` or `stdout`.
......@@ -68,7 +68,7 @@ In [clickhouse-local](../utils/clickhouse-local.md#utils-clickhouse-local) File
$ echo -e "1,2\n3,4" | clickhouse-local -q "CREATE TABLE table (a Int64, b Int64) ENGINE = File(CSV, stdin); SELECT a, b FROM table; DROP TABLE table"
```
## Details of implementation
## Details of Implementation
- Reads can be parallel, but not writes
- Not supported:
......
......@@ -16,7 +16,7 @@ Graphite stores full data in ClickHouse, and data can be retrieved in the follow
The engine inherits properties from MergeTree. The settings for thinning data are defined by the [graphite_rollup](../server_settings/settings.md#server_settings-graphite_rollup) parameter in the server configuration.
## Using the engine
## Using the Engine
The Graphite data table must contain the following fields at minimum:
......
# Table engines
# Table Engines
The table engine (type of table) determines:
......
......@@ -22,7 +22,7 @@ It is possible to create two Merge tables that will endlessly try to read each o
The typical way to use the Merge engine is for working with a large number of TinyLog tables as if with a single table.
## Virtual columns
## Virtual Columns
Virtual columns are columns that are provided by the table engine, regardless of the table definition. In other words, these columns are not specified in CREATE TABLE, but they are accessible for SELECT.
......
<a name="table_engines-replication"></a>
# Data replication
# Data Replication
Replication is only supported for tables in the MergeTree family:
......@@ -70,7 +70,7 @@ The system monitors data synchronicity on replicas and is able to recover after
<a name="table_engines-replication-creation_of_rep_tables"></a>
## Creating replicated tables
## Creating Replicated Tables
The `Replicated` prefix is added to the table engine name. For example:`ReplicatedMergeTree`.
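A sketch using the classic syntax; the ZooKeeper path and the `{replica}` macro are illustrative and must match your server configuration.

```sql
CREATE TABLE hits_replica (EventDate Date, UserID UInt64)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/hits', '{replica}', EventDate, (UserID, EventDate), 8192)
```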
......@@ -113,7 +113,7 @@ If you add a new replica after the table already contains some data on other rep
To delete a replica, run `DROP TABLE`. However, only one replica is deleted – the one that resides on the server where you run the query.
## Recovery after failures
## Recovery After Failures
If ZooKeeper is unavailable when a server starts, replicated tables switch to read-only mode. The system periodically attempts to connect to ZooKeeper.
......@@ -137,7 +137,7 @@ sudo -u clickhouse touch /var/lib/clickhouse/flags/force_restore_data
Then restart the server. On start, the server deletes these flags and starts recovery.
## Recovery after complete data loss
## Recovery After Complete Data Loss
If all data and metadata disappeared from one of the servers, follow these steps for recovery:
......@@ -175,7 +175,7 @@ If you want to get rid of a `ReplicatedMergeTree` table without launching the se
After this, you can launch the server, create a `MergeTree` table, move the data to its directory, and then restart the server.
## Recovery when metadata in the ZooKeeper cluster is lost or damaged
## Recovery When Metadata in the ZooKeeper Cluster Is Lost or Damaged
If the data in ZooKeeper was lost or damaged, you can save data by moving it to an unreplicated table as described above.
......
......@@ -5,7 +5,7 @@
This data source operates with data on a remote HTTP/HTTPS server. The engine is
similar to [`File`](./file.md#).
## Usage in ClickHouse server
## Usage in ClickHouse Server
```
URL(URL, Format)
......@@ -67,7 +67,7 @@ SELECT * FROM url_engine_table
```
## Details of implementation
## Details of Implementation
- Reads and writes can be parallel
- Not supported:
......
# Usage recommendations
# Usage Recommendations
## CPU
......@@ -16,7 +16,7 @@ Don't disable hyper-threading. It helps for some queries, but not for others.
Turbo Boost is highly recommended. It significantly improves performance with a typical load.
You can use `turbostat` to view the CPU's actual clock rate under a load.
## CPU scaling governor
## CPU Scaling Governor
Always use the `performance` scaling governor. The `on-demand` scaling governor works much worse with constantly high demand.
......@@ -24,7 +24,7 @@ Always use the `performance` scaling governor. The `on-demand` scaling governor
sudo echo 'performance' | tee /sys/devices/system/cpu/cpu\*/cpufreq/scaling_governor
```
## CPU limitations
## CPU Limitations
Processors can overheat. Use `dmesg` to see if the CPU's clock rate was limited due to overheating.
The restriction can also be set externally at the datacenter level. You can use `turbostat` to monitor it under a load.
......@@ -35,11 +35,11 @@ For small amounts of data (up to \~200 GB compressed), it is best to use as much
For large amounts of data and when processing interactive (online) queries, you should use a reasonable amount of RAM (128 GB or more) so the hot data subset will fit in the cache of pages.
Even for data volumes of \~50 TB per server, using 128 GB of RAM significantly improves query performance compared to 64 GB.
## Swap file
## Swap File
Always disable the swap file. The only reason for not doing this is if you are using ClickHouse on your personal laptop.
## Huge pages
## Huge Pages
Always disable transparent huge pages. It interferes with memory allocators, which leads to significant performance degradation.
......@@ -50,7 +50,7 @@ echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
Use `perf top` to watch the time spent in the kernel for memory management.
Permanent huge pages also do not need to be allocated.
## Storage subsystem
## Storage Subsystem
If your budget allows you to use SSD, use SSD.
If not, use HDD. SATA HDDs 7200 RPM will do.
......@@ -83,13 +83,13 @@ Regardless of RAID use, always use replication for data security.
Enable NCQ with a long queue. For HDD, choose the CFQ scheduler, and for SSD, choose noop. Don't reduce the 'readahead' setting.
For HDD, enable the write cache.
## File system
## File System
Ext4 is the most reliable option. Set the mount options `noatime, nobarrier`.
XFS is also suitable, but it hasn't been as thoroughly tested with ClickHouse.
Most other file systems should also work fine. File systems with delayed allocation work better.
## Linux kernel
## Linux Kernel
Don't use an outdated Linux kernel. In 2015, 3.18.19 was new enough.
Consider using the kernel build from Yandex: <https://github.com/yandex/smart> – it provides at least a 5% performance increase.
......
# ClickHouse utility
# ClickHouse Utility
* [clickhouse-local](clickhouse-local.md#utils-clickhouse-local) — Allows running SQL queries on data without stopping the ClickHouse server, similar to how `awk` does this.
* [clickhouse-copier](clickhouse-copier.md#utils-clickhouse-copier) — Copies (and reshards) data from one cluster to another cluster.
......
......@@ -4,7 +4,7 @@
The `ALTER` query is only supported for `*MergeTree` tables, as well as `Merge` and `Distributed`. The query has several variations.
### Column manipulations
### Column Manipulations
Changing the table structure.
......@@ -68,7 +68,7 @@ For tables that don't store data themselves (such as `Merge` and `Distributed`),
The `ALTER` query for changing columns is replicated. The instructions are saved in ZooKeeper, then each replica applies them. All `ALTER` queries are run in the same order. The query waits for the appropriate actions to be completed on the other replicas. However, a query to change columns in a replicated table can be interrupted, and all actions will be performed asynchronously.
### Manipulations with partitions and parts
### Manipulations With Partitions and Parts
It only works for tables in the `MergeTree` family. The following operations are available:
......@@ -187,7 +187,7 @@ To restore from a backup:
In this way, data from the backup will be added to the table.
Restoring from a backup doesn't require stopping the server.
### Backups and replication
### Backups and Replication
Replication provides protection from device failures. If all data disappeared on one of your replicas, follow the instructions in the "Restoration after failure" section to restore it.
......@@ -213,7 +213,7 @@ Before downloading, the system checks that the partition exists and the table st
The `ALTER ... FETCH PARTITION` query is not replicated. The partition will be downloaded to the 'detached' directory only on the local server. Note that if after this you use the `ALTER TABLE ... ATTACH` query to add data to the table, the data will be added on all replicas (on one of the replicas it will be added from the 'detached' directory, and on the rest it will be loaded from neighboring replicas).
### Synchronicity of ALTER queries
### Synchronicity of ALTER Queries
For non-replicatable tables, all `ALTER` queries are performed synchronously. For replicatable tables, the query just adds instructions for the appropriate actions to `ZooKeeper`, and the actions themselves are performed as soon as possible. However, the query can wait for these actions to be completed on all the replicas.
......@@ -248,7 +248,7 @@ A mutation query returns immediately after the mutation entry is added (in case
Entries for finished mutations are not deleted right away (the number of preserved entries is determined by the `finished_mutations_to_keep` storage engine parameter). Older mutation entries are deleted.
#### system.mutations table
#### system.mutations Table
The table contains information about mutations of MergeTree tables and their progress. Each mutation command is represented by a single row. The table has the following columns:
......
......@@ -44,7 +44,7 @@ Creates a table with a structure like the result of the `SELECT` query, with the
In all cases, if `IF NOT EXISTS` is specified, the query won't return an error if the table already exists. In this case, the query won't do anything.
### Default values
### Default Values
The column description can specify an expression for a default value, in one of the following ways: `DEFAULT expr`, `MATERIALIZED expr`, `ALIAS expr`.
Example: `URLDomain String DEFAULT domain(URL)`.
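A small sketch combining all three kinds of expressions (the table is hypothetical):

```sql
CREATE TABLE events (
    URL String,
    URLDomain String DEFAULT domain(URL),
    EventTime DateTime MATERIALIZED now(),
    Proto String ALIAS protocol(URL)
) ENGINE = Memory
```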
......@@ -79,7 +79,7 @@ If you add a new column to a table but later change its default expression, the
It is not possible to set default values for elements in nested data structures.
### Temporary tables
### Temporary Tables
In all cases, if `TEMPORARY` is specified, a temporary table will be created. Temporary tables have the following characteristics:
......
<a name="dicts-external_dicts"></a>
# External dictionaries
# External Dictionaries
You can add your own dictionaries from various data sources. The data source for a dictionary can be a local text or executable file, an HTTP(s) resource, or another DBMS. For more information, see "[Sources for external dictionaries](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources)".
......
<a name="dicts-external_dicts_dict"></a>
# Configuring an external dictionary
# Configuring an External Dictionary
The dictionary configuration has the following structure:
......
<a name="dicts-external_dicts_dict_layout"></a>
# Storing dictionaries in memory
# Storing Dictionaries in Memory
There are a [variety of ways](#dicts-external_dicts_dict_layout-manner) to store dictionaries in memory.
......@@ -38,7 +38,7 @@ The configuration looks like this:
<a name="dicts-external_dicts_dict_layout-manner"></a>
## Ways to store dictionaries in memory
## Ways to Store Dictionaries in Memory
- [flat](#dicts-external_dicts_dict_layout-flat)
- [hashed](#dicts-external_dicts_dict_layout-hashed)
......
<a name="dicts-external_dicts_dict_lifetime"></a>
# Dictionary updates
# Dictionary Updates
ClickHouse periodically updates the dictionaries. The update interval for fully downloaded dictionaries and the invalidation interval for cached dictionaries are defined in the `<lifetime>` tag in seconds.
......
<a name="dicts-external_dicts_dict_sources"></a>
# Sources of external dictionaries
# Sources of External Dictionaries
An external dictionary can be connected from many different sources.
......@@ -36,7 +36,7 @@ Types of sources (`source_type`):
<a name="dicts-external_dicts_dict_sources-local_file"></a>
## Local file
## Local File
Example of settings:
......@@ -56,7 +56,7 @@ Setting fields:
<a name="dicts-external_dicts_dict_sources-executable"></a>
## Executable file
## Executable File
Working with executable files depends on [how the dictionary is stored in memory](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout). If the dictionary is stored using `cache` and `complex_key_cache`, ClickHouse requests the necessary keys by sending a request to the executable file's `STDIN`.
......@@ -124,7 +124,7 @@ Setting fields:
- `connection_string` – Connection string.
- `invalidate_query` – Query for checking the dictionary status. Optional parameter. Read more in the section [Updating dictionaries](external_dicts_dict_lifetime.md#dicts-external_dicts_dict_lifetime).
## Example of connecting PostgreSQL
## Example of Connecting PostgreSQL
Ubuntu OS.
......@@ -190,7 +190,7 @@ The dictionary configuration in ClickHouse:
You may need to edit `odbc.ini` to specify the full path to the library with the driver `DRIVER=/usr/local/lib/psqlodbcw.so`.
### Example of connecting MS SQL Server
### Example of Connecting MS SQL Server
Ubuntu OS.
......
<a name="dicts-external_dicts_dict_structure"></a>
# Dictionary key and fields
# Dictionary Key and Fields
The `<structure>` clause describes the dictionary key and fields available for queries.
......@@ -42,7 +42,7 @@ A structure can contain either `<id>` or `<key>` .
!!! warning
The key doesn't need to be defined separately in attributes.
### Numeric key
### Numeric Key
Format: `UInt64`.
......@@ -58,7 +58,7 @@ Configuration fields:
- name – The name of the column with keys.
### Composite key
### Composite Key
The key can be a `tuple` from any types of fields. The [layout](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout) in this case must be `complex_key_hashed` or `complex_key_cache`.
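Looking up such a dictionary then passes the key as a tuple; the dictionary `geo_dict` and its attribute below are hypothetical.

```sql
SELECT dictGetString('geo_dict', 'region_name', tuple('RU', toUInt32(77)))
```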
......
# Internal dictionaries
# Internal Dictionaries
ClickHouse contains a built-in feature for working with a geobase.
......
# SQL reference
# SQL Reference
* [SELECT](select.md#select)
* [INSERT INTO](insert_into.md#queries-insert)
......
......@@ -41,7 +41,7 @@ INSERT INTO t FORMAT TabSeparated
You can insert data separately from the query by using the command-line client or the HTTP interface. For more information, see the section "[Interfaces](../interfaces/index.md#interfaces)".
### Inserting the results of `SELECT`
### Inserting the Results of `SELECT`
```sql
INSERT INTO [db.]table [(c1, c2, c3)] SELECT ...
......@@ -54,7 +54,7 @@ None of the data formats except Values allow setting values to expressions such
Other queries for modifying data parts are not supported: `UPDATE`, `DELETE`, `REPLACE`, `MERGE`, `UPSERT`, `INSERT UPDATE`.
However, you can delete old data using `ALTER TABLE ... DROP PARTITION`.
### Performance considerations
### Performance Considerations
`INSERT` sorts the input data by primary key and splits them into partitions by month. If you insert data for mixed months, it can significantly reduce the performance of the `INSERT` query. To avoid this:
......
# Miscellaneous queries
# Miscellaneous Queries
## ATTACH
......
......@@ -3,17 +3,17 @@
All operators are transformed to the corresponding functions at the query parsing stage, in accordance with their precedence and associativity.
Groups of operators are listed in order of priority (the higher it is in the list, the earlier the operator is connected to its arguments).
## Access operators
## Access Operators
`a[N]` – Access to an element of an array; the `arrayElement(a, N)` function.
`a.N` – Access to a tuple element; the `tupleElement(a, N)` function.
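For example (array indexes are 1-based):

```sql
SELECT [10, 20, 30][2] AS second_element
-- equivalent to arrayElement([10, 20, 30], 2), returns 20
```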
## Numeric negation operator
## Numeric Negation Operator
`-a` – The `negate(a)` function.
## Multiplication and division operators
## Multiplication and Division Operators
`a * b` – The `multiply(a, b)` function.
......@@ -21,13 +21,13 @@ Groups of operators are listed in order of priority (the higher it is in the lis
`a % b` – The `modulo(a, b)` function.
## Addition and subtraction operators
## Addition and Subtraction Operators
`a + b` – The `plus(a, b)` function.
`a - b` – The `minus(a, b)` function.
## Comparison operators
## Comparison Operators
`a = b` – The `equals(a, b)` function.
......@@ -51,7 +51,7 @@ Groups of operators are listed in order of priority (the higher it is in the lis
`a BETWEEN b AND c` – The same as `a >= b AND a <= c`.
## Operators for working with data sets
## Operators for Working With Data Sets
*See the section "IN operators".*
......@@ -63,19 +63,19 @@ Groups of operators are listed in order of priority (the higher it is in the lis
`a GLOBAL NOT IN ...` – The `globalNotIn(a, b)` function.
## Logical negation operator
## Logical Negation Operator
`NOT a` – The `not(a)` function.
## Logical AND operator
## Logical AND Operator
`a AND b` – The `and(a, b)` function.
## Logical OR operator
## Logical OR Operator
`a OR b` – The `or(a, b)` function.
## Conditional operator
## Conditional Operator
`a ? b : c` – The `if(a, b, c)` function.
......@@ -83,7 +83,7 @@ Note:
The conditional operator calculates the values of b and c, then checks whether condition a is met, and then returns the corresponding value. If "b" or "c" is an arrayJoin() function, each row will be replicated regardless of the "a" condition.
## Conditional expression
## Conditional Expression
```sql
CASE [x]
......@@ -95,21 +95,21 @@ END
If "x" is specified, then transform(x, \[a, ...\], \[b, ...\], c). Otherwise – multiIf(a, b, ..., c).
## Concatenation operator
## Concatenation Operator
`s1 || s2` – The `concat(s1, s2)` function.
## Lambda creation operator
## Lambda Creation Operator
`x -> expr` – The `lambda(x, expr)` function.
The following operators do not have a priority, since they are brackets:
## Array creation operator
## Array Creation Operator
`[x1, ...]` – The `array(x1, ...)` function.
## Tuple creation operator
## Tuple Creation Operator
`(x1, x2, ...)` – The `tuple(x1, x2, ...)` function.
......
# SELECT queries syntax
# SELECT Queries Syntax
`SELECT` performs data retrieval.
......@@ -26,7 +26,7 @@ The clauses below are described in almost the same order as in the query executi
If the query omits the `DISTINCT`, `GROUP BY` and `ORDER BY` clauses and the `IN` and `JOIN` subqueries, the query will be completely stream processed, using O(1) amount of RAM.
Otherwise, the query might consume a lot of RAM if the appropriate restrictions are not specified: `max_memory_usage`, `max_rows_to_group_by`, `max_rows_to_sort`, `max_rows_in_distinct`, `max_bytes_in_distinct`, `max_rows_in_set`, `max_bytes_in_set`, `max_rows_in_join`, `max_bytes_in_join`, `max_bytes_before_external_sort`, `max_bytes_before_external_group_by`. For more information, see the section "Settings". It is possible to use external sorting (saving temporary tables to a disk) and external aggregation. The system does not have "merge join".
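For instance, external aggregation can be enabled per session; the 10 GB threshold below is purely illustrative:

```sql
SET max_bytes_before_external_group_by = 10000000000
```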
### FROM clause
### FROM Clause
If the FROM clause is omitted, data will be read from the `system.one` table.
The 'system.one' table contains exactly one row (this table fulfills the same purpose as the DUAL table found in other DBMSs).
......@@ -44,7 +44,7 @@ If a query does not list any columns (for example, SELECT count() FROM t), some
The FINAL modifier can be used only for a SELECT from a CollapsingMergeTree table. When you specify FINAL, data is selected fully "collapsed". Keep in mind that using FINAL leads to a selection that includes columns related to the primary key, in addition to the columns specified in the SELECT. Additionally, the query will be executed in a single stream, and data will be merged during query execution. This means that when using FINAL, the query is processed more slowly. In most cases, you should avoid using FINAL. For more information, see the section "CollapsingMergeTree engine".
### SAMPLE clause
### SAMPLE Clause
The SAMPLE clause allows for approximated query processing. Approximated query processing is only supported by MergeTree\* type tables, and only if the sampling expression was specified during table creation (see the section "MergeTree engine").
......@@ -80,7 +80,7 @@ A sample with a relative coefficient is "consistent": if we look at all possible
For example, a sample of user IDs takes rows with the same subset of all the possible user IDs from different tables. This allows using the sample in subqueries in the IN clause, as well as for manually correlating results of different queries with samples.
### ARRAY JOIN clause
### ARRAY JOIN Clause
Allows executing JOIN with an array or nested data structure. The intent is similar to the 'arrayJoin' function, but its functionality is broader.
......@@ -332,7 +332,7 @@ The query can only specify a single ARRAY JOIN clause.
The corresponding conversion can be performed before the WHERE/PREWHERE clause (if its result is needed in this clause), or after completing WHERE/PREWHERE (to reduce the volume of calculations).
### JOIN clause
### JOIN Clause
The normal JOIN, which is not related to ARRAY JOIN described above.
......@@ -426,14 +426,14 @@ Among the various types of JOINs, the most efficient is ANY LEFT JOIN, then ANY
If you need a JOIN for joining with dimension tables (these are relatively small tables that contain dimension properties, such as names for advertising campaigns), a JOIN might not be very convenient due to the bulky syntax and the fact that the right table is re-accessed for every query. For such cases, there is an "external dictionaries" feature that you should use instead of JOIN. For more information, see the section "External dictionaries".
### WHERE clause
### WHERE Clause
If there is a WHERE clause, it must contain an expression with the UInt8 type. This is usually an expression with comparison and logical operators.
This expression will be used for filtering data before all other transformations.
If indexes are supported by the database table engine, the expression is analyzed to determine whether the indexes can be used.
### PREWHERE clause
### PREWHERE Clause
This clause has the same meaning as the WHERE clause. The difference is in which data is read from the table.
When using PREWHERE, first only the columns necessary for executing PREWHERE are read. Then the other columns are read that are needed for running the query, but only those blocks where the PREWHERE expression is true.
......@@ -450,7 +450,7 @@ Keep in mind that it does not make much sense for PREWHERE to only specify those
If the 'optimize_move_to_prewhere' setting is set to 1 and PREWHERE is omitted, the system uses heuristics to automatically move parts of expressions from WHERE to PREWHERE.
### GROUP BY clause
### GROUP BY Clause
This is one of the most important parts of a column-oriented DBMS.
......@@ -490,7 +490,7 @@ GROUP BY is not supported for array columns.
A constant can't be specified as an argument for aggregate functions. Example: `sum(1)`. Instead, you can get rid of the constant. Example: `count()`.
#### WITH TOTALS modifier
#### WITH TOTALS Modifier
If the WITH TOTALS modifier is specified, another row will be calculated. This row will have key columns containing default values (zeros or empty strings), and columns of aggregate functions with the values calculated across all the rows (the "total" values).
......@@ -515,7 +515,7 @@ If `max_rows_to_group_by` and `group_by_overflow_mode = 'any'` are not used, all
You can use WITH TOTALS in subqueries, including subqueries in the JOIN clause (in this case, the respective total values are combined).
#### GROUP BY in external memory
#### GROUP BY in External Memory
You can enable dumping temporary data to the disk to restrict memory usage during GROUP BY.
The `max_bytes_before_external_group_by` setting determines the threshold RAM consumption for dumping GROUP BY temporary data to the file system. If set to 0 (the default), it is disabled.
......@@ -533,7 +533,7 @@ When external aggregation is enabled, if there was less than ` max_bytes_before_
If you have an ORDER BY with a small LIMIT after GROUP BY, then the ORDER BY clause will not use significant amounts of RAM.
But if the ORDER BY doesn't have LIMIT, don't forget to enable external sorting (`max_bytes_before_external_sort`).
### LIMIT N BY clause
### LIMIT N BY Clause
LIMIT N BY COLUMNS selects the top N rows for each group of COLUMNS. LIMIT N BY is not related to LIMIT; they can both be used in the same query. The key for LIMIT N BY can contain any number of columns or expressions.
......@@ -554,7 +554,7 @@ LIMIT 100
The query will select the top 5 referrers for each `domain, device_type` pair, but not more than 100 rows (`LIMIT n BY + LIMIT`).
### HAVING clause
### HAVING Clause
Allows filtering the result received after GROUP BY, similar to the WHERE clause.
WHERE and HAVING differ in that WHERE is performed before aggregation (GROUP BY), while HAVING is performed after it.
......@@ -562,7 +562,7 @@ If aggregation is not performed, HAVING can't be used.
<a name="query_language-queries-order_by"></a>
### ORDER BY clause
### ORDER BY Clause
The ORDER BY clause contains a list of expressions, which can each be assigned DESC or ASC (the sorting direction). If the direction is not specified, ASC is assumed. ASC is sorted in ascending order, and DESC in descending order. The sorting direction applies to a single expression, not to the entire list. Example: `ORDER BY Visits DESC, SearchPhrase`
......@@ -583,14 +583,14 @@ Running a query may use more memory than 'max_bytes_before_external_sort'. For t
External sorting works much less effectively than sorting in RAM.
### SELECT clause
### SELECT Clause
The expressions specified in the SELECT clause are analyzed after the calculations for all the clauses listed above are completed.
More specifically, expressions are analyzed that are above the aggregate functions, if there are any aggregate functions.
The aggregate functions and everything below them are calculated during aggregation (GROUP BY).
These expressions work as if they are applied to separate rows in the result.
### DISTINCT clause
### DISTINCT Clause
If DISTINCT is specified, only a single row will remain out of all the sets of fully matching rows in the result.
The result will be the same as if GROUP BY were specified across all the fields specified in SELECT without aggregate functions. But there are several differences from GROUP BY:
......@@ -601,7 +601,7 @@ The result will be the same as if GROUP BY were specified across all the fields
DISTINCT is not supported if SELECT has at least one array column.
### LIMIT clause
### LIMIT Clause
LIMIT m allows you to select the first 'm' rows from the result.
LIMIT n, m allows you to select the first 'm' rows from the result after skipping the first 'n' rows.
......@@ -610,7 +610,7 @@ LIMIT n, m allows you to select the first 'm' rows from the result after skippin
If there isn't an ORDER BY clause that explicitly sorts results, the result may be arbitrary and nondeterministic.
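A quick sketch of both forms:

```sql
SELECT number FROM system.numbers LIMIT 3;    -- returns 0, 1, 2
SELECT number FROM system.numbers LIMIT 5, 3; -- skips the first 5 rows, returns 5, 6, 7
```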
### UNION ALL clause
### UNION ALL Clause
You can use UNION ALL to combine any number of queries. Example:
......@@ -635,7 +635,7 @@ The structure of results (the number and type of columns) must match for the que
Queries that are parts of UNION ALL can't be enclosed in brackets. ORDER BY and LIMIT are applied to separate queries, not to the final result. If you need to apply a conversion to the final result, you can put all the queries with UNION ALL in a subquery in the FROM clause.
### INTO OUTFILE clause
### INTO OUTFILE Clause
Add the `INTO OUTFILE filename` clause (where filename is a string literal) to redirect query output to the specified file.
In contrast to MySQL, the file is created on the client side. The query will fail if a file with the same filename already exists.
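For example (the table name is hypothetical; the file is created in the client's working directory):

```sql
SELECT * FROM trips_mergetree INTO OUTFILE 'trips.tsv'
```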
......@@ -643,7 +643,7 @@ This functionality is available in the command-line client and clickhouse-local
The default output format is TabSeparated (the same as in the command-line client batch mode).
### FORMAT clause
### FORMAT Clause
Specify 'FORMAT format' to get data in any specified format.
You can use this for convenience, or for creating dumps.
......@@ -652,7 +652,7 @@ If the FORMAT clause is omitted, the default format is used, which depends on bo
When using the command-line client, data is passed to the client in an internal efficient format. The client independently interprets the FORMAT clause of the query and formats the data itself (thus relieving the network and the server from the load).
### IN operators
### IN Operators
The `IN`, `NOT IN`, `GLOBAL IN`, and `GLOBAL NOT IN` operators are covered separately, since their functionality is quite rich.
......@@ -718,7 +718,7 @@ A subquery in the IN clause is always run just one time on a single server. Ther
<a name="queries-distributed-subrequests"></a>
#### Distributed subqueries
#### Distributed Subqueries
There are two options for IN operators with subqueries (similar to JOINs): normal `IN` / `JOIN` and `GLOBAL IN` / `GLOBAL JOIN`. They differ in how they are run for distributed query processing.
......@@ -819,7 +819,7 @@ This is more optimal than using the normal IN. However, keep the following point
It also makes sense to specify a local table in the `GLOBAL IN` clause, in case this local table is only available on the requestor server and you want to use data from it on remote servers.
### Extreme values
### Extreme Values
In addition to results, you can also get minimum and maximum values for the results columns. To do this, set the **extremes** setting to 1. Minimums and maximums are calculated for numeric types, dates, and dates with times. For other columns, the default values are output.
......
......@@ -42,7 +42,7 @@ We recommend using identifiers that do not need to be quoted.
There are numeric literals, string literals, and compound literals.
### Numeric literals
### Numeric Literals
A numeric literal is parsed as follows:
......@@ -56,13 +56,13 @@ For example, 1 is parsed as UInt8, but 256 is parsed as UInt16. For more informa
Examples: `1`, `18446744073709551615`, `0xDEADBEEF`, `01`, `0.1`, `1e100`, `-1e-100`, `inf`, `nan`.
### String literals
### String Literals
Only string literals in single quotes are supported. The enclosed characters can be backslash-escaped. The following escape sequences have a corresponding special value: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\a`, `\v`, `\xHH`. In all other cases, escape sequences in the format `\c`, where "c" is any character, are converted to "c". This means that you can use the sequences `\'` and `\\`. The value will have the String type.
The minimum set of characters that you need to escape in string literals: `'` and `\`.
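For example, a short sketch exercising the two characters that must be escaped:

```sql
SELECT 'single quote: \', backslash: \\' AS s
```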
### Compound literals
### Compound Literals
Constructions are supported for arrays: `[1, 2, 3]` and tuples: `(1, 'Hello, world!', 2)`.
Actually, these are not literals, but expressions with the array creation operator and the tuple creation operator, respectively.
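For example:

```sql
SELECT [1, 2, 3] AS arr, (1, 'Hello, world!', 2) AS tpl
```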
......@@ -81,7 +81,7 @@ Operators are converted to their corresponding functions during query parsing, t
For example, the expression `1 + 2 * 3 + 4` is transformed to `plus(plus(1, multiply(2, 3)), 4)`.
For more information, see the section "Operators" below.
## Data types and database table engines
## Data Types and Database Table Engines
Data types and table engines in the `CREATE` query are written the same way as identifiers or functions. In other words, they may or may not contain an arguments list in brackets. For more information, see the sections "Data types," "Table engines," and "CREATE".
......
## Fixed in ClickHouse release 1.1.54388, 2018-06-28
## Fixed in ClickHouse Release 1.1.54388, 2018-06-28
### CVE-2018-14668
"remote" table function allowed arbitrary symbols in "user", "password" and "default_database" fields which led to Cross Protocol Request Forgery Attacks.
Credits: Andrey Krasichkov of Yandex Information Security Team
## Fixed in ClickHouse release 1.1.54390, 2018-07-06
## Fixed in ClickHouse Release 1.1.54390, 2018-07-06
### CVE-2018-14669
The ClickHouse MySQL client had "LOAD DATA LOCAL INFILE" functionality enabled, which allowed a malicious MySQL database to read arbitrary files from the connected ClickHouse server.
Credits: Andrey Krasichkov and Evgeny Sidorov of Yandex Information Security Team
## Fixed in ClickHouse release 1.1.54131, 2017-01-10
## Fixed in ClickHouse Release 1.1.54131, 2017-01-10
### CVE-2018-14670
......