Commit 3d23c51d authored by Bharat Nallan

Merge https://github.com/ClickHouse/ClickHouse into ncb/remove-unused-headers

......@@ -289,8 +289,9 @@ set (CMAKE_POSTFIX_VARIABLE "CMAKE_${CMAKE_BUILD_TYPE_UC}_POSTFIX")
if (MAKE_STATIC_LIBRARIES)
set (CMAKE_POSITION_INDEPENDENT_CODE OFF)
if (OS_LINUX)
if (OS_LINUX AND NOT ARCH_ARM)
# Slightly more efficient code can be generated
# It's disabled for ARM because otherwise ClickHouse cannot run on Android.
set (CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO} -fno-pie")
set (CMAKE_C_FLAGS_RELWITHDEBINFO "${CMAKE_C_FLAGS_RELWITHDEBINFO} -fno-pie")
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,-no-pie")
......
......@@ -6,7 +6,6 @@
"docker/test/compatibility/ubuntu": "yandex/clickhouse-test-old-ubuntu",
"docker/test/integration/base": "yandex/clickhouse-integration-test",
"docker/test/performance-comparison": "yandex/clickhouse-performance-comparison",
"docker/test/pvs": "yandex/clickhouse-pvs-test",
"docker/test/stateful": "yandex/clickhouse-stateful-test",
"docker/test/stateful_with_coverage": "yandex/clickhouse-stateful-test-with-coverage",
"docker/test/stateless": "yandex/clickhouse-stateless-test",
......
# docker build -t yandex/clickhouse-binary-builder .
# docker build -t yandex/clickhouse-binary-builder .
FROM ubuntu:19.10
RUN apt-get --allow-unauthenticated update -y && apt-get install --yes wget gnupg
......
......@@ -18,7 +18,7 @@ ccache --zero-stats ||:
ln -s /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/libOpenCL.so ||:
rm -f CMakeCache.txt
cmake .. -LA -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DSANITIZE=$SANITIZER $CMAKE_FLAGS
ninja clickhouse-bundle
ninja -v clickhouse-bundle
mv ./programs/clickhouse* /output
mv ./src/unit_tests_dbms /output
find . -name '*.so' -print -exec mv '{}' /output \;
......
---
title: What is a columnar database?
toc_hidden: true
toc_priority: 101
---
# What Is a Columnar Database? {#what-is-a-columnar-database}
A columnar database stores the data of each column independently. This allows reading data from disk only for the columns that are used in any given query. The cost is that operations affecting whole rows become proportionally more expensive. A synonym for a columnar database is a column-oriented database management system. ClickHouse is a typical example of such a system.
Key advantages of a columnar database:
- Queries that use only a few columns out of many.
- Aggregating queries against large volumes of data.
- Column-wise data compression.
Here is an illustration of the difference between traditional row-oriented systems and columnar databases when building reports:
**Traditional row-oriented**
![Traditional row-oriented](https://clickhouse.tech/docs/en/images/row-oriented.gif#)
**Columnar**
![Columnar](https://clickhouse.tech/docs/en/images/column-oriented.gif#)
A columnar database is a preferred choice for analytical applications because it allows having many columns in a table just in case, without paying the cost for unused columns at read query execution time. Column-oriented databases are designed for big data processing and data warehousing; they often natively scale using distributed clusters of low-cost hardware to increase throughput. ClickHouse does it with a combination of [distributed](../../engines/table-engines/special/distributed.md) and [replicated](../../engines/table-engines/mergetree-family/replication.md) tables.
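To make the layout difference concrete, here is a minimal, self-contained C++ sketch; the event structure and column names are invented for this illustration and this is not ClickHouse code. It contrasts a row-oriented layout (array of structs) with a column-oriented one (struct of arrays) when aggregating a single column:
``` cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Row-oriented layout: every field of a row sits next to the others,
// so reading one column still pulls the memory of all columns.
struct RowEvent
{
    uint64_t user_id;
    uint32_t duration_ms;
    uint8_t  country;
};

// Column-oriented layout: each column lives in its own contiguous array,
// so an aggregate over one column reads only that array.
struct ColumnEvents
{
    std::vector<uint64_t> user_id;
    std::vector<uint32_t> duration_ms;
    std::vector<uint8_t>  country;
};

uint64_t totalDurationRows(const std::vector<RowEvent> & rows)
{
    uint64_t sum = 0;
    for (const auto & row : rows)
        sum += row.duration_ms;   // drags user_id and country through the cache as well
    return sum;
}

uint64_t totalDurationColumns(const ColumnEvents & columns)
{
    uint64_t sum = 0;
    for (uint32_t duration : columns.duration_ms)   // touches only the needed column
        sum += duration;
    return sum;
}

int main()
{
    std::vector<RowEvent> rows{{1, 120, 7}, {2, 45, 7}, {3, 300, 2}};
    ColumnEvents columns{{1, 2, 3}, {120, 45, 300}, {7, 7, 2}};
    std::cout << totalDurationRows(rows) << " " << totalDurationColumns(columns) << "\n";
}
```
The same trade-off works in reverse: appending one row to the columnar layout has to touch every array, which is why row-oriented systems remain preferable for write-heavy workloads.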
---
title: "What does \u201CClickHouse\u201D mean?"
toc_hidden: true
toc_priority: 10
---
# What Does “ClickHouse” Mean? {#what-does-clickhouse-mean}
It’s a combination of “**Click**stream” and “Data ware**house**”. It comes from the original use case at Yandex.Metrica, where ClickHouse was supposed to keep records of all clicks by people from all over the Internet and it still does the job. You can read more about this use case on [ClickHouse history](../../introduction/history.md) page.
It’s a combination of “**Click**stream” and “Data ware**House**”. It comes from the original use case at Yandex.Metrica, where ClickHouse was supposed to keep records of all clicks by people from all over the Internet and it still does the job. You can read more about this use case on [ClickHouse history](../../introduction/history.md) page.
This two-part meaning has two consequences:
- The only correct way to write Click**H**ouse is with capital H.
- If you need to abbreviate it, use **CH**. For some historical reasons, abbreviating as CK is also popular in China, mostly because one of the first talks about ClickHouse in Chinese used this form.
!!! info "Fun fact"
Many years after ClickHouse got its name, this approach of combining two words that are meaningful on their own was highlighted as the best way to name a database in [research by Andy Pavlo](https://www.cs.cmu.edu/~pavlo/blog/2020/03/on-naming-a-database-management-system.html), an Associate Professor of Databases at Carnegie Mellon University. ClickHouse shared his “best database name of all time” award with Postgres.
---
title: General questions about ClickHouse
toc_hidden_folder: true
toc_priority: 1
toc_title: General
......@@ -8,8 +9,13 @@ toc_title: General
Questions:
- [What is ClickHouse?](../../index.md#what-is-clickhouse)
- [Why ClickHouse is so fast?](../../faq/general/why-clickhouse-is-so-fast.md)
- [Who is using ClickHouse?](../../faq/general/who-is-using-clickhouse.md)
- [What does “ClickHouse” mean?](../../faq/general/dbms-naming.md)
- [What does “Не тормозит” mean?](../../faq/general/ne-tormozit.md)
- [What is OLAP?](../../faq/general/olap.md)
- [What is a columnar database?](../../faq/general/columnar-database.md)
- [Why not use something like MapReduce?](../../faq/general/mapreduce.md)
!!! info "Don’t see what you were looking for?"
......
---
title: Why not use something like MapReduce?
toc_hidden: true
toc_priority: 20
toc_priority: 110
---
# Why Not Use Something Like MapReduce? {#why-not-use-something-like-mapreduce}
......
---
title: "What does \u201C\u043D\u0435 \u0442\u043E\u0440\u043C\u043E\u0437\u0438\u0442\
\u201D mean?"
toc_hidden: true
toc_priority: 11
---
# What Does “Не тормозит” mean? {#what-does-ne-tormozit-mean}
# What Does “Не тормозит” Mean? {#what-does-ne-tormozit-mean}
This question usually arises when people see official ClickHouse t-shirts. They have large words **“ClickHouse не тормозит”** on the front.
Before ClickHouse became open-source, it has been developed as an in-house storage system by the largest Russian IT company, [Yandex](https://yandex.com/company/). That’s why it initially got its slogan in Russian, which is “не тормозит”. After the open-source release we first produced some of those t-shirts for events in Russia and it was a no-brainer to use the slogan as-is.
Before ClickHouse became open-source, it had been developed as an in-house storage system by the largest Russian IT company, [Yandex](https://yandex.com/company/). That’s why it initially got its slogan in Russian, which is “не тормозит” (pronounced as “ne tormozit”). After the open-source release, we first produced some of those t-shirts for events in Russia and it was a no-brainer to use the slogan as-is.
One of the following batches of those t-shirts was supposed to be given away at events outside of Russia, and we tried to make an English version of the slogan. Unfortunately, Russian is rather elegant at expressing things, there was the restriction of limited space on a t-shirt, and we failed to come up with a good enough translation (most options turned out to be either long or inaccurate), so we decided to keep the slogan in Russian even on t-shirts produced for international events. It turned out to be a great decision because people all over the world are positively surprised and curious when they see it.
......
---
title: What is OLAP?
toc_hidden: true
toc_priority: 100
---
# What Is OLAP? {#what-is-olap}
[OLAP](https://en.wikipedia.org/wiki/Online_analytical_processing) stands for Online Analytical Processing. It is a broad term that can be looked at from two perspectives: technical and business. But at the very high level, you can just read these words backward:
Processing
: Some source data is processed…
Analytical
: …to produce some analytical reports and insights…
Online
: …in real-time.
## OLAP from the Business Perspective {#olap-from-the-business-perspective}
In recent years, business people started to realize the value of data. Companies that make their decisions blindly more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be remotely useful for making business decisions, and to have mechanisms to analyze it in a timely manner. Here’s where OLAP database management systems (DBMS) come in.
In a business sense, OLAP allows companies to continuously plan, analyze, and report operational activities, thus maximizing efficiency, reducing expenses, and ultimately conquering market share. It could be done either in an in-house system or outsourced to SaaS providers like web/mobile analytics services, CRM services, etc. OLAP is the technology behind many BI (Business Intelligence) applications.
ClickHouse is an OLAP database management system that is pretty often used as a backend for those SaaS solutions for analyzing domain-specific data. However, some businesses are still reluctant to share their data with third-party providers and an in-house data warehouse scenario is also viable.
## OLAP from the Technical Perspective {#olap-from-the-technical-perspective}
All database management systems can be classified into two groups: OLAP (Online **Analytical** Processing) and OLTP (Online **Transactional** Processing). The former focuses on building reports, each based on large volumes of historical data, but does it relatively infrequently. The latter usually handles a continuous stream of transactions, constantly modifying the current state of data.
In practice, OLAP and OLTP are not strict categories but rather a spectrum. Most real systems usually focus on one of them but provide some solutions or workarounds for the opposite kind of workload if it is also desired. This situation often forces businesses to operate multiple storage systems integrated with each other, which might not be a big deal by itself, but having more systems makes maintenance more expensive. So the trend of recent years is HTAP (**Hybrid Transactional/Analytical Processing**), when both kinds of workload are handled equally well by a single database management system.
Even if a DBMS started out as pure OLAP or pure OLTP, it is forced to move in the HTAP direction to keep up with the competition. ClickHouse is no exception: it was initially designed as a [fast-as-possible OLAP system](../../faq/general/why-clickhouse-is-so-fast.md) and it still doesn’t have full-fledged transaction support, but some features like consistent reads/writes and mutations for updating/deleting data had to be added.
The fundamental trade-off between OLAP and OLTP systems remains:
- To build analytical reports efficiently, it’s crucial to be able to read columns separately, so most OLAP databases are [columnar](../../faq/general/columnar-database.md).
- Storing columns separately increases the cost of operations on rows, like appends or in-place modifications, proportionally to the number of columns (which can be huge if the system tries to collect all details of an event just in case). Thus, most OLTP systems store data arranged by rows.
---
title: Who is using ClickHouse?
toc_hidden: true
toc_priority: 9
---
# Who Is Using ClickHouse? {#who-is-using-clickhouse}
Being an open-source product makes this question not so straightforward to answer. You don’t have to tell anyone if you want to start using ClickHouse, you just go grab source code or pre-compiled packages. There’s no contract to sign and the [Apache 2.0 license](https://github.com/ClickHouse/ClickHouse/blob/master/LICENSE) allows for unconstrained software distribution.
Also, the technology stack is often in a grey zone of what’s covered by an NDA. Some companies consider the technologies they use a competitive advantage even if they are open-source and don’t allow employees to share any details publicly. Some see PR risks and allow employees to share implementation details only with approval from their PR department.
So how to tell who is using ClickHouse?
One way is to **ask around**. If it’s not in writing, people are much more willing to share what technologies are used in their companies, what the use cases are, what kind of hardware is used, data volumes, etc. We talk with users regularly at [ClickHouse Meetups](https://www.youtube.com/channel/UChtmrD-dsdpspr42P_PyRAw/playlists) all over the world and have heard stories about 1000+ companies that use ClickHouse. Unfortunately, that’s not reproducible, and we try to treat such stories as if they were told under NDA to avoid any potential trouble. But you can come to any of our future meetups and talk with other users on your own. Meetups are announced in multiple ways; for example, you can subscribe to [our Twitter](http://twitter.com/ClickHouseDB/).
The second way is to look for companies **publicly saying** that they use ClickHouse. It’s more substantial because there’s usually some hard evidence like a blog post, talk video recording, slide deck, etc. We collect links to such evidence on our **[Adopters](../../introduction/adopters.md)** page. Feel free to contribute the story of your employer or just some links you’ve stumbled upon (but try not to violate your NDA in the process).
You can find names of very large companies in the adopters list, like Bloomberg, Cisco, China Telecom, Tencent, or Uber, but with the first approach, we found that there are many more. For example, if you take [the list of largest IT companies by Forbes (2020)](https://www.forbes.com/sites/hanktucker/2020/05/13/worlds-largest-technology-companies-2020-apple-stays-on-top-zoom-and-uber-debut/), over half of them use ClickHouse in some way. Also, it would be unfair not to mention [Yandex](../../introduction/history.md), the company which initially open-sourced ClickHouse in 2016 and happens to be one of the largest IT companies in Europe.
---
title: Why ClickHouse is so fast?
toc_hidden: true
toc_priority: 8
---
# Why ClickHouse Is So Fast? {#why-clickhouse-is-so-fast}
It was designed to be fast. Query execution performance has always been a top priority during the development process, but other important characteristics like user-friendliness, scalability, and security were also considered so ClickHouse could become a real production system.
ClickHouse was initially built as a prototype to do just a single task well: to filter and aggregate data as fast as possible. That’s what needs to be done to build a typical analytical report, and that’s what a typical [GROUP BY](../../sql-reference/statements/select/group-by.md) query does. The ClickHouse team made several high-level decisions that, combined, made achieving this task possible:
Column-oriented storage
: Source data often contains hundreds or even thousands of columns, while a report uses just a few of them. The system needs to avoid reading unnecessary columns; otherwise, the most expensive disk read operations are wasted.
Indexes
: ClickHouse keeps data structures in memory that allow reading not only the used columns, but only the necessary row ranges of those columns.
Data compression
: Storing different values of the same column together often leads to better compression ratios (compared to row-oriented systems) because in real data a column often has the same value, or only a few distinct values, for neighboring rows. In addition to general-purpose compression, ClickHouse supports [specialized codecs](../../sql-reference/statements/create.md#create-query-specialized-codecs) that can make data even more compact.
Vectorized query execution
: ClickHouse not only stores data in columns but also processes data in columns. This leads to better CPU cache utilization and allows for the use of [SIMD](https://en.wikipedia.org/wiki/SIMD) CPU instructions (see the sketch after this list).
Scalability
: ClickHouse can leverage all available CPU cores and disks to execute even a single query, not only on a single server but across all CPU cores and disks of a cluster as well.
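As a rough illustration of why column-wise processing helps, here is a tiny sketch (not the actual ClickHouse implementation) of summing a contiguous column in a tight loop that a compiler can auto-vectorize into SIMD instructions:
``` cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Summing a contiguous column in a simple loop lets the compiler emit SIMD
// instructions (auto-vectorization); iterating over heterogeneous row objects
// one by one, possibly through virtual calls, would not.
int64_t sumColumn(const std::vector<int32_t> & column)
{
    int64_t sum = 0;
    for (int32_t value : column)
        sum += value;
    return sum;
}

int main()
{
    std::vector<int32_t> column(1'000'000, 3);
    std::cout << sumColumn(column) << "\n";   // 3000000
}
```
Processing whole blocks of such contiguous arrays at a time is also what makes CPU caches work in the query engine’s favor.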
But many other database management systems use similar techniques. What really makes ClickHouse stand out is **attention to low-level details**. Most programming languages provide implementations of the most common algorithms and data structures, but they tend to be too generic to be effective. Every task is a landscape with its own characteristics, and the right choice depends on it rather than on just throwing in a random implementation. For example, if you need a hash table, here are some key questions to consider:
- Which hash function to choose?
- Collision resolution algorithm: [open addressing](https://en.wikipedia.org/wiki/Open_addressing) vs [chaining](https://en.wikipedia.org/wiki/Hash_table#Separate_chaining)?
- Memory layout: one array for keys and values or separate arrays? Will it store small or large values?
- Fill factor: when and how to resize? How to move values around on resize?
- Will values be removed, and if so, which algorithm works better in that case?
- Will we need fast probing with bitmaps, inline placement of string keys, support for non-movable values, prefetch, and batching?
The hash table is a key data structure for the `GROUP BY` implementation, and ClickHouse automatically chooses one of [30+ variations](https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/Aggregator.h) for each specific query.
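To show why these questions matter, here is a deliberately simplified sketch of one possible set of answers, open addressing with linear probing and separate key/count arrays; it is purely illustrative and not one of the actual `Aggregator.h` variants:
``` cpp
#include <cstdint>
#include <iostream>
#include <vector>

// A toy GROUP BY-style aggregation: count occurrences of uint64 keys using
// open addressing with linear probing. Key 0 is reserved as the "empty" marker,
// which is one of the low-level trade-offs such a design has to make.
class CountingHashTable
{
public:
    explicit CountingHashTable(size_t size_log2 = 16)
        : mask((1ULL << size_log2) - 1), keys(1ULL << size_log2, 0), counts(1ULL << size_log2, 0) {}

    void add(uint64_t key)
    {
        size_t pos = hash(key) & mask;
        while (keys[pos] != 0 && keys[pos] != key)   // linear probing
            pos = (pos + 1) & mask;
        keys[pos] = key;
        ++counts[pos];
    }

    uint64_t count(uint64_t key) const
    {
        size_t pos = hash(key) & mask;
        while (keys[pos] != 0)
        {
            if (keys[pos] == key)
                return counts[pos];
            pos = (pos + 1) & mask;
        }
        return 0;
    }

private:
    static uint64_t hash(uint64_t x)
    {
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL; x ^= x >> 33;   // a simple 64-bit mixer
        return x;
    }

    size_t mask;
    std::vector<uint64_t> keys;
    std::vector<uint64_t> counts;
};

int main()
{
    CountingHashTable table;
    for (uint64_t key : {42ULL, 7ULL, 42ULL, 42ULL})
        table.add(key);
    std::cout << table.count(42) << " " << table.count(7) << "\n";   // 3 1
}
```
Even in this toy version, the answers to the questions above (hash function, probing scheme, memory layout, reserved key) are baked into the code, and different answers would be better for different key types and cardinalities.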
The same goes for algorithms. For example, when sorting you might consider the following (a small sketch follows this list):
- What will be sorted: an array of numbers, tuples, strings, or structures?
- Is all data available completely in RAM?
- Do we need a stable sort?
- Do we need a full sort? Maybe partial sort or n-th element will suffice?
- How to implement comparisons?
- Are we sorting data that has already been partially sorted?
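As a small illustration of the full-sort versus partial-sort question, here is a sketch using only standard-library calls (not ClickHouse internals) that answers a top-N request without sorting the whole array:
``` cpp
#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> values{42, 7, 19, 100, 3, 56, 88, 1};

    // Full sort: O(n log n) even if the query only needs the top 3 values.
    std::vector<int> fully_sorted = values;
    std::sort(fully_sorted.begin(), fully_sorted.end(), std::greater<int>());

    // Partial sort: only the first 3 positions end up sorted, the rest is left
    // in unspecified order - typically cheaper for LIMIT-style queries.
    std::vector<int> top3 = values;
    std::partial_sort(top3.begin(), top3.begin() + 3, top3.end(), std::greater<int>());

    for (size_t i = 0; i < 3; ++i)
        std::cout << top3[i] << " ";   // 100 88 56
    std::cout << "\n";
}
```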
Algorithms that rely on characteristics of the data they work with can often do better than their generic counterparts. If the characteristics are not really known in advance, the system can try various implementations and choose the one that works best at runtime. For example, see an [article on how LZ4 decompression is implemented in ClickHouse](https://habr.com/en/company/yandex/blog/457612/).
Last but not least, the ClickHouse team always monitors the Internet for people claiming that they came up with the best implementation, algorithm, or data structure to do something, and tries it out. Those claims mostly turn out to be false, but from time to time you’ll indeed find a gem.
!!! info "Tips for building your own high-performance software"
- Keep in mind low-level details when designing your system.
- Design based on hardware capabilities.
- Choose data structures and abstractions based on the needs of the task.
- Provide specializations for special cases.
- Try new, “best” algorithms that you read about yesterday.
- Choose an algorithm in runtime based on statistics.
- Benchmark on real datasets.
- Test for performance regressions in CI.
- Measure and observe everything.
......@@ -10,8 +10,37 @@ This section of the documentation is a place to collect answers to ClickHouse-re
Categories:
- [General](../faq/general/index.md)
- [Operations](../faq/operations/index.md)
- [Integration](../faq/integration/index.md)
- **[General](../faq/general/index.md)**
- [What is ClickHouse?](../index.md#what-is-clickhouse)
- [Why ClickHouse is so fast?](../faq/general/why-clickhouse-is-so-fast.md)
- [Who is using ClickHouse?](../faq/general/who-is-using-clickhouse.md)
- [What does “ClickHouse” mean?](../faq/general/dbms-naming.md)
- [What does “Не тормозит” mean?](../faq/general/ne-tormozit.md)
- [What is OLAP?](../faq/general/olap.md)
- [What is a columnar database?](../faq/general/columnar-database.md)
- [Why not use something like MapReduce?](../faq/general/mapreduce.md)
- **[Use Cases](../faq/use-cases/index.md)**
- [Can I use ClickHouse as a time-series database?](../faq/use-cases/time-series.md)
- [Can I use ClickHouse as a key-value storage?](../faq/use-cases/key-value.md)
- **[Operations](../faq/operations/index.md)**
- [Which ClickHouse version to use in production?](../faq/operations/production.md)
- [Is it possible to delete old records from a ClickHouse table?](../faq/operations/delete-old-data.md)
- **[Integration](../faq/integration/index.md)**
- [How do I export data from ClickHouse to a file?](../faq/integration/file-export.md)
- [What if I have a problem with encodings when connecting to Oracle via ODBC?](../faq/integration/oracle-odbc.md)
{## TODO
Question candidates:
- How to choose a primary key?
- How to add a column in ClickHouse?
- Too many parts
- How to filter ClickHouse table by an array column contents?
- How to insert all rows from one table to another of identical structure?
- How to kill a process (query) in ClickHouse?
- How to implement pivot (like in pandas)?
- How to remove the default ClickHouse user through users.d?
- Importing MySQL dump to ClickHouse
- Window function workarounds (row\_number, lag/lead, running diff/sum/average)
##}
{## [Original article](https://clickhouse.tech/docs/en/faq) ##}
---
title: How do I export data from ClickHouse to a file?
toc_hidden: true
toc_priority: 10
---
......
---
title: Questions about integrating ClickHouse and other systems
toc_hidden_folder: true
toc_priority: 3
toc_priority: 4
toc_title: Integration
---
# Question About Integrating ClickHouse and Other Systems {#question-about-integrating-clickhouse-and-other-systems}
# Questions About Integrating ClickHouse and Other Systems {#question-about-integrating-clickhouse-and-other-systems}
Questions:
- [How do I export data from ClickHouse to a file?](../../faq/integration/file-export.md)
- [What if I Have a problem with encodings when connecting to Oracle via ODBC?](../../faq/integration/oracle-odbc.md)
- [How to import JSON into ClickHouse?](../../faq/integration/json-import.md)
- [What if I have a problem with encodings when connecting to Oracle via ODBC?](../../faq/integration/oracle-odbc.md)
!!! info "Don’t see what you were looking for?"
Check out [other F.A.Q. categories](../../faq/index.md) or browse around main documentation articles found in the left sidebar.
......
---
title: How to import JSON into ClickHouse?
toc_hidden: true
toc_priority: 11
---
# How to Import JSON Into ClickHouse? {#how-to-import-json-into-clickhouse}
ClickHouse supports a wide range of [data formats for input and output](../../interfaces/formats.md). There are multiple JSON variations among them, but the most commonly used for data ingestion is [JSONEachRow](../../interfaces/formats.md#jsoneachrow). It expects one JSON object per row, each object separated by a newline.
## Examples {#examples}
Using [HTTP interface](../../interfaces/http.md):
``` bash
$ echo '{"foo":"bar"}' | curl 'http://localhost:8123/?query=INSERT%20INTO%20test%20FORMAT%20JSONEachRow' --data-binary @-
```
Using [CLI interface](../../interfaces/cli.md):
``` bash
$ echo '{"foo":"bar"}' | clickhouse-client ---query="INSERT INTO test FORMAT 20JSONEachRow"
```
Instead of inserting data manually, you might consider using one of the [client libraries](../../interfaces/index.md).
## Useful Settings {#useful-settings}
- `input_format_skip_unknown_fields` allows inserting JSON even if there are additional fields not present in the table schema (by discarding them).
- `input_format_import_nested_json` allows inserting nested JSON objects into columns of the [Nested](../../sql-reference/data-types/nested-data-structures/nested.md) type.
!!! note "Note"
Settings are specified as `GET` parameters for the HTTP interface or as additional command-line arguments prefixed with `--` for the CLI interface.
---
title: What if I have a problem with encodings when using Oracle via ODBC?
toc_hidden: true
toc_priority: 20
---
......
---
title: Is it possible to delete old records from a ClickHouse table?
toc_hidden: true
toc_priority: 20
---
# Is It Possible to Delete Old Records from a ClickHouse Table? {#is-it-possible-to-delete-old-records-from-a-clickhouse-table}
The short answer is “yes”. ClickHouse has multiple mechanisms that allow freeing up disk space by removing old data. Each mechanism is aimed at different scenarios.
## TTL {#ttl}
ClickHouse allows automatically dropping values when some condition is met. This condition is configured as an expression based on any columns, usually just a static offset relative to a timestamp column.
The key advantage of this approach is that it doesn’t need any external system to trigger it: once TTL is configured, data removal happens automatically in the background.
!!! note "Note"
TTL can also be used to move data not only to [/dev/null](https://en.wikipedia.org/wiki/Null_device), but also between different storage systems, like from SSD to HDD.
More details on [configuring TTL](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-ttl).
## ALTER DELETE {#alter-delete}
ClickHouse doesn’t have real-time point deletes like in [OLTP](https://en.wikipedia.org/wiki/Online_transaction_processing) databases. The closest thing to them are mutations. They are issued as `ALTER ... DELETE` or `ALTER ... UPDATE` queries to distinguish them from normal `DELETE` or `UPDATE`, as they are asynchronous batch operations, not immediate modifications. The rest of the syntax after the `ALTER TABLE` prefix is similar.
`ALTER DELETE` can be issued to flexibly remove old data. If you need to do it regularly, the main downside is the need for an external system to submit the query. There are also some performance considerations, since mutations rewrite complete parts even if there’s only a single row to be deleted.
This is the most common approach to make your system based on ClickHouse [GDPR](https://gdpr-info.eu)-compliant.
More details on [mutations](../../sql-reference/statements/alter.md#alter-mutations).
## DROP PARTITION {#drop-partition}
`ALTER TABLE ... DROP PARTITION` provides a cost-efficient way to drop a whole partition. It’s not that flexible and needs a proper partitioning scheme configured at table creation, but it still covers most common cases. Like mutations, it needs to be executed from an external system for regular use.
More details on [manipulating partitions](../../sql-reference/statements/alter.md#alter_drop-partition).
## TRUNCATE {#truncate}
It’s rather radical to drop all data from a table, but in some cases it might be exactly what you need.
More details on [table truncation](../../sql-reference/statements/alter.md#alter_drop-partition).
---
title: Question about operating ClickHouse servers and clusters
toc_hidden_folder: true
toc_priority: 2
toc_priority: 3
toc_title: Operations
---
......@@ -9,6 +10,7 @@ toc_title: Operations
Questions:
- [Which ClickHouse version to use in production?](../../faq/operations/production.md)
- [Is it possible to delete old records from a ClickHouse table?](../../faq/operations/delete-old-data.md)
!!! info "Don’t see what you were looking for?"
Check out [other F.A.Q. categories](../../faq/index.md) or browse around main documentation articles found in the left sidebar.
......
---
title: Which ClickHouse version to use in production?
toc_hidden: true
toc_priority: 10
---
......
---
title: Questions about ClickHouse use cases
toc_hidden_folder: true
toc_priority: 2
toc_title: Use Cases
---
# Questions About ClickHouse Use Cases {#questions-about-clickhouse-use-cases}
Questions:
- [Can I use ClickHouse as a time-series database?](../../faq/use-cases/time-series.md)
- [Can I use ClickHouse as a key-value storage?](../../faq/use-cases/key-value.md)
!!! info "Don’t see what you were looking for?"
Check out [other F.A.Q. categories](../../faq/index.md) or browse around main documentation articles found in the left sidebar.
{## [Original article](https://clickhouse.tech/docs/en/faq/use-cases/) ##}
---
title: Can I use ClickHouse as a key-value storage?
toc_hidden: true
toc_priority: 101
---
# Can I Use ClickHouse As a Key-Value Storage? {#can-i-use-clickhouse-as-a-key-value-storage}
The short answer is **“no”**. The key-value workload is among the top positions in the list of cases when NOT{.text-danger} to use ClickHouse. It’s an [OLAP](../../faq/general/olap.md) system after all, while there are many excellent key-value storage systems out there.
However, there might be situations where it still makes sense to use ClickHouse for key-value-like queries. Usually, it’s some low-budget product where the main workload is analytical in nature and fits ClickHouse well, but there’s also a secondary process that needs a key-value pattern with not-so-high request throughput and without strict latency requirements. If you had an unlimited budget, you would have installed a secondary key-value database for this secondary workload, but in reality, there’s an additional cost of maintaining one more storage system (monitoring, backups, etc.) which might be desirable to avoid.
If you decide to go against recommendations and run some key-value-like queries against ClickHouse, here are some tips:
- The key reason why point queries are expensive in ClickHouse is its sparse primary index of the main [MergeTree table engine family](../../engines/table-engines/mergetree-family/mergetree.md). This index can’t point to each specific row of data; instead, it points to every N-th row, and the system has to scan from the neighboring N-th row to the desired one, reading excessive data along the way. In a key-value scenario, it might be useful to reduce the value of N with the `index_granularity` setting (see the sketch after this list).
- ClickHouse keeps each column in a separate set of files, so to assemble one complete row it needs to go through each of those files. Their count increases linearly with the number of columns, so in the key-value scenario, it might be worth avoiding many columns and putting all your payload in a single `String` column encoded in some serialization format like JSON, Protobuf, or whatever makes sense.
- There’s an alternative approach that uses the [Join](../../engines/table-engines/special/join.md) table engine instead of normal `MergeTree` tables and the [joinGet](../../sql-reference/functions/other-functions.md#joinget) function to retrieve the data. It can provide better query performance but might have some usability and reliability issues. Here’s a [usage example](https://github.com/ClickHouse/ClickHouse/blob/master/tests/queries/0_stateless/00800_versatile_storage_join.sql#L49-L51).
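To illustrate why a sparse index makes point lookups read extra rows, here is a minimal C++ sketch of the idea; the granule sizes, key values, and the binary search over “marks” are illustrative assumptions, not ClickHouse internals:
``` cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// A sparse primary index stores only every N-th key (a "mark"), so a point
// lookup first finds the right granule by binary search over the marks and
// then scans up to N rows inside that granule.
struct SparseIndexedColumn
{
    size_t granularity;                 // N: rows per index mark (think index_granularity)
    std::vector<uint64_t> keys;         // sorted primary key column
    std::vector<uint64_t> marks;        // keys[0], keys[N], keys[2N], ...

    SparseIndexedColumn(std::vector<uint64_t> sorted_keys, size_t granularity_)
        : granularity(granularity_), keys(std::move(sorted_keys))
    {
        for (size_t i = 0; i < keys.size(); i += granularity)
            marks.push_back(keys[i]);
    }

    // Returns how many rows had to be scanned to find the key (or prove its absence).
    size_t lookupCost(uint64_t key) const
    {
        auto mark = std::upper_bound(marks.begin(), marks.end(), key);
        size_t granule = (mark == marks.begin()) ? 0 : (mark - marks.begin() - 1);
        size_t begin = granule * granularity;
        size_t end = std::min(begin + granularity, keys.size());

        size_t scanned = 0;
        for (size_t i = begin; i < end; ++i)
        {
            ++scanned;
            if (keys[i] == key)
                break;
        }
        return scanned;
    }
};

int main()
{
    std::vector<uint64_t> keys(100000);
    for (size_t i = 0; i < keys.size(); ++i)
        keys[i] = i;

    // Smaller granularity -> fewer rows scanned per point lookup, but a larger index.
    std::cout << SparseIndexedColumn(keys, 8192).lookupCost(54321) << "\n";  // thousands of rows
    std::cout << SparseIndexedColumn(keys, 64).lookupCost(54321) << "\n";    // a few dozen rows
}
```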
---
title: Can I use ClickHouse as a time-series database?
toc_hidden: true
toc_priority: 101
---
# Can I Use ClickHouse As a Time-Series Database? {#can-i-use-clickhouse-as-a-time-series-database}
ClickHouse is a generic data storage solution for [OLAP](../../faq/general/olap.md) workloads, while there are many specialized time-series database management systems. Nevertheless, ClickHouse’s [focus on query execution speed](../../faq/general/why-clickhouse-is-so-fast.md) allows it to outperform specialized systems in many cases. There are many independent benchmarks on this topic out there ([example](https://medium.com/@AltinityDB/clickhouse-for-time-series-scalability-benchmarks-e181132a895b)), so we’re not going to conduct one here. Instead, let’s focus on ClickHouse features that are important to use if that’s your use case.
First of all, there are **[specialized codecs](../../sql-reference/statements/create.md#create-query-specialized-codecs)** which make typical time-series data much more compact: either common algorithms like `DoubleDelta` and `Gorilla`, or ones specific to ClickHouse like `T64`.
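As a rough illustration of why such codecs help, here is a tiny sketch of plain delta encoding of timestamps; the real `DoubleDelta`, `Gorilla`, and `T64` codecs are considerably more elaborate, this only shows the underlying idea:
``` cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Time-series timestamps usually grow by an almost constant step, so storing
// differences between neighbors produces many small, highly compressible values.
std::vector<int64_t> deltaEncode(const std::vector<int64_t> & values)
{
    std::vector<int64_t> deltas;
    deltas.reserve(values.size());
    int64_t previous = 0;
    for (int64_t value : values)
    {
        deltas.push_back(value - previous);
        previous = value;
    }
    return deltas;
}

std::vector<int64_t> deltaDecode(const std::vector<int64_t> & deltas)
{
    std::vector<int64_t> values;
    values.reserve(deltas.size());
    int64_t current = 0;
    for (int64_t delta : deltas)
    {
        current += delta;
        values.push_back(current);
    }
    return values;
}

int main()
{
    std::vector<int64_t> timestamps{1595000000, 1595000001, 1595000002, 1595000004};
    auto deltas = deltaEncode(timestamps);   // {1595000000, 1, 1, 2}
    auto roundtrip = deltaDecode(deltas);
    std::cout << (roundtrip == timestamps ? "ok" : "mismatch") << "\n";
}
```
A general-purpose compressor then squeezes the long runs of tiny deltas far better than it would the original absolute values.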
Second, time-series queries often hit only recent data, like one day or one week old. It makes sense to use servers that have both fast NVMe/SSD drives and high-capacity HDD drives. The ClickHouse [TTL](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) feature allows configuring keeping fresh hot data on fast drives and gradually moving it to slower drives as it ages. Rollup or removal of even older data is also possible if your requirements demand it.
Even though it’s against the ClickHouse philosophy of storing and processing raw data, you can use [materialized views](../../sql-reference/statements/create.md#create-view) to fit into even tighter latency or cost requirements.
......@@ -426,6 +426,18 @@ The value 0 means that you can delete all tables without any restrictions.
<max_table_size_to_drop>0</max_table_size_to_drop>
```
## max\_thread\_pool\_size {#max-thread-pool-size}
The maximum number of threads in the Global Thread pool.
Default value: 10000.
**Example**
``` xml
<max_thread_pool_size>12000</max_thread_pool_size>
```
## merge\_tree {#server_configuration_parameters-merge_tree}
Fine tuning for tables in the [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md).
......
......@@ -1129,6 +1129,18 @@ Possible values:
Default value: 0
## optimize\_skip\_unused\_shards\_nesting {#optimize-skip-unused-shards-nesting}
Controls [`optimize_skip_unused_shards`](#optimize-skip-unused-shards) (hence still requires [`optimize_skip_unused_shards`](#optimize-skip-unused-shards)) depending on the nesting level of the distributed query (the case when you have a `Distributed` table that looks into another `Distributed` table).
Possible values:
- 0 — Disabled, `optimize_skip_unused_shards` always works.
- 1 — Enables `optimize_skip_unused_shards` only for the first level.
- 2 — Enables `optimize_skip_unused_shards` up to the second level.
Default value: 0
## force\_optimize\_skip\_unused\_shards {#force-optimize-skip-unused-shards}
Enables or disables query execution if [optimize\_skip\_unused\_shards](#optimize-skip-unused-shards) is enabled and skipping of unused shards is not possible. If the skipping is not possible and the setting is enabled, an exception will be thrown.
......@@ -1141,16 +1153,17 @@ Possible values:
Default value: 0
## force\_optimize\_skip\_unused\_shards\_no\_nested {#settings-force_optimize_skip_unused_shards_no_nested}
## force\_optimize\_skip\_unused\_shards\_nesting {#settings-force_optimize_skip_unused_shards_nesting}
Reset [`optimize_skip_unused_shards`](#optimize-skip-unused-shards) for nested `Distributed` table
Controls [`force_optimize_skip_unused_shards`](#force-optimize-skip-unused-shards) (hence still requires [`force_optimize_skip_unused_shards`](#force-optimize-skip-unused-shards)) depending on the nesting level of the distributed query (the case when you have a `Distributed` table that looks into another `Distributed` table).
Possible values:
- 1 — Enabled.
- 0 — Disabled.
- 0 — Disabled, `force_optimize_skip_unused_shards` always works.
- 1 — Enables `force_optimize_skip_unused_shards` only for the first level.
- 2 — Enables `force_optimize_skip_unused_shards` up to the second level.
Default value: 0.
Default value: 0
## optimize\_throw\_if\_noop {#setting-optimize_throw_if_noop}
......
......@@ -17,7 +17,7 @@ SELECT [DISTINCT] expr_list
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[ARRAY JOIN ...]
[GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN (subquery)|table USING columns_list
[GLOBAL] [ANY|ALL|ASOF] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER|SEMI|ANTI] JOIN (subquery)|table (ON <expr_list>)|(USING <column_list>)
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr_list] [WITH TOTALS]
......
......@@ -12,6 +12,7 @@ toc_title: SYSTEM
- [DROP MARK CACHE](#query_language-system-drop-mark-cache)
- [DROP UNCOMPRESSED CACHE](#query_language-system-drop-uncompressed-cache)
- [DROP COMPILED EXPRESSION CACHE](#query_language-system-drop-compiled-expression-cache)
- [DROP REPLICA](#query_language-system-drop-replica)
- [FLUSH LOGS](#query_language-system-flush_logs)
- [RELOAD CONFIG](#query_language-system-reload-config)
- [SHUTDOWN](#query_language-system-shutdown)
......@@ -67,6 +68,24 @@ For more convenient (automatic) cache management, see disable\_internal\_dns\_ca
Resets the mark cache. Used in development of ClickHouse and performance tests.
## DROP REPLICA {#query_language-system-drop-replica}
Dead replicas can be dropped using the following syntax:
```sql
SYSTEM DROP REPLICA 'replica_name' FROM TABLE database.table;
SYSTEM DROP REPLICA 'replica_name' FROM DATABASE database;
SYSTEM DROP REPLICA 'replica_name';
SYSTEM DROP REPLICA 'replica_name' FROM ZKPATH '/path/to/table/in/zk';
```
Queries will remove the replica path in ZooKeeper. This is useful when the replica is dead and its metadata cannot be removed from ZooKeeper by `DROP TABLE` because there is no such table anymore. It will only drop the inactive/stale replica, and it can’t drop the local replica; please use `DROP TABLE` for that. `DROP REPLICA` does not drop any tables and does not remove any data or metadata from disk.
The first one removes metadata of `'replica_name'` replica of `database.table` table.
The second one does the same for all replicated tables in the database.
The third one does the same for all replicated tables on local server.
The fourth one is useful for removing the metadata of a dead replica when all other replicas of a table were dropped. It requires the table path to be specified explicitly. It must be the same path as was passed to the first argument of the `ReplicatedMergeTree` engine on table creation.
## DROP UNCOMPRESSED CACHE {#query_language-system-drop-uncompressed-cache}
Reset the uncompressed data cache. Used in development of ClickHouse and performance tests.
......
---
toc_folder_title: What's New
toc_priority: 72
toc_priority: 82
---
# What's New In ClickHouse?
There's a short high-level [roadmap](roadmap.md) and a detailed [changelog](changelog/index.md) for releases that have already been published.
......@@ -5,12 +5,14 @@ toc_title: Roadmap
# Roadmap {#roadmap}
## Q2 2020 {#q2-2020}
## Q3 2020 {#q3-2020}
- Integration with external authentication services
- High durability mode (`fsync` and WAL)
- Support spilling data to disk in `GLOBAL JOIN`
## Q3 2020 {#q3-2020}
## Q4 2020 {#q4-2020}
- Improved efficiency of distributed queries
- Resource pools for more precise distribution of cluster capacity between users
{## [Original article](https://clickhouse.tech/docs/en/roadmap/) ##}
......@@ -1048,17 +1048,6 @@ Valores posibles:
Default value: 0
## force\_optimize\_skip\_unused\_shards\_no\_nested {#settings-force_optimize_skip_unused_shards_no_nested}
Reset [`optimize_skip_unused_shards`](#settings-force_optimize_skip_unused_shards) for nested `Distributed` tables
Possible values:
- 1 — Enabled.
- 0 — Disabled.
Default value: 0.
## optimize\_throw\_if\_noop {#setting-optimize_throw_if_noop}
Enables or disables throwing an exception if an [OPTIMIZE](../../sql-reference/statements/misc.md#misc_operations-optimize) query did not perform a merge.
......
......@@ -15,7 +15,7 @@ SELECT [DISTINCT] expr_list
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[ARRAY JOIN ...]
[GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN (subquery)|table USING columns_list
[GLOBAL] [ANY|ALL|ASOF] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER|SEMI|ANTI] JOIN (subquery)|table (ON <expr_list>)|(USING <column_list>)
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr_list] [WITH TOTALS]
......
......@@ -1048,17 +1048,6 @@ The results of the compilation are saved in the build directory in the form of .
Default value: 0
## force\_optimize\_skip\_unused\_shards\_no\_nested {#settings-force_optimize_skip_unused_shards_no_nested}
Reset [`optimize_skip_unused_shards`](#settings-force_optimize_skip_unused_shards) for nested `Distributed` tables
Possible values:
- 1 — Enabled.
- 0 — Disabled.
Default value: 0.
## optimize\_throw\_if\_noop {#setting-optimize_throw_if_noop}
Enables or disables throwing an exception if an [OPTIMIZE](../../sql-reference/statements/misc.md#misc_operations-optimize) query did not perform a merge.
......
......@@ -15,7 +15,7 @@ SELECT [DISTINCT] expr_list
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[ARRAY JOIN ...]
[GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN (subquery)|table USING columns_list
[GLOBAL] [ANY|ALL|ASOF] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER|SEMI|ANTI] JOIN (subquery)|table (ON <expr_list>)|(USING <column_list>)
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr_list] [WITH TOTALS]
......
......@@ -1048,17 +1048,6 @@ Valeurs possibles:
Default value: 0
## force\_optimize\_skip\_unused\_shards\_no\_nested {#settings-force_optimize_skip_unused_shards_no_nested}
Reset [`optimize_skip_unused_shards`](#settings-force_optimize_skip_unused_shards) for nested `Distributed` tables
Possible values:
- 1 — Enabled.
- 0 — Disabled.
Default value: 0.
## optimize\_throw\_if\_noop {#setting-optimize_throw_if_noop}
Enables or disables throwing an exception if an [OPTIMIZE](../../sql-reference/statements/misc.md#misc_operations-optimize) query did not perform a merge.
......
......@@ -15,7 +15,7 @@ SELECT [DISTINCT] expr_list
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[ARRAY JOIN ...]
[GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN (subquery)|table USING columns_list
[GLOBAL] [ANY|ALL|ASOF] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER|SEMI|ANTI] JOIN (subquery)|table (ON <expr_list>)|(USING <column_list>)
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr_list] [WITH TOTALS]
......
......@@ -1048,17 +1048,6 @@ PREWHERE/WHEREにシャーディングキー条件があるSELECTクエリの未
Default value: 0
## force\_optimize\_skip\_unused\_shards\_no\_nested {#settings-force_optimize_skip_unused_shards_no_nested}
Reset [`optimize_skip_unused_shards`](#settings-force_optimize_skip_unused_shards) for nested `Distributed` tables
Possible values:
- 1 — Enabled.
- 0 — Disabled.
Default value: 0.
## optimize\_throw\_if\_noop {#setting-optimize_throw_if_noop}
Enables or disables throwing an exception if an [OPTIMIZE](../../sql-reference/statements/misc.md#misc_operations-optimize) query did not perform a merge.
......
......@@ -1025,7 +1025,7 @@ ClickHouse генерирует исключение
Default value: 0.
## optimize_skip_unused_shards {#optimize-skip-unused-shards}
## optimize\_skip\_unused\_shards {#optimize-skip-unused-shards}
Enables or disables skipping of unused shards for [SELECT](../../sql-reference/statements/select/index.md) queries that have a sharding key condition in the `WHERE/PREWHERE` clause. It is assumed that the data is distributed using the sharding key; otherwise the setting does nothing.
......@@ -1036,15 +1036,39 @@ ClickHouse генерирует исключение
Default value: 0
## force_optimize_skip_unused_shards {#force-optimize-skip-unused-shards}
## optimize\_skip\_unused\_shards\_nesting {#optimize-skip-unused-shards-nesting}
Controls [`optimize_skip_unused_shards`](#optimize-skip-unused-shards) (hence still requires `optimize_skip_unused_shards`) depending on the nesting level of the distributed query (the case when you have a `Distributed` table that looks into another `Distributed` table).
Possible values:
- 0 — Disabled, `optimize_skip_unused_shards` always works.
- 1 — Enables `optimize_skip_unused_shards` only for the first nesting level.
- 2 — Enables `optimize_skip_unused_shards` for the first and second nesting levels.
Default value: 0
## force\_optimize\_skip\_unused\_shards {#force-optimize-skip-unused-shards}
Enables or disables query execution if the [optimize_skip_unused_shards](#optimize-skip-unused-shards) setting is enabled and skipping of unused shards is not possible. If skipping is not possible and the setting is enabled, ClickHouse throws an exception.
Possible values:
- 0 — Disabled. ClickHouse does not throw an exception.
- 1 — Enabled. Query execution is disallowed only if the table has a sharding key.
- 2 — Enabled. Query execution is disallowed even if a sharding key is not defined for the table.
- 0 — Disabled, `force_optimize_skip_unused_shards` always works.
- 1 — Enables `force_optimize_skip_unused_shards` only for the first nesting level.
- 2 — Enables `force_optimize_skip_unused_shards` for the first and second nesting levels.
Default value: 0
## force\_optimize\_skip\_unused\_shards\_nesting {#settings-force_optimize_skip_unused_shards_nesting}
Controls [`force_optimize_skip_unused_shards`](#force-optimize-skip-unused-shards) (hence still requires `optimize_skip_unused_shards`) depending on the nesting level of the distributed query (the case when you have a `Distributed` table that looks into another `Distributed` table).
Possible values:
- 0 — Disabled, `force_optimize_skip_unused_shards` works on all levels.
- 1 — Enables `force_optimize_skip_unused_shards` only for the first level.
- 2 — Enables `force_optimize_skip_unused_shards` up to the second level.
Default value: 0
......
......@@ -13,7 +13,7 @@ SELECT [DISTINCT] expr_list
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[ARRAY JOIN ...]
[GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN (subquery)|table USING columns_list
[GLOBAL] [ANY|ALL|ASOF] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER|SEMI|ANTI] JOIN (subquery)|table (ON <expr_list>)|(USING <column_list>)
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr_list] [WITH TOTALS]
......
......@@ -26,6 +26,7 @@ MARKDOWN_EXTENSIONS = [
'mdx_clickhouse',
'admonition',
'attr_list',
'def_list',
'codehilite',
'nl2br',
'sane_lists',
......
......@@ -117,6 +117,7 @@ def translate_filter(key, value, _format, _):
admonition_value = []
remaining_para_value = []
in_admonition = True
break_value = [pandocfilters.LineBreak(), pandocfilters.Str(' ' * 4)]
for item in value:
if in_admonition:
if item.get('t') == 'SoftBreak':
......@@ -124,9 +125,11 @@ def translate_filter(key, value, _format, _):
else:
admonition_value.append(item)
else:
remaining_para_value.append(item)
if item.get('t') == 'SoftBreak':
remaining_para_value += break_value
else:
remaining_para_value.append(item)
break_value = [pandocfilters.LineBreak(), pandocfilters.Str(' ' * 4)]
if admonition_value[-1].get('t') == 'Quoted':
text = process_sentence(admonition_value[-1]['c'][-1])
text[0]['c'] = '"' + text[0]['c']
......@@ -136,7 +139,7 @@ def translate_filter(key, value, _format, _):
else:
text = admonition_value[-1].get('c')
if text:
text = translate(text[0].upper() + text[1:])
text = translate.translate(text[0].upper() + text[1:])
admonition_value.append(pandocfilters.Space())
admonition_value.append(pandocfilters.Str(f'"{text}"'))
......
......@@ -16,7 +16,7 @@ source "${BASE_DIR}/venv/bin/activate"
${BASE_DIR}/split_meta.py "${INPUT_PATH}"
pandoc "${INPUT_CONTENT}" --filter "${BASE_DIR}/filter.py" -o "${TEMP_FILE}" \
-f "markdown-space_in_atx_header" -t "markdown_strict+pipe_tables+markdown_attribute+all_symbols_escapable+backtick_code_blocks+autolink_bare_uris-link_attributes+markdown_attribute+mmd_link_attributes-raw_attribute+header_attributes-grid_tables" \
-f "markdown-space_in_atx_header" -t "markdown_strict+pipe_tables+markdown_attribute+all_symbols_escapable+backtick_code_blocks+autolink_bare_uris-link_attributes+markdown_attribute+mmd_link_attributes-raw_attribute+header_attributes-grid_tables+definition_lists" \
--atx-headers --wrap=none --columns=99999 --tab-stop=4
perl -pi -e 's/{\\#\\#/{##/g' "${TEMP_FILE}"
perl -pi -e 's/\\#\\#}/##}/g' "${TEMP_FILE}"
......
......@@ -67,6 +67,13 @@ def adjust_markdown_html(content):
summary.extract()
details.insert(0, summary)
for dd in soup.find_all('dd'):
dd_class = dd.attrs.get('class')
if dd_class:
dd.attrs['class'] = dd_class + ['pl-3']
else:
dd.attrs['class'] = 'pl-3'
for div in soup.find_all('div'):
div_class = div.attrs.get('class')
is_admonition = div_class and 'admonition' in div.attrs.get('class')
......
......@@ -1048,17 +1048,6 @@ Olası değerler:
Default value: 0
## force\_optimize\_skip\_unused\_shards\_no\_nested {#settings-force_optimize_skip_unused_shards_no_nested}
Reset [`optimize_skip_unused_shards`](#settings-force_optimize_skip_unused_shards) for nested `Distributed` tables
Possible values:
- 1 — Enabled.
- 0 — Disabled.
Default value: 0.
## optimize\_throw\_if\_noop {#setting-optimize_throw_if_noop}
Enables or disables throwing an exception if an [OPTIMIZE](../../sql-reference/statements/misc.md#misc_operations-optimize) query did not perform a merge.
......
......@@ -1048,17 +1048,6 @@ ClickHouse生成异常
Default value: 0
## force\_optimize\_skip\_unused\_shards\_no\_nested {#settings-force_optimize_skip_unused_shards_no_nested}
Reset [`optimize_skip_unused_shards`](#settings-force_optimize_skip_unused_shards) for nested `Distributed` tables
Possible values:
- 1 — Enabled.
- 0 — Disabled.
Default value: 0.
## optimize\_throw\_if\_noop {#setting-optimize_throw_if_noop}
Enables or disables throwing an exception if an [OPTIMIZE](../../sql-reference/statements/misc.md#misc_operations-optimize) query did not perform a merge.
......
......@@ -19,7 +19,7 @@ SELECT [DISTINCT] expr_list
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[ARRAY JOIN ...]
[GLOBAL] [ANY|ALL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER] JOIN (subquery)|table USING columns_list
[GLOBAL] [ANY|ALL|ASOF] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER|SEMI|ANTI] JOIN (subquery)|table (ON <expr_list>)|(USING <column_list>)
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr_list] [WITH TOTALS]
......
......@@ -1477,7 +1477,8 @@ private:
}
else
{
out_logs_buf = std::make_unique<WriteBufferFromFile>(server_logs_file, DBMS_DEFAULT_BUFFER_SIZE, O_WRONLY | O_APPEND | O_CREAT);
out_logs_buf = std::make_unique<WriteBufferFromFile>(
server_logs_file, DBMS_DEFAULT_BUFFER_SIZE, O_WRONLY | O_APPEND | O_CREAT);
wb = out_logs_buf.get();
}
}
......
......@@ -431,6 +431,8 @@ int Server::main(const std::vector<std::string> & /*args*/)
DateLUT::instance();
LOG_TRACE(log, "Initialized DateLUT with time zone '{}'.", DateLUT::instance().getTimeZone());
/// Initialize global thread pool
GlobalThreadPool::initialize(config().getUInt("max_thread_pool_size", 10000));
/// Storage with temporary data for processing of heavy queries.
{
......
......@@ -136,6 +136,15 @@
-->
<max_server_memory_usage>0</max_server_memory_usage>
<!-- Maximum number of threads in the Global thread pool.
This will default to a maximum of 10000 threads if not specified.
This setting will be useful in scenarios where there are a large number
of distributed queries that are running concurrently but are idling most
of the time, in which case a higher number of threads might be required.
-->
<max_thread_pool_size>10000</max_thread_pool_size>
<!-- On memory constrained environments you may have to set this to value larger than 1.
-->
<max_server_memory_usage_to_ram_ratio>0.9</max_server_memory_usage_to_ram_ratio>
......
......@@ -133,6 +133,7 @@ enum class AccessType
M(SYSTEM_REPLICATED_SENDS, "SYSTEM STOP REPLICATED SENDS, SYSTEM START REPLICATED SENDS, STOP_REPLICATED_SENDS, START REPLICATED SENDS", TABLE, SYSTEM_SENDS) \
M(SYSTEM_SENDS, "SYSTEM STOP SENDS, SYSTEM START SENDS, STOP SENDS, START SENDS", GROUP, SYSTEM) \
M(SYSTEM_REPLICATION_QUEUES, "SYSTEM STOP REPLICATION QUEUES, SYSTEM START REPLICATION QUEUES, STOP_REPLICATION_QUEUES, START REPLICATION QUEUES", TABLE, SYSTEM) \
M(SYSTEM_DROP_REPLICA, "DROP REPLICA", TABLE, SYSTEM) \
M(SYSTEM_SYNC_REPLICA, "SYNC REPLICA", TABLE, SYSTEM) \
M(SYSTEM_RESTART_REPLICA, "RESTART REPLICA", TABLE, SYSTEM) \
M(SYSTEM_FLUSH_DISTRIBUTED, "FLUSH DISTRIBUTED", TABLE, SYSTEM_FLUSH) \
......
#include <Common/ThreadPool.h>
#include <Common/Exception.h>
#include <cassert>
#include <type_traits>
#include <Poco/Util/Application.h>
#include <Poco/Util/LayeredConfiguration.h>
namespace DB
{
......@@ -261,9 +264,25 @@ void ThreadPoolImpl<Thread>::worker(typename std::list<Thread>::iterator thread_
template class ThreadPoolImpl<std::thread>;
template class ThreadPoolImpl<ThreadFromGlobalPool>;
std::unique_ptr<GlobalThreadPool> GlobalThreadPool::the_instance;
void GlobalThreadPool::initialize(size_t max_threads)
{
assert(!the_instance);
the_instance.reset(new GlobalThreadPool(max_threads,
1000 /*max_free_threads*/, 10000 /*max_queue_size*/,
false /*shutdown_on_exception*/));
}
GlobalThreadPool & GlobalThreadPool::instance()
{
static GlobalThreadPool ret;
return ret;
if (!the_instance)
{
// Allow implicit initialization. This is needed for old code that is
// impractical to redo now, especially Arcadia users and unit tests.
initialize();
}
return *the_instance;
}
......@@ -128,8 +128,16 @@ using FreeThreadPool = ThreadPoolImpl<std::thread>;
*/
class GlobalThreadPool : public FreeThreadPool, private boost::noncopyable
{
static std::unique_ptr<GlobalThreadPool> the_instance;
GlobalThreadPool(size_t max_threads_, size_t max_free_threads_,
size_t queue_size_, const bool shutdown_on_exception_)
: FreeThreadPool(max_threads_, max_free_threads_, queue_size_,
shutdown_on_exception_)
{}
public:
GlobalThreadPool() : FreeThreadPool(10000, 1000, 10000, false) {}
static void initialize(size_t max_threads = 10000);
static GlobalThreadPool & instance();
};
......
......@@ -20,14 +20,14 @@ inline NO_SANITIZE_UNDEFINED uint64_t intExp2(int x)
return 1ULL << x;
}
inline uint64_t intExp10(int x)
constexpr inline uint64_t intExp10(int x)
{
if (x < 0)
return 0;
if (x > 19)
return std::numeric_limits<uint64_t>::max();
static const uint64_t table[20] =
constexpr uint64_t table[20] =
{
1ULL, 10ULL, 100ULL,
1000ULL, 10000ULL, 100000ULL,
......@@ -44,9 +44,10 @@ inline uint64_t intExp10(int x)
namespace common
{
inline int exp10_i32(int x)
constexpr inline int exp10_i32(int x)
{
static const int values[] = {
constexpr int values[] =
{
1,
10,
100,
......@@ -61,74 +62,76 @@ inline int exp10_i32(int x)
return values[x];
}
inline int64_t exp10_i64(int x)
constexpr inline int64_t exp10_i64(int x)
{
static const int64_t values[] = {
1ll,
10ll,
100ll,
1000ll,
10000ll,
100000ll,
1000000ll,
10000000ll,
100000000ll,
1000000000ll,
10000000000ll,
100000000000ll,
1000000000000ll,
10000000000000ll,
100000000000000ll,
1000000000000000ll,
10000000000000000ll,
100000000000000000ll,
1000000000000000000ll
constexpr int64_t values[] =
{
1LL,
10LL,
100LL,
1000LL,
10000LL,
100000LL,
1000000LL,
10000000LL,
100000000LL,
1000000000LL,
10000000000LL,
100000000000LL,
1000000000000LL,
10000000000000LL,
100000000000000LL,
1000000000000000LL,
10000000000000000LL,
100000000000000000LL,
1000000000000000000LL
};
return values[x];
}
inline __int128 exp10_i128(int x)
constexpr inline __int128 exp10_i128(int x)
{
static const __int128 values[] = {
static_cast<__int128>(1ll),
static_cast<__int128>(10ll),
static_cast<__int128>(100ll),
static_cast<__int128>(1000ll),
static_cast<__int128>(10000ll),
static_cast<__int128>(100000ll),
static_cast<__int128>(1000000ll),
static_cast<__int128>(10000000ll),
static_cast<__int128>(100000000ll),
static_cast<__int128>(1000000000ll),
static_cast<__int128>(10000000000ll),
static_cast<__int128>(100000000000ll),
static_cast<__int128>(1000000000000ll),
static_cast<__int128>(10000000000000ll),
static_cast<__int128>(100000000000000ll),
static_cast<__int128>(1000000000000000ll),
static_cast<__int128>(10000000000000000ll),
static_cast<__int128>(100000000000000000ll),
static_cast<__int128>(1000000000000000000ll),
static_cast<__int128>(1000000000000000000ll) * 10ll,
static_cast<__int128>(1000000000000000000ll) * 100ll,
static_cast<__int128>(1000000000000000000ll) * 1000ll,
static_cast<__int128>(1000000000000000000ll) * 10000ll,
static_cast<__int128>(1000000000000000000ll) * 100000ll,
static_cast<__int128>(1000000000000000000ll) * 1000000ll,
static_cast<__int128>(1000000000000000000ll) * 10000000ll,
static_cast<__int128>(1000000000000000000ll) * 100000000ll,
static_cast<__int128>(1000000000000000000ll) * 1000000000ll,
static_cast<__int128>(1000000000000000000ll) * 10000000000ll,
static_cast<__int128>(1000000000000000000ll) * 100000000000ll,
static_cast<__int128>(1000000000000000000ll) * 1000000000000ll,
static_cast<__int128>(1000000000000000000ll) * 10000000000000ll,
static_cast<__int128>(1000000000000000000ll) * 100000000000000ll,
static_cast<__int128>(1000000000000000000ll) * 1000000000000000ll,
static_cast<__int128>(1000000000000000000ll) * 10000000000000000ll,
static_cast<__int128>(1000000000000000000ll) * 100000000000000000ll,
static_cast<__int128>(1000000000000000000ll) * 100000000000000000ll * 10ll,
static_cast<__int128>(1000000000000000000ll) * 100000000000000000ll * 100ll,
static_cast<__int128>(1000000000000000000ll) * 100000000000000000ll * 1000ll
constexpr __int128 values[] =
{
static_cast<__int128>(1LL),
static_cast<__int128>(10LL),
static_cast<__int128>(100LL),
static_cast<__int128>(1000LL),
static_cast<__int128>(10000LL),
static_cast<__int128>(100000LL),
static_cast<__int128>(1000000LL),
static_cast<__int128>(10000000LL),
static_cast<__int128>(100000000LL),
static_cast<__int128>(1000000000LL),
static_cast<__int128>(10000000000LL),
static_cast<__int128>(100000000000LL),
static_cast<__int128>(1000000000000LL),
static_cast<__int128>(10000000000000LL),
static_cast<__int128>(100000000000000LL),
static_cast<__int128>(1000000000000000LL),
static_cast<__int128>(10000000000000000LL),
static_cast<__int128>(100000000000000000LL),
static_cast<__int128>(1000000000000000000LL),
static_cast<__int128>(1000000000000000000LL) * 10LL,
static_cast<__int128>(1000000000000000000LL) * 100LL,
static_cast<__int128>(1000000000000000000LL) * 1000LL,
static_cast<__int128>(1000000000000000000LL) * 10000LL,
static_cast<__int128>(1000000000000000000LL) * 100000LL,
static_cast<__int128>(1000000000000000000LL) * 1000000LL,
static_cast<__int128>(1000000000000000000LL) * 10000000LL,
static_cast<__int128>(1000000000000000000LL) * 100000000LL,
static_cast<__int128>(1000000000000000000LL) * 1000000000LL,
static_cast<__int128>(1000000000000000000LL) * 10000000000LL,
static_cast<__int128>(1000000000000000000LL) * 100000000000LL,
static_cast<__int128>(1000000000000000000LL) * 1000000000000LL,
static_cast<__int128>(1000000000000000000LL) * 10000000000000LL,
static_cast<__int128>(1000000000000000000LL) * 100000000000000LL,
static_cast<__int128>(1000000000000000000LL) * 1000000000000000LL,
static_cast<__int128>(1000000000000000000LL) * 10000000000000000LL,
static_cast<__int128>(1000000000000000000LL) * 100000000000000000LL,
static_cast<__int128>(1000000000000000000LL) * 100000000000000000LL * 10LL,
static_cast<__int128>(1000000000000000000LL) * 100000000000000000LL * 100LL,
static_cast<__int128>(1000000000000000000LL) * 100000000000000000LL * 1000LL
};
return values[x];
}
......@@ -138,7 +141,7 @@ inline __int128 exp10_i128(int x)
/// intExp10 returning the type T.
template <typename T>
inline T intExp10OfSize(int x)
constexpr inline T intExp10OfSize(int x)
{
if constexpr (sizeof(T) <= 8)
return intExp10(x);
......
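Marking these lookup helpers constexpr lets a power of ten be computed at compile time whenever the exponent is a constant. A minimal standalone sketch (hypothetical code, not part of this commit) of the same idea, using a loop instead of the lookup tables:

#include <cstdint>

/// Same idea as common::exp10_i64, computed with a loop instead of a lookup table.
constexpr int64_t exp10_sketch(int x)
{
    int64_t result = 1;
    for (int i = 0; i < x; ++i)
        result *= 10;
    return result;
}

/// Evaluated entirely at compile time because the function is constexpr.
static_assert(exp10_sketch(6) == 1000000, "10^6");
static_assert(exp10_sketch(0) == 1, "10^0");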
......@@ -121,10 +121,11 @@ struct Settings : public SettingsCollection<Settings>
\
M(SettingBool, distributed_group_by_no_merge, false, "Do not merge aggregation states from different servers for distributed query processing - in case it is for certain that there are different keys on different shards.", 0) \
M(SettingBool, parallel_distributed_insert_select, false, "If true, distributed insert select query in the same cluster will be processed on local tables on every shard", 0) \
M(SettingBool, optimize_skip_unused_shards, false, "Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key.", 0) \
M(SettingBool, optimize_distributed_group_by_sharding_key, false, "Optimize GROUP BY sharding_key queries (by avoiding costly aggregation on the initiator server).", 0) \
M(SettingBool, optimize_skip_unused_shards, false, "Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key.", 0) \
M(SettingUInt64, force_optimize_skip_unused_shards, 0, "Throw an exception if unused shards cannot be skipped (1 - throw only if the table has the sharding key, 2 - always throw).", 0) \
M(SettingBool, force_optimize_skip_unused_shards_no_nested, false, "Do not apply force_optimize_skip_unused_shards for nested Distributed tables.", 0) \
M(SettingUInt64, optimize_skip_unused_shards_nesting, 0, "Same as optimize_skip_unused_shards, but accept nesting level until which it will work.", 0) \
M(SettingUInt64, force_optimize_skip_unused_shards_nesting, 0, "Same as force_optimize_skip_unused_shards, but accept nesting level until which it will work.", 0) \
\
M(SettingBool, input_format_parallel_parsing, true, "Enable parallel parsing for some data formats.", 0) \
M(SettingUInt64, min_chunk_bytes_for_parallel_parsing, (10 * 1024 * 1024), "The minimum chunk size in bytes, which each thread will parse in parallel.", 0) \
......@@ -397,6 +398,7 @@ struct Settings : public SettingsCollection<Settings>
M(SettingBool, partial_merge_join, false, "Obsolete. Use join_algorithm='prefer_partial_merge' instead.", 0) \
M(SettingUInt64, max_memory_usage_for_all_queries, 0, "Obsolete. Will be removed after 2020-10-20", 0) \
\
M(SettingBool, force_optimize_skip_unused_shards_no_nested, false, "Obsolete setting, does nothing. Will be removed after 2020-12-01. Use force_optimize_skip_unused_shards_nesting instead.", 0) \
M(SettingBool, experimental_use_processors, true, "Obsolete setting, does nothing. Will be removed after 2020-11-29.", 0)
#define FORMAT_FACTORY_SETTINGS(M) \
......
......@@ -20,19 +20,34 @@
#include <optional>
#include <string>
namespace DB
{
namespace ErrorCodes
{
extern const int ARGUMENT_OUT_OF_BOUND;
}
static constexpr UInt32 max_scale = 9;
DataTypeDateTime64::DataTypeDateTime64(UInt32 scale_, const std::string & time_zone_name)
: DataTypeDecimalBase<DateTime64>(DecimalUtils::maxPrecision<DateTime64>(), scale_),
TimezoneMixin(time_zone_name)
{
if (scale > max_scale)
throw Exception("Scale " + std::to_string(scale) + " is too large for DateTime64. Maximum is up to nanoseconds (9).",
ErrorCodes::ARGUMENT_OUT_OF_BOUND);
}
DataTypeDateTime64::DataTypeDateTime64(UInt32 scale_, const TimezoneMixin & time_zone_info)
: DataTypeDecimalBase<DateTime64>(DecimalUtils::maxPrecision<DateTime64>() - scale_, scale_),
: DataTypeDecimalBase<DateTime64>(DecimalUtils::maxPrecision<DateTime64>(), scale_),
TimezoneMixin(time_zone_info)
{}
{
if (scale > max_scale)
throw Exception("Scale " + std::to_string(scale) + " is too large for DateTime64. Maximum is up to nanoseconds (9).",
ErrorCodes::ARGUMENT_OUT_OF_BOUND);
}
std::string DataTypeDateTime64::doGetName() const
{
......
......@@ -72,7 +72,7 @@ public:
{
if (unlikely(precision < 1 || precision > maxPrecision()))
throw Exception("Precision " + std::to_string(precision) + " is out of bounds", ErrorCodes::ARGUMENT_OUT_OF_BOUND);
if (unlikely(scale < 0 || static_cast<UInt32>(scale) > maxPrecision()))
if (unlikely(scale > maxPrecision()))
throw Exception("Scale " + std::to_string(scale) + " is out of bounds", ErrorCodes::ARGUMENT_OUT_OF_BOUND);
}
......
......@@ -208,7 +208,7 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
}
}
/// For Date and DateTime, the common type is DateTime. No other types are compatible.
/// For Date and DateTime/DateTime64, the common type is DateTime/DateTime64. No other types are compatible.
{
UInt32 have_date = type_ids.count(TypeIndex::Date);
UInt32 have_datetime = type_ids.count(TypeIndex::DateTime);
......@@ -218,40 +218,25 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
{
bool all_date_or_datetime = type_ids.size() == (have_date + have_datetime + have_datetime64);
if (!all_date_or_datetime)
throw Exception(getExceptionMessagePrefix(types) + " because some of them are Date/DateTime and some of them are not", ErrorCodes::NO_COMMON_TYPE);
throw Exception(getExceptionMessagePrefix(types) + " because some of them are Date/DateTime/DateTime64 and some of them are not",
ErrorCodes::NO_COMMON_TYPE);
if (have_datetime64 == 0)
{
return std::make_shared<DataTypeDateTime>();
}
// When DateTime64 involved, make sure that supertype has whole-part precision
// big enough to hold max whole-value of any type from `types`.
// That would sacrifice scale when comparing DateTime64 of different scales.
UInt8 max_scale = 0;
UInt32 max_datetime64_whole_precision = 0;
for (const auto & t : types)
{
if (const auto * dt64 = typeid_cast<const DataTypeDateTime64 *>(t.get()))
{
const auto whole_precision = dt64->getPrecision() - dt64->getScale();
max_datetime64_whole_precision = std::max(whole_precision, max_datetime64_whole_precision);
const auto scale = dt64->getScale();
if (scale > max_scale)
max_scale = scale;
}
}
UInt32 least_decimal_precision = 0;
if (have_datetime)
{
least_decimal_precision = leastDecimalPrecisionFor(TypeIndex::UInt32);
}
else if (have_date)
{
least_decimal_precision = leastDecimalPrecisionFor(TypeIndex::UInt16);
}
max_datetime64_whole_precision = std::max(least_decimal_precision, max_datetime64_whole_precision);
const UInt32 scale = DataTypeDateTime64::maxPrecision() - max_datetime64_whole_precision;
return std::make_shared<DataTypeDateTime64>(scale);
return std::make_shared<DataTypeDateTime64>(max_scale);
}
}
......
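The new supertype rule simply keeps the largest scale among the DateTime64 arguments (with Date/DateTime treated as scale 0), instead of deriving the scale from whole-part precision. A standalone sketch (hypothetical helper, not the actual DataTypes API):

#include <algorithm>
#include <cstdint>

/// Hypothetical helper: the common scale of two DateTime64 values is simply the larger of the two.
constexpr uint32_t leastSupertypeScale(uint32_t lhs_scale, uint32_t rhs_scale)
{
    return std::max(lhs_scale, rhs_scale);
}

static_assert(leastSupertypeScale(9, 3) == 9, "DateTime64(9), DateTime64(3) -> DateTime64(9)");
static_assert(leastSupertypeScale(3, 0) == 3, "DateTime, DateTime64(3) -> DateTime64(3)");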
......@@ -86,7 +86,7 @@ TEST_P(LeastSuperTypeTest, getLeastSupertype)
class MostSubtypeTest : public TypeTest {};
TEST_P(MostSubtypeTest, getLeastSupertype)
TEST_P(MostSubtypeTest, getMostSubtype)
{
if (this->expected_type)
{
......@@ -124,9 +124,7 @@ INSTANTIATE_TEST_SUITE_P(data_type,
{"Date DateTime64(3)", "DateTime64(3)"},
{"DateTime DateTime64(3)", "DateTime64(3)"},
{"DateTime DateTime64(0)", "DateTime64(0)"},
{"DateTime64(9) DateTime64(3)", "DateTime64(3)"},
{"DateTime DateTime64(12)", "DateTime64(8)"},
{"Date DateTime64(15)", "DateTime64(13)"},
{"DateTime64(9) DateTime64(3)", "DateTime64(9)"},
{"String FixedString(32) FixedString(8)", "String"},
......
......@@ -275,8 +275,11 @@ ReturnType readIntTextImpl(T & x, ReadBuffer & buf)
switch (*buf.position())
{
case '+':
{
break;
}
case '-':
{
if constexpr (is_signed_v<T>)
negative = true;
else
......@@ -287,6 +290,7 @@ ReturnType readIntTextImpl(T & x, ReadBuffer & buf)
return ReturnType(false);
}
break;
}
case '0': [[fallthrough]];
case '1': [[fallthrough]];
case '2': [[fallthrough]];
......@@ -297,20 +301,27 @@ ReturnType readIntTextImpl(T & x, ReadBuffer & buf)
case '7': [[fallthrough]];
case '8': [[fallthrough]];
case '9':
{
if constexpr (check_overflow == ReadIntTextCheckOverflow::CHECK_OVERFLOW)
{
// perform relativelly slow overflow check only when number of decimal digits so far is close to the max for given type.
if (buf.count() - initial_pos >= std::numeric_limits<T>::max_digits10)
/// Perform a relatively slow overflow check only when
/// the number of decimal digits so far is close to the max for the given type.
/// Example: 20 * 10 will overflow Int8.
if (buf.count() - initial_pos + 1 >= std::numeric_limits<T>::max_digits10)
{
if (common::mulOverflow(res, static_cast<decltype(res)>(10), res)
|| common::addOverflow(res, static_cast<decltype(res)>(*buf.position() - '0'), res))
T signed_res = res;
if (common::mulOverflow<T>(signed_res, 10, signed_res)
|| common::addOverflow<T>(signed_res, (*buf.position() - '0'), signed_res))
return ReturnType(false);
res = signed_res;
break;
}
}
res *= 10;
res += *buf.position() - '0';
break;
}
default:
goto end;
}
......@@ -318,7 +329,23 @@ ReturnType readIntTextImpl(T & x, ReadBuffer & buf)
}
end:
x = negative ? -res : res;
if (!negative)
{
x = res;
}
else
{
if constexpr (check_overflow == ReadIntTextCheckOverflow::CHECK_OVERFLOW)
{
x = res;
if (common::mulOverflow<T>(x, -1, x))
return ReturnType(false);
}
else
{
x = -res;
}
}
return ReturnType(true);
}
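The overflow branch above relies on common::mulOverflow / common::addOverflow, which report whether the operation wrapped. A standalone sketch of the same digit loop (assumption: those wrappers behave like the compiler overflow builtins used here; the helper name is hypothetical):

#include <cstdint>
#include <cstdio>

/// Hypothetical stand-in for the digit loop above: accumulate decimal digits into T,
/// bailing out as soon as the value can no longer be represented.
template <typename T>
bool parseDigitsChecked(const char * s, T & out)
{
    T res = 0;
    for (; *s >= '0' && *s <= '9'; ++s)
    {
        /// The builtins return true when the result wrapped,
        /// mirroring common::mulOverflow / common::addOverflow.
        if (__builtin_mul_overflow(res, static_cast<T>(10), &res)
            || __builtin_add_overflow(res, static_cast<T>(*s - '0'), &res))
            return false;
    }
    out = res;
    return true;
}

int main()
{
    int8_t value = 0;
    std::printf("%d %d\n", parseDigitsChecked("127", value), value); /// prints "1 127"
    std::printf("%d\n", parseDigitsChecked("200", value));           /// prints "0" (200 does not fit into Int8)
}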
......@@ -658,35 +685,34 @@ inline ReturnType readDateTimeTextImpl(DateTime64 & datetime64, UInt32 scale, Re
return ReturnType(false);
}
DB::DecimalUtils::DecimalComponents<DateTime64::NativeType> c{static_cast<DateTime64::NativeType>(whole), 0};
DB::DecimalUtils::DecimalComponents<DateTime64::NativeType> components{static_cast<DateTime64::NativeType>(whole), 0};
if (!buf.eof() && *buf.position() == '.')
{
buf.ignore(1); // skip separator
const auto pos_before_fractional = buf.count();
if (!tryReadIntText<ReadIntTextCheckOverflow::CHECK_OVERFLOW>(c.fractional, buf))
{
return ReturnType(false);
}
// Adjust fractional part to the scale, since decimalFromComponents knows nothing
// about convention of ommiting trailing zero on fractional part
// and assumes that fractional part value is less than 10^scale.
++buf.position();
// If scale is 3, but we read '12', promote fractional part to '120'.
// And vice versa: if we read '1234', denote it to '123'.
const auto fractional_length = static_cast<Int32>(buf.count() - pos_before_fractional);
if (const auto adjust_scale = static_cast<Int32>(scale) - fractional_length; adjust_scale > 0)
/// Read digits, up to 'scale' positions.
for (size_t i = 0; i < scale; ++i)
{
c.fractional *= common::exp10_i64(adjust_scale);
}
else if (adjust_scale < 0)
{
c.fractional /= common::exp10_i64(-1 * adjust_scale);
if (!buf.eof() && isNumericASCII(*buf.position()))
{
components.fractional *= 10;
components.fractional += *buf.position() - '0';
++buf.position();
}
else
{
/// Adjust to scale.
components.fractional *= 10;
}
}
/// Ignore digits that are out of precision.
while (!buf.eof() && isNumericASCII(*buf.position()))
++buf.position();
}
datetime64 = DecimalUtils::decimalFromComponents<DateTime64>(c, scale);
datetime64 = DecimalUtils::decimalFromComponents<DateTime64>(components, scale);
return ReturnType(true);
}
......
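The rewritten fractional-part loop reads at most `scale` digits, pads a short fraction with zeros and silently drops digits beyond the precision, so at scale 3 the input '12' becomes 120 and '1234' becomes 123. A standalone sketch of the same loop (hypothetical helper, not the ReadBuffer API):

#include <cassert>
#include <cstddef>
#include <cstdint>

/// Hypothetical helper mirroring the loop above: consume up to `scale` fractional digits,
/// padding with zeros if the input is shorter and ignoring anything beyond the scale.
int64_t readFractional(const char * s, size_t scale)
{
    int64_t fractional = 0;
    for (size_t i = 0; i < scale; ++i)
    {
        fractional *= 10;
        if (*s >= '0' && *s <= '9')
            fractional += *s++ - '0';
    }
    return fractional;
}

int main()
{
    assert(readFractional("12", 3) == 120);   /// short fraction is promoted: '12' -> 120
    assert(readFractional("1234", 3) == 123); /// extra digit is dropped: '1234' -> 123
}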
......@@ -15,7 +15,7 @@ namespace DB
namespace ClusterProxy
{
Context removeUserRestrictionsFromSettings(const Context & context, const Settings & settings)
Context removeUserRestrictionsFromSettings(const Context & context, const Settings & settings, Poco::Logger * log)
{
Settings new_settings = settings;
new_settings.queue_max_wait_ms = Cluster::saturate(new_settings.queue_max_wait_ms, settings.max_execution_time);
......@@ -28,10 +28,44 @@ Context removeUserRestrictionsFromSettings(const Context & context, const Settin
new_settings.max_concurrent_queries_for_user.changed = false;
new_settings.max_memory_usage_for_user.changed = false;
if (settings.force_optimize_skip_unused_shards_no_nested)
if (settings.force_optimize_skip_unused_shards_nesting)
{
new_settings.force_optimize_skip_unused_shards = 0;
new_settings.force_optimize_skip_unused_shards.changed = false;
if (new_settings.force_optimize_skip_unused_shards_nesting == 1)
{
new_settings.force_optimize_skip_unused_shards = false;
new_settings.force_optimize_skip_unused_shards.changed = false;
if (log)
LOG_TRACE(log, "Disabling force_optimize_skip_unused_shards for nested queries (force_optimize_skip_unused_shards_nesting exceeded)");
}
else
{
--new_settings.force_optimize_skip_unused_shards_nesting.value;
new_settings.force_optimize_skip_unused_shards_nesting.changed = true;
if (log)
LOG_TRACE(log, "force_optimize_skip_unused_shards_nesting is now {}", new_settings.force_optimize_skip_unused_shards_nesting);
}
}
if (settings.optimize_skip_unused_shards_nesting)
{
if (new_settings.optimize_skip_unused_shards_nesting == 1)
{
new_settings.optimize_skip_unused_shards = false;
new_settings.optimize_skip_unused_shards.changed = false;
if (log)
LOG_TRACE(log, "Disabling optimize_skip_unused_shards for nested queries (optimize_skip_unused_shards_nesting exceeded)");
}
else
{
--new_settings.optimize_skip_unused_shards_nesting.value;
new_settings.optimize_skip_unused_shards_nesting.changed = true;
if (log)
LOG_TRACE(log, "optimize_skip_unused_shards_nesting is now {}", new_settings.optimize_skip_unused_shards_nesting);
}
}
Context new_context(context);
......@@ -41,14 +75,16 @@ Context removeUserRestrictionsFromSettings(const Context & context, const Settin
}
Pipes executeQuery(
IStreamFactory & stream_factory, const ClusterPtr & cluster,
IStreamFactory & stream_factory, const ClusterPtr & cluster, Poco::Logger * log,
const ASTPtr & query_ast, const Context & context, const Settings & settings, const SelectQueryInfo & query_info)
{
assert(log);
Pipes res;
const std::string query = queryToString(query_ast);
Context new_context = removeUserRestrictionsFromSettings(context, settings);
Context new_context = removeUserRestrictionsFromSettings(context, settings, log);
ThrottlerPtr user_level_throttler;
if (auto * process_list_element = context.getProcessListElement())
......
......@@ -21,13 +21,13 @@ class IStreamFactory;
/// removes different restrictions (like max_concurrent_queries_for_user, max_memory_usage_for_user, etc.)
/// from settings and creates new context with them
Context removeUserRestrictionsFromSettings(const Context & context, const Settings & settings);
Context removeUserRestrictionsFromSettings(const Context & context, const Settings & settings, Poco::Logger * log = nullptr);
/// Execute a distributed query, creating a vector of BlockInputStreams, from which the result can be read.
/// `stream_factory` object encapsulates the logic of creating streams for a different type of query
/// (currently SELECT, DESCRIBE).
Pipes executeQuery(
IStreamFactory & stream_factory, const ClusterPtr & cluster,
IStreamFactory & stream_factory, const ClusterPtr & cluster, Poco::Logger * log,
const ASTPtr & query_ast, const Context & context, const Settings & settings, const SelectQueryInfo & query_info);
}
......
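The *_nesting settings implement a simple countdown: each level of nested Distributed processing decrements the counter, and once it reaches 1 the corresponding optimization is switched off for deeper levels. A standalone sketch of the pattern (hypothetical reduced types, not the Settings machinery):

#include <cstdint>
#include <cstdio>

/// Hypothetical reduced model of the two settings involved.
struct NestingSettings
{
    bool optimize_skip_unused_shards = true;
    uint64_t optimize_skip_unused_shards_nesting = 2; /// 0 means "no limit"
};

/// Mirror of the decrement logic: called once per nesting level of Distributed processing.
void descendOneLevel(NestingSettings & s)
{
    if (!s.optimize_skip_unused_shards_nesting)
        return;                                  /// no limit configured
    if (s.optimize_skip_unused_shards_nesting == 1)
        s.optimize_skip_unused_shards = false;   /// limit exhausted, disable for deeper levels
    else
        --s.optimize_skip_unused_shards_nesting; /// one level consumed
}

int main()
{
    NestingSettings s;
    descendOneLevel(s); /// first nested level: still enabled, counter becomes 1
    descendOneLevel(s); /// second nested level: optimization disabled
    std::printf("%d %llu\n", s.optimize_skip_unused_shards,
                static_cast<unsigned long long>(s.optimize_skip_unused_shards_nesting));
}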
......@@ -760,10 +760,6 @@ bool SelectQueryExpressionAnalyzer::appendGroupBy(ExpressionActionsChain & chain
group_by_elements_actions.emplace_back(std::make_shared<ExpressionActions>(all_columns, context));
getRootActions(child, only_types, group_by_elements_actions.back());
}
// std::cerr << "group_by_elements_actions\n";
// for (const auto & elem : group_by_elements_actions) {
// std::cerr << elem->dumpActions() << "\n";
// }
}
return true;
......@@ -857,10 +853,6 @@ bool SelectQueryExpressionAnalyzer::appendOrderBy(ExpressionActionsChain & chain
order_by_elements_actions.emplace_back(std::make_shared<ExpressionActions>(all_columns, context));
getRootActions(child, only_types, order_by_elements_actions.back());
}
// std::cerr << "order_by_elements_actions\n";
// for (const auto & elem : order_by_elements_actions) {
// std::cerr << elem->dumpActions() << "\n";
// }
}
return true;
}
......
......@@ -49,6 +49,7 @@ namespace ErrorCodes
extern const int CANNOT_KILL;
extern const int NOT_IMPLEMENTED;
extern const int TIMEOUT_EXCEEDED;
extern const int TABLE_WAS_NOT_DROPPED;
}
......@@ -185,7 +186,7 @@ BlockIO InterpreterSystemQuery::execute()
/// Make canonical query for simpler processing
if (!query.table.empty())
table_id = context.resolveStorageID(StorageID(query.database, query.table), Context::ResolveOrdinary);
table_id = context.resolveStorageID(StorageID(query.database, query.table), Context::ResolveOrdinary);
if (!query.target_dictionary.empty() && !query.database.empty())
query.target_dictionary = query.database + "." + query.target_dictionary;
......@@ -285,6 +286,9 @@ BlockIO InterpreterSystemQuery::execute()
case Type::START_DISTRIBUTED_SENDS:
startStopAction(ActionLocks::DistributedSend, true);
break;
case Type::DROP_REPLICA:
dropReplica(query);
break;
case Type::SYNC_REPLICA:
syncReplica(query);
break;
......@@ -400,6 +404,111 @@ void InterpreterSystemQuery::restartReplicas(Context & system_context)
pool.wait();
}
void InterpreterSystemQuery::dropReplica(ASTSystemQuery & query)
{
if (query.replica.empty())
throw Exception("Replica name is empty", ErrorCodes::BAD_ARGUMENTS);
if (!table_id.empty())
{
context.checkAccess(AccessType::SYSTEM_DROP_REPLICA, table_id);
StoragePtr table = DatabaseCatalog::instance().getTable(table_id, context);
if (!dropReplicaImpl(query, table))
throw Exception("Table " + table_id.getNameForLogs() + " is not replicated", ErrorCodes::BAD_ARGUMENTS);
}
else if (!query.database.empty())
{
context.checkAccess(AccessType::SYSTEM_DROP_REPLICA, query.database);
DatabasePtr database = DatabaseCatalog::instance().getDatabase(query.database);
for (auto iterator = database->getTablesIterator(context); iterator->isValid(); iterator->next())
dropReplicaImpl(query, iterator->table());
LOG_TRACE(log, "Dropped replica {} from database {}", query.replica, backQuoteIfNeed(database->getDatabaseName()));
}
else if (query.is_drop_whole_replica)
{
context.checkAccess(AccessType::SYSTEM_DROP_REPLICA);
auto databases = DatabaseCatalog::instance().getDatabases();
for (auto & elem : databases)
{
DatabasePtr & database = elem.second;
for (auto iterator = database->getTablesIterator(context); iterator->isValid(); iterator->next())
dropReplicaImpl(query, iterator->table());
LOG_TRACE(log, "Dropped replica {} from database {}", query.replica, backQuoteIfNeed(database->getDatabaseName()));
}
}
else if (!query.replica_zk_path.empty())
{
context.checkAccess(AccessType::SYSTEM_DROP_REPLICA);
auto remote_replica_path = query.replica_zk_path + "/replicas/" + query.replica;
/// This check is actually redundant, but it may prevent some user mistakes
for (auto & elem : DatabaseCatalog::instance().getDatabases())
{
DatabasePtr & database = elem.second;
for (auto iterator = database->getTablesIterator(context); iterator->isValid(); iterator->next())
{
if (auto * storage_replicated = dynamic_cast<StorageReplicatedMergeTree *>(iterator->table().get()))
{
StorageReplicatedMergeTree::Status status;
storage_replicated->getStatus(status);
if (status.zookeeper_path == query.replica_zk_path)
throw Exception("There is a local table " + storage_replicated->getStorageID().getNameForLogs() +
", which has the same table path in ZooKeeper. Please check the path in query. "
"If you want to drop replica of this table, use `DROP TABLE` "
"or `SYSTEM DROP REPLICA 'name' FROM db.table`", ErrorCodes::TABLE_WAS_NOT_DROPPED);
}
}
}
auto zookeeper = context.getZooKeeper();
bool looks_like_table_path = zookeeper->exists(query.replica_zk_path + "/replicas") ||
zookeeper->exists(query.replica_zk_path + "/dropped");
if (!looks_like_table_path)
throw Exception("Specified path " + query.replica_zk_path + " does not look like a table path",
ErrorCodes::TABLE_WAS_NOT_DROPPED);
if (zookeeper->exists(remote_replica_path + "/is_active"))
throw Exception("Can't remove replica: " + query.replica + ", because it's active",
ErrorCodes::TABLE_WAS_NOT_DROPPED);
StorageReplicatedMergeTree::dropReplica(zookeeper, query.replica_zk_path, query.replica, log);
LOG_INFO(log, "Dropped replica {}", remote_replica_path);
}
else
throw Exception("Invalid query", ErrorCodes::LOGICAL_ERROR);
}
bool InterpreterSystemQuery::dropReplicaImpl(ASTSystemQuery & query, const StoragePtr & table)
{
auto * storage_replicated = dynamic_cast<StorageReplicatedMergeTree *>(table.get());
if (!storage_replicated)
return false;
StorageReplicatedMergeTree::Status status;
auto zookeeper = context.getZooKeeper();
storage_replicated->getStatus(status);
/// Do not allow to drop local replicas and active remote replicas
if (query.replica == status.replica_name)
throw Exception("We can't drop local replica, please use `DROP TABLE` "
"if you want to clean the data and drop this replica", ErrorCodes::TABLE_WAS_NOT_DROPPED);
/// NOTE it's not atomic: replica may become active after this check, but before dropReplica(...)
/// However, the main use case is to drop a dead replica, which cannot become active.
/// This check only prevents an accidental drop of some other replica.
if (zookeeper->exists(status.zookeeper_path + "/replicas/" + query.replica + "/is_active"))
throw Exception("Can't drop replica: " + query.replica + ", because it's active",
ErrorCodes::TABLE_WAS_NOT_DROPPED);
storage_replicated->dropReplica(zookeeper, status.zookeeper_path, query.replica, log);
LOG_TRACE(log, "Dropped replica {} of {}", query.replica, table->getStorageID().getNameForLogs());
return true;
}
void InterpreterSystemQuery::syncReplica(ASTSystemQuery &)
{
context.checkAccess(AccessType::SYSTEM_SYNC_REPLICA, table_id);
......@@ -530,6 +639,11 @@ AccessRightsElements InterpreterSystemQuery::getRequiredAccessForDDLOnCluster()
required_access.emplace_back(AccessType::SYSTEM_REPLICATION_QUEUES, query.database, query.table);
break;
}
case Type::DROP_REPLICA:
{
required_access.emplace_back(AccessType::SYSTEM_DROP_REPLICA, query.database, query.table);
break;
}
case Type::SYNC_REPLICA:
{
required_access.emplace_back(AccessType::SYSTEM_SYNC_REPLICA, query.database, query.table);
......
......@@ -51,6 +51,8 @@ private:
void restartReplicas(Context & system_context);
void syncReplica(ASTSystemQuery & query);
void dropReplica(ASTSystemQuery & query);
bool dropReplicaImpl(ASTSystemQuery & query, const StoragePtr & table);
void flushDistributed(ASTSystemQuery & query);
AccessRightsElements getRequiredAccessForDDLOnCluster() const;
......
......@@ -9,7 +9,7 @@ namespace DB
namespace ErrorCodes
{
extern const int BAD_TYPE_OF_FIELD;
extern const int LOGICAL_ERROR;
}
......@@ -39,6 +39,8 @@ const char * ASTSystemQuery::typeToString(Type type)
return "RESTART REPLICAS";
case Type::RESTART_REPLICA:
return "RESTART REPLICA";
case Type::DROP_REPLICA:
return "DROP REPLICA";
case Type::SYNC_REPLICA:
return "SYNC REPLICA";
case Type::FLUSH_DISTRIBUTED:
......@@ -82,15 +84,15 @@ const char * ASTSystemQuery::typeToString(Type type)
case Type::FLUSH_LOGS:
return "FLUSH LOGS";
default:
throw Exception("Unknown SYSTEM query command", ErrorCodes::BAD_TYPE_OF_FIELD);
throw Exception("Unknown SYSTEM query command", ErrorCodes::LOGICAL_ERROR);
}
}
void ASTSystemQuery::formatImpl(const FormatSettings & settings, FormatState &, FormatStateStacked) const
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << "SYSTEM " << (settings.hilite ? hilite_none : "");
settings.ostr << typeToString(type);
settings.ostr << (settings.hilite ? hilite_keyword : "") << "SYSTEM ";
settings.ostr << typeToString(type) << (settings.hilite ? hilite_none : "");
auto print_database_table = [&]
{
......@@ -116,6 +118,28 @@ void ASTSystemQuery::formatImpl(const FormatSettings & settings, FormatState &,
<< (settings.hilite ? hilite_none : "");
};
auto print_drop_replica = [&] {
settings.ostr << " " << quoteString(replica);
if (!table.empty())
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << " FROM TABLE"
<< (settings.hilite ? hilite_none : "");
print_database_table();
}
else if (!replica_zk_path.empty())
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << " FROM ZKPATH "
<< (settings.hilite ? hilite_none : "") << quoteString(replica_zk_path);
}
else if (!database.empty())
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << " FROM DATABASE "
<< (settings.hilite ? hilite_none : "");
settings.ostr << (settings.hilite ? hilite_identifier : "") << backQuoteIfNeed(database)
<< (settings.hilite ? hilite_none : "");
}
};
if (!cluster.empty())
formatOnCluster(settings);
......@@ -143,6 +167,8 @@ void ASTSystemQuery::formatImpl(const FormatSettings & settings, FormatState &,
}
else if (type == Type::RELOAD_DICTIONARY)
print_database_dictionary();
else if (type == Type::DROP_REPLICA)
print_drop_replica();
}
......
......@@ -30,6 +30,7 @@ public:
START_LISTEN_QUERIES,
RESTART_REPLICAS,
RESTART_REPLICA,
DROP_REPLICA,
SYNC_REPLICA,
RELOAD_DICTIONARY,
RELOAD_DICTIONARIES,
......@@ -61,6 +62,9 @@ public:
String target_dictionary;
String database;
String table;
String replica;
String replica_zk_path;
bool is_drop_whole_replica = false;
String getID(char) const override { return "SYSTEM query"; }
......
......@@ -2,6 +2,7 @@
#include <Parsers/ASTSystemQuery.h>
#include <Parsers/CommonParsers.h>
#include <Parsers/ExpressionElementParsers.h>
#include <Parsers/ASTIdentifier.h>
#include <Parsers/ASTLiteral.h>
#include <Parsers/parseDatabaseAndTableName.h>
......@@ -57,6 +58,48 @@ bool ParserSystemQuery::parseImpl(IParser::Pos & pos, ASTPtr & node, Expected &
break;
}
case Type::DROP_REPLICA:
{
ASTPtr ast;
if (!ParserStringLiteral{}.parse(pos, ast, expected))
return false;
res->replica = ast->as<ASTLiteral &>().value.safeGet<String>();
if (ParserKeyword{"FROM"}.ignore(pos, expected))
{
// way 1. parse replica database
// way 2. parse replica tables
// way 3. parse replica zkpath
if (ParserKeyword{"DATABASE"}.ignore(pos, expected))
{
ParserIdentifier database_parser;
ASTPtr database;
if (!database_parser.parse(pos, database, expected))
return false;
tryGetIdentifierNameInto(database, res->database);
}
else if (ParserKeyword{"TABLE"}.ignore(pos, expected))
{
parseDatabaseAndTableName(pos, expected, res->database, res->table);
}
else if (ParserKeyword{"ZKPATH"}.ignore(pos, expected))
{
ASTPtr path_ast;
if (!ParserStringLiteral{}.parse(pos, path_ast, expected))
return false;
String zk_path = path_ast->as<ASTLiteral &>().value.safeGet<String>();
if (!zk_path.empty() && zk_path[zk_path.size() - 1] == '/')
zk_path.pop_back();
res->replica_zk_path = zk_path;
}
else
return false;
}
else
res->is_drop_whole_replica = true;
break;
}
case Type::RESTART_REPLICA:
case Type::SYNC_REPLICA:
if (!parseDatabaseAndTableName(pos, expected, res->database, res->table))
......
......@@ -914,6 +914,8 @@ void MergeTreeRangeReader::executePrewhereActionsAndFilterColumns(ReadResult & r
else
{
result.columns[prewhere_column_pos] = result.getFilterHolder()->convertToFullColumnIfConst();
if (getSampleBlock().getByName(prewhere->prewhere_column_name).type->isNullable())
result.columns[prewhere_column_pos] = makeNullable(std::move(result.columns[prewhere_column_pos]));
result.clearFilter(); // Acting as a flag to not filter in PREWHERE
}
}
......
......@@ -45,6 +45,7 @@
#include <Interpreters/evaluateConstantExpression.h>
#include <Interpreters/getClusterName.h>
#include <Interpreters/getTableExpressions.h>
#include <Functions/IFunction.h>
#include <Core/Field.h>
#include <Core/Settings.h>
......@@ -188,6 +189,18 @@ ExpressionActionsPtr buildShardingKeyExpression(const ASTPtr & sharding_key, con
return ExpressionAnalyzer(query, syntax_result, context).getActions(project);
}
bool isExpressionActionsDeterministics(const ExpressionActionsPtr & actions)
{
for (const auto & action : actions->getActions())
{
if (action.type != ExpressionAction::APPLY_FUNCTION)
continue;
if (!action.function_base->isDeterministic())
return false;
}
return true;
}
class ReplacingConstantExpressionsMatcher
{
public:
......@@ -299,6 +312,7 @@ StorageDistributed::StorageDistributed(
{
sharding_key_expr = buildShardingKeyExpression(sharding_key_, *global_context, storage_metadata.getColumns().getAllPhysical(), false);
sharding_key_column_name = sharding_key_->getColumnName();
sharding_key_is_deterministic = isExpressionActionsDeterministics(sharding_key_expr);
}
if (!relative_data_path.empty())
......@@ -514,8 +528,8 @@ Pipes StorageDistributed::read(
: ClusterProxy::SelectStreamFactory(
header, processed_stage, StorageID{remote_database, remote_table}, scalars, has_virtual_shard_num_column, context.getExternalTables());
return ClusterProxy::executeQuery(
select_stream_factory, cluster, modified_query_ast, context, context.getSettingsRef(), query_info);
return ClusterProxy::executeQuery(select_stream_factory, cluster, log,
modified_query_ast, context, context.getSettingsRef(), query_info);
}
......@@ -695,7 +709,7 @@ ClusterPtr StorageDistributed::getOptimizedCluster(const Context & context, cons
ClusterPtr cluster = getCluster();
const Settings & settings = context.getSettingsRef();
if (has_sharding_key)
if (has_sharding_key && sharding_key_is_deterministic)
{
ClusterPtr optimized = skipUnusedShards(cluster, query_ptr, metadata_snapshot, context);
if (optimized)
......@@ -708,6 +722,8 @@ ClusterPtr StorageDistributed::getOptimizedCluster(const Context & context, cons
std::stringstream exception_message;
if (!has_sharding_key)
exception_message << "No sharding key";
else if (!sharding_key_is_deterministic)
exception_message << "Sharding key is not deterministic";
else
exception_message << "Sharding key " << sharding_key_column_name << " is not used";
......
......@@ -143,6 +143,7 @@ public:
const String cluster_name;
bool has_sharding_key;
bool sharding_key_is_deterministic = false;
ExpressionActionsPtr sharding_key_expr;
String sharding_key_column_name;
......
......@@ -622,7 +622,8 @@ void StorageReplicatedMergeTree::createReplica(const StorageMetadataPtr & metada
void StorageReplicatedMergeTree::drop()
{
/// There is also the case when user has configured ClickHouse to wrong ZooKeeper cluster,
/// There is also the case when user has configured ClickHouse to wrong ZooKeeper cluster
/// or the metadata of a stale replica was removed manually,
/// in this case, has_metadata_in_zookeeper = false, and we also permit to drop the table.
if (has_metadata_in_zookeeper)
......@@ -634,95 +635,99 @@ void StorageReplicatedMergeTree::drop()
throw Exception("Can't drop readonly replicated table (need to drop data in ZooKeeper as well)", ErrorCodes::TABLE_IS_READ_ONLY);
shutdown();
dropReplica(zookeeper, zookeeper_path, replica_name, log);
}
if (zookeeper->expired())
throw Exception("Table was not dropped because ZooKeeper session has expired.", ErrorCodes::TABLE_WAS_NOT_DROPPED);
dropAllData();
}
LOG_INFO(log, "Removing replica {}", replica_path);
replica_is_active_node = nullptr;
/// It may left some garbage if replica_path subtree are concurently modified
zookeeper->tryRemoveRecursive(replica_path);
if (zookeeper->exists(replica_path))
LOG_ERROR(log, "Replica was not completely removed from ZooKeeper, {} still exists and may contain some garbage.", replica_path);
void StorageReplicatedMergeTree::dropReplica(zkutil::ZooKeeperPtr zookeeper, const String & zookeeper_path, const String & replica, Poco::Logger * logger)
{
if (zookeeper->expired())
throw Exception("Table was not dropped because ZooKeeper session has expired.", ErrorCodes::TABLE_WAS_NOT_DROPPED);
/// Check that `zookeeper_path` exists: it could have been deleted by another replica after execution of previous line.
Strings replicas;
if (Coordination::Error::ZOK == zookeeper->tryGetChildren(zookeeper_path + "/replicas", replicas) && replicas.empty())
{
LOG_INFO(log, "{} is the last replica, will remove table", replica_path);
auto remote_replica_path = zookeeper_path + "/replicas/" + replica;
LOG_INFO(logger, "Removing replica {}", remote_replica_path);
/// It may leave some garbage if the replica_path subtree is concurrently modified
zookeeper->tryRemoveRecursive(remote_replica_path);
if (zookeeper->exists(remote_replica_path))
LOG_ERROR(logger, "Replica was not completely removed from ZooKeeper, {} still exists and may contain some garbage.", remote_replica_path);
/** At this moment, another replica can be created and we cannot remove the table.
* Try to remove /replicas node first. If we successfully removed it,
* it guarantees that we are the only replica that proceed to remove the table
* and no new replicas can be created after that moment (it requires the existence of /replicas node).
* and table cannot be recreated with new /replicas node on another servers while we are removing data,
* because table creation is executed in single transaction that will conflict with remaining nodes.
*/
/// Check that `zookeeper_path` exists: it could have been deleted by another replica after execution of previous line.
Strings replicas;
if (Coordination::Error::ZOK != zookeeper->tryGetChildren(zookeeper_path + "/replicas", replicas) || !replicas.empty())
return;
Coordination::Requests ops;
Coordination::Responses responses;
ops.emplace_back(zkutil::makeRemoveRequest(zookeeper_path + "/replicas", -1));
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path + "/dropped", "", zkutil::CreateMode::Persistent));
Coordination::Error code = zookeeper->tryMulti(ops, responses);
LOG_INFO(logger, "{} is the last replica, will remove table", remote_replica_path);
/** At this moment, another replica can be created and we cannot remove the table.
* Try to remove /replicas node first. If we successfully removed it,
* it guarantees that we are the only replica that proceeds to remove the table
* and no new replicas can be created after that moment (it requires the existence of the /replicas node),
* and the table cannot be recreated with a new /replicas node on other servers while we are removing data,
* because table creation is executed in single transaction that will conflict with remaining nodes.
*/
Coordination::Requests ops;
Coordination::Responses responses;
ops.emplace_back(zkutil::makeRemoveRequest(zookeeper_path + "/replicas", -1));
ops.emplace_back(zkutil::makeCreateRequest(zookeeper_path + "/dropped", "", zkutil::CreateMode::Persistent));
Coordination::Error code = zookeeper->tryMulti(ops, responses);
if (code == Coordination::Error::ZNONODE || code == Coordination::Error::ZNODEEXISTS)
{
LOG_WARNING(logger, "Table {} is already started to be removing by another replica right now", remote_replica_path);
}
else if (code == Coordination::Error::ZNOTEMPTY)
{
LOG_WARNING(logger, "Another replica was suddenly created, will keep the table {}", remote_replica_path);
}
else if (code != Coordination::Error::ZOK)
{
zkutil::KeeperMultiException::check(code, ops, responses);
}
else
{
LOG_INFO(logger, "Removing table {} (this might take several minutes)", zookeeper_path);
Strings children;
code = zookeeper->tryGetChildren(zookeeper_path, children);
if (code == Coordination::Error::ZNONODE)
{
LOG_WARNING(logger, "Table {} is already finished removing by another replica right now", remote_replica_path);
}
else
{
for (const auto & child : children)
if (child != "dropped")
zookeeper->tryRemoveRecursive(zookeeper_path + "/" + child);
if (code == Coordination::Error::ZNONODE || code == Coordination::Error::ZNODEEXISTS)
ops.clear();
responses.clear();
ops.emplace_back(zkutil::makeRemoveRequest(zookeeper_path + "/dropped", -1));
ops.emplace_back(zkutil::makeRemoveRequest(zookeeper_path, -1));
code = zookeeper->tryMulti(ops, responses);
if (code == Coordination::Error::ZNONODE)
{
LOG_WARNING(log, "Table {} is already started to be removing by another replica right now", replica_path);
LOG_WARNING(logger, "Table {} is already finished removing by another replica right now", remote_replica_path);
}
else if (code == Coordination::Error::ZNOTEMPTY)
{
LOG_WARNING(log, "Another replica was suddenly created, will keep the table {}", replica_path);
LOG_ERROR(logger, "Table was not completely removed from ZooKeeper, {} still exists and may contain some garbage.",
zookeeper_path);
}
else if (code != Coordination::Error::ZOK)
{
/// It is still possible that ZooKeeper session is expired or server is killed in the middle of the delete operation.
zkutil::KeeperMultiException::check(code, ops, responses);
}
else
{
LOG_INFO(log, "Removing table {} (this might take several minutes)", zookeeper_path);
Strings children;
code = zookeeper->tryGetChildren(zookeeper_path, children);
if (code == Coordination::Error::ZNONODE)
{
LOG_WARNING(log, "Table {} is already finished removing by another replica right now", replica_path);
}
else
{
for (const auto & child : children)
if (child != "dropped")
zookeeper->tryRemoveRecursive(zookeeper_path + "/" + child);
ops.clear();
responses.clear();
ops.emplace_back(zkutil::makeRemoveRequest(zookeeper_path + "/dropped", -1));
ops.emplace_back(zkutil::makeRemoveRequest(zookeeper_path, -1));
code = zookeeper->tryMulti(ops, responses);
if (code == Coordination::Error::ZNONODE)
{
LOG_WARNING(log, "Table {} is already finished removing by another replica right now", replica_path);
}
else if (code == Coordination::Error::ZNOTEMPTY)
{
LOG_ERROR(log, "Table was not completely removed from ZooKeeper, {} still exists and may contain some garbage.",
zookeeper_path);
}
else if (code != Coordination::Error::ZOK)
{
/// It is still possible that ZooKeeper session is expired or server is killed in the middle of the delete operation.
zkutil::KeeperMultiException::check(code, ops, responses);
}
else
{
LOG_INFO(log, "Table {} was successfully removed from ZooKeeper", zookeeper_path);
}
}
LOG_INFO(logger, "Table {} was successfully removed from ZooKeeper", zookeeper_path);
}
}
}
dropAllData();
}
......
......@@ -185,6 +185,10 @@ public:
int getMetadataVersion() const { return metadata_version; }
/** Remove a specific replica from zookeeper.
*/
static void dropReplica(zkutil::ZooKeeperPtr zookeeper, const String & zookeeper_path, const String & replica, Poco::Logger * logger);
private:
/// Get a sequential consistent view of current parts.
......
<yandex>
<remote_servers>
<test_cluster>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>node_1_1</host>
<port>9000</port>
</replica>
<replica>
<host>node_1_2</host>
<port>9000</port>
</replica>
<replica>
<host>node_1_3</host>
<port>9000</port>
</replica>
</shard>
</test_cluster>
</remote_servers>
</yandex>
import time
import pytest
from helpers.cluster import ClickHouseCluster
from helpers.cluster import ClickHouseKiller
from helpers.test_tools import assert_eq_with_retry
from helpers.network import PartitionManager
def fill_nodes(nodes, shard):
for node in nodes:
node.query(
'''
CREATE DATABASE test;
CREATE TABLE test.test_table(date Date, id UInt32)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/test/{shard}/replicated/test_table', '{replica}') ORDER BY id PARTITION BY toYYYYMM(date) SETTINGS min_replicated_logs_to_keep=3, max_replicated_logs_to_keep=5, cleanup_delay_period=0, cleanup_delay_period_random_add=0;
'''.format(shard=shard, replica=node.name))
node.query(
'''
CREATE DATABASE test1;
CREATE TABLE test1.test_table(date Date, id UInt32)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/test1/{shard}/replicated/test_table', '{replica}') ORDER BY id PARTITION BY toYYYYMM(date) SETTINGS min_replicated_logs_to_keep=3, max_replicated_logs_to_keep=5, cleanup_delay_period=0, cleanup_delay_period_random_add=0;
'''.format(shard=shard, replica=node.name))
node.query(
'''
CREATE DATABASE test2;
CREATE TABLE test2.test_table(date Date, id UInt32)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/test2/{shard}/replicated/test_table', '{replica}') ORDER BY id PARTITION BY toYYYYMM(date) SETTINGS min_replicated_logs_to_keep=3, max_replicated_logs_to_keep=5, cleanup_delay_period=0, cleanup_delay_period_random_add=0;
'''.format(shard=shard, replica=node.name))
node.query(
'''
CREATE DATABASE test3;
CREATE TABLE test3.test_table(date Date, id UInt32)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/test3/{shard}/replicated/test_table', '{replica}') ORDER BY id PARTITION BY toYYYYMM(date) SETTINGS min_replicated_logs_to_keep=3, max_replicated_logs_to_keep=5, cleanup_delay_period=0, cleanup_delay_period_random_add=0;
'''.format(shard=shard, replica=node.name))
node.query(
'''
CREATE DATABASE test4;
CREATE TABLE test4.test_table(date Date, id UInt32)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/test4/{shard}/replicated/test_table', '{replica}') ORDER BY id PARTITION BY toYYYYMM(date) SETTINGS min_replicated_logs_to_keep=3, max_replicated_logs_to_keep=5, cleanup_delay_period=0, cleanup_delay_period_random_add=0;
'''.format(shard=shard, replica=node.name))
cluster = ClickHouseCluster(__file__)
node_1_1 = cluster.add_instance('node_1_1', with_zookeeper=True, main_configs=['configs/remote_servers.xml'])
node_1_2 = cluster.add_instance('node_1_2', with_zookeeper=True, main_configs=['configs/remote_servers.xml'])
node_1_3 = cluster.add_instance('node_1_3', with_zookeeper=True, main_configs=['configs/remote_servers.xml'])
@pytest.fixture(scope="module")
def start_cluster():
try:
cluster.start()
fill_nodes([node_1_1, node_1_2], 1)
yield cluster
except Exception as ex:
print ex
finally:
cluster.shutdown()
def test_drop_replica(start_cluster):
for i in range(100):
node_1_1.query("INSERT INTO test.test_table VALUES (1, {})".format(i))
node_1_1.query("INSERT INTO test1.test_table VALUES (1, {})".format(i))
node_1_1.query("INSERT INTO test2.test_table VALUES (1, {})".format(i))
node_1_1.query("INSERT INTO test3.test_table VALUES (1, {})".format(i))
node_1_1.query("INSERT INTO test4.test_table VALUES (1, {})".format(i))
zk = cluster.get_kazoo_client('zoo1')
assert "can't drop local replica" in node_1_1.query_and_get_error("SYSTEM DROP REPLICA 'node_1_1'")
assert "can't drop local replica" in node_1_1.query_and_get_error("SYSTEM DROP REPLICA 'node_1_1' FROM DATABASE test")
assert "can't drop local replica" in node_1_1.query_and_get_error("SYSTEM DROP REPLICA 'node_1_1' FROM TABLE test.test_table")
assert "it's active" in node_1_2.query_and_get_error("SYSTEM DROP REPLICA 'node_1_1'")
assert "it's active" in node_1_2.query_and_get_error("SYSTEM DROP REPLICA 'node_1_1' FROM DATABASE test")
assert "it's active" in node_1_2.query_and_get_error("SYSTEM DROP REPLICA 'node_1_1' FROM TABLE test.test_table")
assert "it's active" in \
node_1_3.query_and_get_error("SYSTEM DROP REPLICA 'node_1_1' FROM ZKPATH '/clickhouse/tables/test/{shard}/replicated/test_table'".format(shard=1))
assert "There is a local table" in \
node_1_2.query_and_get_error("SYSTEM DROP REPLICA 'node_1_1' FROM ZKPATH '/clickhouse/tables/test/{shard}/replicated/test_table'".format(shard=1))
assert "There is a local table" in \
node_1_1.query_and_get_error("SYSTEM DROP REPLICA 'node_1_1' FROM ZKPATH '/clickhouse/tables/test/{shard}/replicated/test_table'".format(shard=1))
assert "does not look like a table path" in \
node_1_3.query_and_get_error("SYSTEM DROP REPLICA 'node_1_1' FROM ZKPATH '/clickhouse/tables/test'")
with PartitionManager() as pm:
## make node_1_1 dead
pm.drop_instance_zk_connections(node_1_1)
time.sleep(10)
assert "doesn't exist" in node_1_3.query_and_get_error("SYSTEM DROP REPLICA 'node_1_1' FROM TABLE test.test_table")
assert "doesn't exist" in node_1_3.query_and_get_error("SYSTEM DROP REPLICA 'node_1_1' FROM DATABASE test1")
node_1_3.query("SYSTEM DROP REPLICA 'node_1_1'")
exists_replica_1_1 = zk.exists("/clickhouse/tables/test3/{shard}/replicated/test_table/replicas/{replica}".format(shard=1, replica='node_1_1'))
assert (exists_replica_1_1 != None)
## To drop a replica of an inactive/stale replicated table that has no local replica, use the ZKPATH syntax:
node_1_3.query("SYSTEM DROP REPLICA 'node_1_1' FROM ZKPATH '/clickhouse/tables/test2/{shard}/replicated/test_table'".format(shard=1))
exists_replica_1_1 = zk.exists("/clickhouse/tables/test2/{shard}/replicated/test_table/replicas/{replica}".format(shard=1, replica='node_1_1'))
assert (exists_replica_1_1 == None)
node_1_2.query("SYSTEM DROP REPLICA 'node_1_1' FROM TABLE test.test_table")
exists_replica_1_1 = zk.exists("/clickhouse/tables/test/{shard}/replicated/test_table/replicas/{replica}".format(shard=1, replica='node_1_1'))
assert (exists_replica_1_1 == None)
node_1_2.query("SYSTEM DROP REPLICA 'node_1_1' FROM DATABASE test1")
exists_replica_1_1 = zk.exists("/clickhouse/tables/test1/{shard}/replicated/test_table/replicas/{replica}".format(shard=1, replica='node_1_1'))
assert (exists_replica_1_1 == None)
node_1_3.query("SYSTEM DROP REPLICA 'node_1_1' FROM ZKPATH '/clickhouse/tables/test3/{shard}/replicated/test_table'".format(shard=1))
exists_replica_1_1 = zk.exists("/clickhouse/tables/test3/{shard}/replicated/test_table/replicas/{replica}".format(shard=1, replica='node_1_1'))
assert (exists_replica_1_1 == None)
node_1_2.query("SYSTEM DROP REPLICA 'node_1_1'")
exists_replica_1_1 = zk.exists("/clickhouse/tables/test4/{shard}/replicated/test_table/replicas/{replica}".format(shard=1, replica='node_1_1'))
assert (exists_replica_1_1 == None)
......@@ -5,6 +5,7 @@ CREATE TABLE radacct ( radacctid UInt64, f3gppchargingid Nullable(String), f3gpp
insert into radacct values (1, 'a', 'b', 'c', 'd', 'e', 2, 'a', 'b', 'c', 'd', 'e', 'f', 3, 4, 5, 6, 7, 'a', 'Stop', 'c', 'd', 'e', 'f', 'g', 'h', '2018-10-10 15:54:21', '2018-10-10 15:54:21', 8, 'a', 9, 10, 'a', 'b', '2018-10-10 15:54:21', 'a', 'b', 11, 12, '2018-10-10', 'a', 'b', 'c', 'd', 'e');
SELECT any(acctstatustype = 'Stop') FROM radacct WHERE (acctstatustype = 'Stop') AND ((acctinputoctets + acctoutputoctets) > 0);
create materialized view mv_traffic_by_tadig15min Engine=AggregatingMergeTree partition by tadig order by (ts,tadig) populate as select toStartOfFifteenMinutes(timestamp) ts,toDayOfWeek(timestamp) dow, tadig, sumState(acctinputoctets+acctoutputoctets) traffic_bytes,maxState(timestamp) last_stop, minState(radacctid) min_radacctid,maxState(radacctid) max_radacctid from radacct where acctstatustype='Stop' and acctinputoctets+acctoutputoctets > 0 group by tadig,ts,dow;
select tadig, ts, dow, sumMerge(traffic_bytes), maxMerge(last_stop), minMerge(min_radacctid), maxMerge(max_radacctid) from mv_traffic_by_tadig15min group by tadig, ts, dow;
......
......@@ -24,6 +24,12 @@ set force_optimize_skip_unused_shards=1;
select * from dist_01071; -- { serverError 507 }
set force_optimize_skip_unused_shards=2;
select * from dist_01071; -- { serverError 507 }
drop table if exists dist_01071;
-- non-deterministic function (e.g. rand())
create table dist_01071 as data_01071 Engine=Distributed(test_cluster_two_shards, currentDatabase(), data_01071, key + rand());
set force_optimize_skip_unused_shards=1;
select * from dist_01071 where key = 0; -- { serverError 507 }
drop table if exists data_01071;
drop table if exists dist_01071;
......@@ -35,7 +41,7 @@ create table data2_01071 (key Int, sub_key Int) Engine=Null();
create table dist2_layer_01071 as data2_01071 Engine=Distributed(test_cluster_two_shards, currentDatabase(), data2_01071, sub_key%2);
create table dist2_01071 as data2_01071 Engine=Distributed(test_cluster_two_shards, currentDatabase(), dist2_layer_01071, key%2);
select * from dist2_01071 where key = 1; -- { serverError 507 }
set force_optimize_skip_unused_shards_no_nested=1;
set force_optimize_skip_unused_shards_nesting=1;
select * from dist2_01071 where key = 1;
drop table if exists data2_01071;
drop table if exists dist2_layer_01071;
......
select now64(10); -- { serverError 407 }
select now64(10); -- { serverError 69 }
select length(toString(now64(9)));
......@@ -6,7 +6,7 @@ WITH '2020-02-05 14:34:12.333' as S, toDateTime64(S, 3) as DT64 SELECT * WHERE D
WITH '2020-02-05 14:34:12.333' as S, toDateTime64(S, 3) as DT64 SELECT * WHERE materialize(S) = DT64; -- {serverError 43}
SELECT * WHERE toDateTime64(123.345, 3) == 'ABCD'; -- {serverError 53} -- invalid DateTime64 string
SELECT * WHERE toDateTime64(123.345, 3) == '2020-02-05 14:34:12.33333333333333333333333333333333333333333333333333333333'; -- {serverError 53} -- invalid string length
SELECT * WHERE toDateTime64(123.345, 3) == '2020-02-05 14:34:12.33333333333333333333333333333333333333333333333333333333';
SELECT 'in SELECT';
WITH '2020-02-05 14:34:12.333' as S, toDateTime64(S, 3) as DT64 SELECT DT64 = S;
......
......@@ -89,6 +89,7 @@ SYSTEM DISTRIBUTED SENDS ['SYSTEM STOP DISTRIBUTED SENDS','SYSTEM START DISTRIBU
SYSTEM REPLICATED SENDS ['SYSTEM STOP REPLICATED SENDS','SYSTEM START REPLICATED SENDS','STOP_REPLICATED_SENDS','START REPLICATED SENDS'] TABLE SYSTEM SENDS
SYSTEM SENDS ['SYSTEM STOP SENDS','SYSTEM START SENDS','STOP SENDS','START SENDS'] \N SYSTEM
SYSTEM REPLICATION QUEUES ['SYSTEM STOP REPLICATION QUEUES','SYSTEM START REPLICATION QUEUES','STOP_REPLICATION_QUEUES','START REPLICATION QUEUES'] TABLE SYSTEM
SYSTEM DROP REPLICA ['DROP REPLICA'] TABLE SYSTEM
SYSTEM SYNC REPLICA ['SYNC REPLICA'] TABLE SYSTEM
SYSTEM RESTART REPLICA ['RESTART REPLICA'] TABLE SYSTEM
SYSTEM FLUSH DISTRIBUTED ['FLUSH DISTRIBUTED'] TABLE SYSTEM FLUSH
......
......@@ -3,6 +3,7 @@ DROP TABLE IF EXISTS distributed_table_1;
DROP TABLE IF EXISTS distributed_table_2;
DROP TABLE IF EXISTS local_table_1;
DROP TABLE IF EXISTS local_table_2;
DROP TABLE IF EXISTS local_table_merged;
CREATE TABLE local_table_1 (id String) ENGINE = MergeTree ORDER BY (id);
CREATE TABLE local_table_2(id String) ENGINE = MergeTree ORDER BY (id);
......
drop table if exists data_01319;
drop table if exists dist_01319;
drop table if exists dist_layer_01319;
create table data_01319 (key Int, sub_key Int) Engine=Null();
create table dist_layer_01319 as data_01319 Engine=Distributed(test_cluster_two_shards, currentDatabase(), data_01319, sub_key);
-- test_unavailable_shard is used here to check that optimize_skip_unused_shards always
-- removes some nodes from the cluster for the first nesting level
create table dist_01319 as data_01319 Engine=Distributed(test_unavailable_shard, currentDatabase(), dist_layer_01319, key+1);
set optimize_skip_unused_shards=1;
set force_optimize_skip_unused_shards=1;
set force_optimize_skip_unused_shards_nesting=2;
set optimize_skip_unused_shards_nesting=2;
select * from dist_01319 where key = 1; -- { serverError 507 }
set force_optimize_skip_unused_shards_nesting=1;
select * from dist_01319 where key = 1;
set force_optimize_skip_unused_shards_nesting=2;
set optimize_skip_unused_shards_nesting=1;
select * from dist_01319 where key = 1;
drop table if exists data_01320;
drop table if exists dist_01320;
create table data_01320 (key Int) Engine=Null();
-- non-deterministic function (e.g. rand())
create table dist_01320 as data_01320 Engine=Distributed(test_cluster_two_shards, currentDatabase(), data_01320, key + rand());
set optimize_skip_unused_shards=1;
set force_optimize_skip_unused_shards=1;
select * from dist_01320 where key = 0; -- { serverError 507 }
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
2011-11-11 11:11:11
WITH toDateTime64('2019-09-16 19:20:12.3456789102019-09-16 19:20:12.345678910', 0) AS dt64 SELECT dt64; -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.1234567890123456789', 0);
SELECT toDateTime64('2011-11-11 11:11:11.-12345678901234567890', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.1', 0);
SELECT toDateTime64('2011-11-11 11:11:11.11', 0);
SELECT toDateTime64('2011-11-11 11:11:11.111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.1111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.11111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.1111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.11111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.111111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.1111111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.11111111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.111111111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.1111111111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.11111111111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.111111111111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.1111111111111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.11111111111111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.111111111111111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.1111111111111111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.11111111111111111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.111111111111111111111', 0);
SELECT toDateTime64('2011-11-11 11:11:11.-1', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-11', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-1111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-11111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-1111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-11111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-111111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-1111111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-11111111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-111111111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-1111111111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-11111111111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-111111111111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-1111111111111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-11111111111111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-111111111111111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-1111111111111111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-11111111111111111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.-111111111111111111111', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.+1', 0); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.++11', 10); -- { serverError 69 }
SELECT toDateTime64('2011-11-11 11:11:11.+111', 3); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.+++1111', 5); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.+11111', 7); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.+++++111111', 2); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.+1111111', 1); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.++++++11111111', 8); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.+111111111', 9); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.+++++++1111111111', 6); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.+11111111111', 4); -- { serverError 6 }
SELECT toDateTime64('2011-11-11 11:11:11.++++++++111111111111', 11); -- { serverError 69 }
SELECT toDateTime64('2011-11-11 11:11:11.+1111111111111', 15); -- { serverError 69 }
SELECT toDateTime64('2011-11-11 11:11:11.+++++++++11111111111111', 13); -- { serverError 69 }
SELECT toDateTime64('2011-11-11 11:11:11.+111111111111111', 12); -- { serverError 69 }
SELECT toDateTime64('2011-11-11 11:11:11.++++++++++1111111111111111', 16); -- { serverError 69 }
SELECT toDateTime64('2011-11-11 11:11:11.+11111111111111111', 14); -- { serverError 69 }
SELECT toDateTime64('2011-11-11 11:11:11.+++++++++++111111111111111111', 15); -- { serverError 69 }
SELECT toDateTime64('2011-11-11 11:11:11.+1111111111111111111', 17); -- { serverError 69 }
SELECT toDateTime64('2011-11-11 11:11:11.++++++++++++11111111111111111111', 19); -- { serverError 69 }
SELECT toDateTime64('2011-11-11 11:11:11.+111111111111111111111', 18); -- { serverError 69 }
['2000-01-01 01:01:01.123000','2000-01-01 01:01:01.123456']
SELECT [toDateTime64('2000-01-01 01:01:01.123', 3), toDateTime64('2000-01-01 01:01:01.123456', 6)];
-- max_memory_usage = 10000000000 (10 GB default)
-- Intel® Xeon® E5-1650 v3 Hexa-Core 128 GB DDR4 ECC
-- Estimated time: ~ 250 seconds
-- Read rows: ~ 272 000 000
SELECT
key,
uniqState(uuid_1) uuid_1_st,
uniqState(uuid_2) uuid_2_st,
uniqState(uuid_3) uuid_3_st
FROM (
SELECT
rand64() value,
toString(value) value_str,
UUIDNumToString(toFixedString(substring(value_str, 1, 16), 16)) uuid_1, -- Any UUID
UUIDNumToString(toFixedString(substring(value_str, 2, 16), 16)) uuid_2, -- More memory
UUIDNumToString(toFixedString(substring(value_str, 3, 16), 16)) uuid_3, -- And more memory
modulo(value, 5000000) key -- Cardinality in my case
FROM numbers(550000000)
)
GROUP BY
key
LIMIT 100;
......@@ -17,6 +17,7 @@
<https_port>58443</https_port>
<tcp_port_secure>59440</tcp_port_secure>
<interserver_http_port>59009</interserver_http_port>
<max_thread_pool_size>10000</max_thread_pool_size>
<openSSL>
<server> <!-- Used for https server AND secure tcp port -->
<!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
......
......@@ -21,7 +21,7 @@ def process_issue_event(response):
state=issue['state'],
assignees=[assignee['login'] for assignee in issue['assignees']],
created_at=issue['created_at'],
body=issue['body'],
body=issue['body'] if issue['body'] else '',
title=issue['title'],
comments=issue['comments'],
raw_json=json.dumps(response),)
......@@ -42,7 +42,7 @@ def process_issue_comment_event(response):
state=issue['state'],
assignees=[assignee['login'] for assignee in issue['assignees']],
created_at=issue['created_at'],
body=issue['body'],
body=issue['body'] if issue['body'] else '',
title=issue['title'],
comments=issue['comments'],
comment_body=comment['body'],
......@@ -64,7 +64,7 @@ def process_pull_request_event(response):
author=pull_request['user']['login'],
labels=[label['name'] for label in pull_request['labels']],
state=pull_request['state'],
body=pull_request['body'],
body=pull_request['body'] if pull_request['body'] else '',
title=pull_request['title'],
created_at=pull_request['created_at'],
assignees=[assignee['login'] for assignee in pull_request['assignees']],
......