Commit bec23fbb · Author: alexey-milovidov · Committer: GitHub

Merge pull request #179 from a-square/patch-1

Grammar in architecture.md
# ClickHouse quick architecture overview
> Optional side notes are in grey.
ClickHouse is a true column-oriented DBMS. Data is stored by columns, and during query execution it is processed in arrays (vectors, or chunks of columns). Whenever possible, operations are dispatched on arrays rather than on individual values. This is called "vectorized query execution", and it lowers the cost of dispatch relative to the cost of the actual data processing.
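
To make the dispatch-cost argument concrete, here is a minimal C++ sketch. The names `IBinaryOp`, `Plus`, `rowAtATime`, and `vectorized` are hypothetical and are not taken from the ClickHouse code base. The sketch only contrasts one virtual call per value with one virtual call per chunk, and assumes both input columns have the same length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical interface for illustration only; not ClickHouse's actual API.
struct IBinaryOp
{
    virtual ~IBinaryOp() = default;

    // Value-at-a-time: the caller pays one virtual call per value.
    virtual int64_t applyOne(int64_t a, int64_t b) const = 0;

    // Vectorized: the caller pays one virtual call per chunk of values.
    virtual void applyChunk(const std::vector<int64_t> & a,
                            const std::vector<int64_t> & b,
                            std::vector<int64_t> & out) const = 0;
};

struct Plus final : IBinaryOp
{
    int64_t applyOne(int64_t a, int64_t b) const override { return a + b; }

    void applyChunk(const std::vector<int64_t> & a,
                    const std::vector<int64_t> & b,
                    std::vector<int64_t> & out) const override
    {
        out.resize(a.size());
        // Tight loop with no dispatch inside; a compiler may auto-vectorize it.
        for (std::size_t i = 0; i < a.size(); ++i)
            out[i] = a[i] + b[i];
    }
};

// Dispatches the operation once per value (a.size() virtual calls).
std::vector<int64_t> rowAtATime(const IBinaryOp & op,
                                const std::vector<int64_t> & a,
                                const std::vector<int64_t> & b)
{
    std::vector<int64_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = op.applyOne(a[i], b[i]);
    return out;
}

// Dispatches the operation once per chunk (a single virtual call).
std::vector<int64_t> vectorized(const IBinaryOp & op,
                                const std::vector<int64_t> & a,
                                const std::vector<int64_t> & b)
{
    std::vector<int64_t> out;
    op.applyChunk(a, b, out);
    return out;
}
```

Both functions return the same result. The difference is how many times the indirect call is paid: once per value in `rowAtATime`, once per chunk in `vectorized`.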
>This idea is nothing new. It dates back to the `APL` programming language and its descendants: `A+`, `J`, `K`, `Q`. Array programming is widely used in scientific data processing. Nor is the idea new to relational databases: for example, it is used in the `Vectorwise` system.
>There are two different approaches to speeding up query processing: vectorized query execution and runtime code generation. In the latter, code is generated on the fly for every kind of query, removing all indirection and dynamic dispatch. Neither of these approaches is strictly better than the other. Runtime code generation can be better when it fuses many operations together, thus fully utilizing the CPU's execution units and pipeline. Vectorized query execution can be worse, because it must deal with temporary vectors that have to be written to cache and read back. If the temporary data does not fit in the L2 cache, this becomes an issue. But vectorized query execution more easily utilizes the SIMD capabilities of the CPU. A [research paper](http://15721.courses.cs.cmu.edu/spring2016/papers/p5-sompolski.pdf) written by our friends shows that it is better to combine both approaches. ClickHouse mostly uses vectorized query execution and has limited initial support for runtime code generation (only the inner loop of the first stage of GROUP BY can be compiled).
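
The trade-off in the side note above can be sketched in C++ for the expression `a * b + c`. This is an illustration under stated assumptions, not ClickHouse code: `Column`, `multiply`, `add`, `vectorizedPlan`, and `fusedPlan` are hypothetical names, all columns are assumed to have equal length, and the hand-written fused loop merely stands in for what a runtime code generator might emit.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical column type for illustration only.
using Column = std::vector<int64_t>;

// Vectorized primitive: element-wise product, materialized as a new column.
Column multiply(const Column & a, const Column & b)
{
    Column out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] * b[i];
    return out;
}

// Vectorized primitive: element-wise sum, materialized as a new column.
Column add(const Column & a, const Column & b)
{
    Column out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] + b[i];
    return out;
}

// Vectorized plan: two simple loops connected by a temporary column.
// The temporary is written by multiply() and read back by add().
Column vectorizedPlan(const Column & a, const Column & b, const Column & c)
{
    Column tmp = multiply(a, b);
    return add(tmp, c);
}

// Fused plan: one loop and no temporary column, which is the shape of code
// a runtime code generator could produce for this particular expression.
Column fusedPlan(const Column & a, const Column & b, const Column & c)
{
    Column out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] * b[i] + c[i];
    return out;
}
```

Both plans compute the same result. The vectorized plan reuses small generic primitives at the price of the temporary column `tmp`, while the fused plan avoids the temporary but is specific to this one expression.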
## Columns
......