Commit 97ac9796, authored by stavrolia

Add docs for hdfs and fix some review comments

Parent 974789d3
......@@ -2,12 +2,13 @@
#include <re2/re2.h>
#include <re2/stringpiece.h>
#include <algorithm>
#include <sstream>
namespace DB
{
/* Transforms string from grep-wildcard-syntax ("{N..M}", "{a,b,c}" as in remote table function and "*", "?") to perl-regexp for using re2 library for matching
* with such steps:
* 1) search intervals and enums in {}, replace them by regexp with pipe (expr1|expr2|expr3),
* 1) search intervals like {0..9} and enums like {abc,xyz,qwe} in {}, replace them by regexp with pipe (expr1|expr2|expr3),
* 2) search and replace "*" and "?".
* Before each search we need to escape the symbols that we will not search for.
*
......@@ -15,63 +16,60 @@ namespace DB
*/
std::string makeRegexpPatternFromGlobs(const std::string & initial_str_with_globs)
{
std::string escaped_with_globs;
escaped_with_globs.reserve(initial_str_with_globs.size());
std::ostringstream oss;
/// Escaping only characters that are not used in glob syntax
for (const auto & letter : initial_str_with_globs)
{
if ((letter == '[') || (letter == ']') || (letter == '|') || (letter == '+') || (letter == '-') || (letter == '(') || (letter == ')'))
escaped_with_globs.push_back('\\');
escaped_with_globs.push_back(letter);
oss << '\\';
oss << letter;
}
re2::RE2 enum_or_range(R"({([\d]+\.\.[\d]+|[^{}*,]+,[^{}*]*[^{}*,])})"); /// regexp for {expr1,expr2,expr3} or {M..N}, where M and N - non-negative integers, expr's should be without {}*,
std::string escaped_with_globs = oss.str();
oss.str("");
static const re2::RE2 enum_or_range(R"({([\d]+\.\.[\d]+|[^{}*,]+,[^{}*]*[^{}*,])})"); /// regexp for {expr1,expr2,expr3} or {M..N}, where M and N - non-negative integers, expr's should be without {}*,
re2::StringPiece input(escaped_with_globs);
re2::StringPiece matched(escaped_with_globs);
re2::StringPiece matched;
size_t current_index = 0;
std::string almost_regexp;
almost_regexp.reserve(escaped_with_globs.size());
while (RE2::FindAndConsume(&input, enum_or_range, &matched))
{
std::string buffer = matched.ToString();
almost_regexp.append(escaped_with_globs.substr(current_index, matched.data() - escaped_with_globs.data() - current_index - 1));
oss << escaped_with_globs.substr(current_index, matched.data() - escaped_with_globs.data() - current_index - 1) << '(';
if (buffer.find(',') == std::string::npos)
{
size_t first_point = buffer.find('.');
std::string first_number = buffer.substr(0, first_point);
std::string second_number = buffer.substr(first_point + 2, buffer.size() - first_point - 2);
size_t range_begin = std::stoull(first_number);
size_t range_end = std::stoull(second_number);
buffer = std::to_string(range_begin);
size_t range_begin, range_end;
char point;
std::istringstream iss(buffer);
iss >> range_begin >> point >> point >> range_end;
oss << range_begin;
for (size_t i = range_begin + 1; i <= range_end; ++i)
{
buffer += "|";
buffer += std::to_string(i);
oss << '|' << i;
}
}
else
{
std::replace(buffer.begin(), buffer.end(), ',', '|');
oss << buffer;
}
almost_regexp.append("(" + buffer + ")");
oss << ")";
current_index = input.data() - escaped_with_globs.data();
}
almost_regexp += escaped_with_globs.substr(current_index);
std::string result;
result.reserve(almost_regexp.size());
for (const auto & letter : almost_regexp)
oss << escaped_with_globs.substr(current_index);
std::string almost_res = oss.str();
oss.str("");
for (const auto & letter : almost_res)
{
if ((letter == '?') || (letter == '*'))
{
result += "[^/]"; /// '?' is any symbol except '/'
oss << "[^/]"; /// '?' is any symbol except '/'
if (letter == '?')
continue;
}
if ((letter == '.') || (letter == '{') || (letter == '}'))
result.push_back('\\');
result.push_back(letter);
oss << '\\';
oss << letter;
}
return result;
return oss.str();
}
}
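The transformation described in the function's header comment can be illustrated with a simplified, standard-library-only sketch. This is not the commit's code: `globToRegexSketch` is a hypothetical name, the brace matching is done by hand instead of with re2, and the enum check is looser than the `enum_or_range` regexp above. It only shows the same sequence of steps (escape, expand `{}` groups, replace `*` and `?`).

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical, simplified stand-in for makeRegexpPatternFromGlobs (no re2 dependency).
std::string globToRegexSketch(const std::string & glob)
{
    // Escape regexp metacharacters that have no meaning in the glob syntax.
    std::string escaped;
    for (char c : glob)
    {
        if (std::string("[]|+-()").find(c) != std::string::npos)
            escaped += '\\';
        escaped += c;
    }

    auto all_digits = [](const std::string & s)
    {
        return !s.empty()
            && std::all_of(s.begin(), s.end(), [](unsigned char ch) { return std::isdigit(ch) != 0; });
    };

    // Step 1: rewrite {N..M} and {a,b,c} as alternations in parentheses.
    std::string expanded;
    size_t pos = 0;
    while (pos < escaped.size())
    {
        const char c = escaped[pos];
        const size_t close = (c == '{') ? escaped.find('}', pos) : std::string::npos;
        if (close == std::string::npos)
        {
            expanded += c; // ordinary character, or '{' without a closing '}'
            ++pos;
            continue;
        }

        std::string body = escaped.substr(pos + 1, close - pos - 1);
        const size_t dots = body.find("..");
        if (dots != std::string::npos && all_digits(body.substr(0, dots)) && all_digits(body.substr(dots + 2)))
        {
            // Range: enumerate every integer from first to last.
            const unsigned long long first = std::stoull(body.substr(0, dots));
            const unsigned long long last = std::stoull(body.substr(dots + 2));
            expanded += '(' + std::to_string(first);
            for (unsigned long long i = first + 1; i <= last; ++i)
                expanded += '|' + std::to_string(i);
            expanded += ')';
        }
        else if (body.find(',') != std::string::npos && body.find_first_of("{*") == std::string::npos)
        {
            // Enum: commas become alternation pipes.
            std::replace(body.begin(), body.end(), ',', '|');
            expanded += '(' + body + ')';
        }
        else
        {
            expanded += c; // neither form: keep '{' literally and rescan after it
            ++pos;
            continue;
        }
        pos = close + 1;
    }

    // Step 2: '?' becomes [^/], '*' becomes [^/]*; '.', '{', '}' are escaped.
    std::string result;
    for (char c : expanded)
    {
        if (c == '?' || c == '*')
        {
            result += "[^/]";
            if (c == '?')
                continue;
        }
        if (c == '.' || c == '{' || c == '}')
            result += '\\';
        result += c;
    }
    return result;
}
```

Under these assumptions, `globToRegexSketch("file{1..3}.csv")` yields `file(1|2|3)\.csv`, and `globToRegexSketch("data_{abc,xyz}?*")` yields `data_(abc|xyz)[^/][^/]*`. Neither wildcard matches `/`, so a pattern never crosses a path separator.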
......@@ -27,7 +27,7 @@ When creating table using `File(Format)` it creates empty subdirectory in that f
You may manually create this subfolder and file in server filesystem and then [ATTACH](../../query_language/misc.md) it to table information with matching name, so you can query data from that file.
!!! warning
Be careful with this funcionality, because ClickHouse does not keep track of external changes to such files. The result of simultaneous writes via ClickHouse and outside of ClickHouse is undefined.
Be careful with this functionality, because ClickHouse does not keep track of external changes to such files. The result of simultaneous writes via ClickHouse and outside of ClickHouse is undefined.
**Example:**
......@@ -73,9 +73,9 @@ $ echo -e "1,2\n3,4" | clickhouse-local -q "CREATE TABLE table (a Int64, b Int64
- Multiple `SELECT` queries can be performed concurrently, but `INSERT` queries will wait for each other.
- Not supported:
- `ALTER`
- `SELECT ... SAMPLE`
- Indices
- Replication
- `ALTER`
- `SELECT ... SAMPLE`
- Indices
- Replication
[Original article](https://clickhouse.yandex/docs/en/operations/table_engines/file/) <!--hide-->
# HDFS {#table_engines-hdfs}
Manages data on HDFS. This engine is similar
to the [File](file.md) and [URL](url.md) engines.
## Usage
```
ENGINE = HDFS(URI, format)
```
The `format` parameter specifies one of the available file formats. To perform
`SELECT` queries, the format must be supported for input, and to perform
`INSERT` queries -- for output. The available formats are listed in the
[Formats](../../interfaces/formats.md#formats) section.
**Example:**
**1.** Set up the `hdfs_engine_table` table:
``` sql
CREATE TABLE hdfs_engine_table (name String, value UInt32) ENGINE=HDFS('hdfs://hdfs1:9000/other_storage', 'TSV')
```
**2.** Query the data:
``` sql
SELECT * FROM hdfs_engine_table LIMIT 2
```
```
┌─name─┬─value─┐
│ one │ 1 │
│ two │ 2 │
└──────┴───────┘
```
## Details of Implementation
- Reads and writes can be performed in parallel.
- Not supported:
- `ALTER` and `SELECT...SAMPLE` operations.
- Indexes.
- Replication.
[Original article](https://clickhouse.yandex/docs/en/operations/table_engines/hdfs/) <!--hide-->
......@@ -11,7 +11,7 @@ file(path, format, structure)
- `path` — The relative path to the file from [user_files_path](../../operations/server_settings/settings.md#server_settings-user_files_path).
- `format` — The [format](../../interfaces/formats.md#formats) of the file.
- `structure` — Structure of the table. Format `'colunmn1_name column1_ype, column2_name column2_type, ...'`.
- `structure` — Structure of the table. Format `'column1_name column1_type, column2_name column2_type, ...'`.
**Returned value**
......
# hdfs
Creates a table from a file in HDFS.
```
hdfs(URI, format, structure)
```
**Input parameters**
- `URI` — The URI of the file in HDFS.
- `format` — The [format](../../interfaces/formats.md#formats) of the file.
- `structure` — Structure of the table. Format `'column1_name column1_type, column2_name column2_type, ...'`.
**Returned value**
A table with the specified structure for reading or writing data in the specified file.
**Example**
Table from `hdfs://hdfs1:9000/test` and selection of the first two rows from it:
```sql
SELECT *
FROM hdfs('hdfs://hdfs1:9000/test', 'TSV', 'column1 UInt32, column2 UInt32, column3 UInt32')
LIMIT 2
```
```
┌─column1─┬─column2─┬─column3─┐
│ 1 │ 2 │ 3 │
│ 3 │ 2 │ 1 │
└─────────┴─────────┴─────────┘
```
[Original article](https://clickhouse.yandex/docs/en/query_language/table_functions/hdfs/) <!--hide-->
# HDFS {#table_engines-hdfs}
Manages data in HDFS. This engine is similar to the [File](file.md) engine and the [URL](url.md) engine.
## Usage
```
ENGINE = HDFS(URI, format)
```
The `format` parameter must be one that ClickHouse can use in both `INSERT` and `SELECT` queries. For the full list of supported formats, see the [Formats](../../interfaces/formats.md#formats) section.
**Example:**
**1.** Create the `hdfs_engine_table` table on the server:
``` sql
CREATE TABLE hdfs_engine_table (name String, value UInt32) ENGINE=HDFS('hdfs://hdfs1:9000/other_storage', 'TSV')
```
**2.** Query the data:
``` sql
SELECT * FROM hdfs_engine_table LIMIT 2
```
```
┌─name─┬─value─┐
│ one │ 1 │
│ two │ 2 │
└──────┴───────┘
```
## Implementation details
- Multithreaded reads and writes are supported.
- Not supported:
- `ALTER` and `SELECT...SAMPLE` operations;
- indexes;
- replication.
[Original article](https://clickhouse.yandex/docs/ru/operations/table_engines/hdfs/) <!--hide-->
# hdfs
Creates a table from a file.
```
hdfs(URI, format, structure)
```
**Input parameters**
- `URI` — The URI of the file in HDFS.
- `format` — The [format](../../interfaces/formats.md#formats) of the file.
- `structure` — Structure of the table. Format `'column1_name column1_type, column2_name column2_type, ...'`.
**Returned value**
A table with the specified structure for reading or writing data in the specified file.
**Example**
Table from `hdfs://hdfs1:9000/test` and selection of the first two rows from it:
``` sql
SELECT *
FROM hdfs('hdfs://hdfs1:9000/test', 'TSV', 'column1 UInt32, column2 UInt32, column3 UInt32')
LIMIT 2
```
```
┌─column1─┬─column2─┬─column3─┐
│ 1 │ 2 │ 3 │
│ 3 │ 2 │ 1 │
└─────────┴─────────┴─────────┘
```
[Original article](https://clickhouse.yandex/docs/ru/query_language/table_functions/hdfs/) <!--hide-->