Commit 0466f6be authored by: alesapin

Merge branch 'master' of github.com:yandex/ClickHouse

Copyright 2016-2018 YANDEX LLC
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2016-2018 Yandex LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
......
# ClickHouse
[![ClickHouse — open source distributed column-oriented DBMS](https://github.com/yandex/ClickHouse/raw/master/website/images/logo-400x240.png)](https://clickhouse.yandex)
ClickHouse is an open-source column-oriented database management system that allows generating analytical data reports in real time.
......
if (NOT ARCH_ARM)
if (NOT ARCH_ARM AND NOT OS_FREEBSD AND NOT APPLE)
option (ENABLE_HDFS "Enable HDFS" ${NOT_UNBUNDLED})
endif ()
......
option (USE_INTERNAL_LIBGSASL_LIBRARY "Set to FALSE to use system libgsasl library instead of bundled" ${NOT_UNBUNDLED})
if (NOT APPLE)
option (USE_INTERNAL_LIBGSASL_LIBRARY "Set to FALSE to use system libgsasl library instead of bundled" ${NOT_UNBUNDLED})
endif ()
if (USE_INTERNAL_LIBGSASL_LIBRARY AND NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/libgsasl/src/gsasl.h")
message (WARNING "submodule contrib/libgsasl is missing. to fix try run: \n git submodule update --init --recursive")
......
......@@ -202,7 +202,7 @@ if (USE_INTERNAL_LIBXML2_LIBRARY)
endif ()
if (USE_INTERNAL_HDFS3_LIBRARY)
include(${CMAKE_SOURCE_DIR}/cmake/find_protobuf.cmake)
include(${ClickHouse_SOURCE_DIR}/cmake/find_protobuf.cmake)
if (USE_INTERNAL_PROTOBUF_LIBRARY)
set(protobuf_BUILD_TESTS OFF CACHE INTERNAL "" FORCE)
set(protobuf_BUILD_SHARED_LIBS OFF CACHE INTERNAL "" FORCE)
......
......@@ -58,6 +58,8 @@ set(SRCS
)
add_library(libxml2 STATIC ${SRCS})
target_include_directories(libxml2 PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/linux_x86_64/include)
target_link_libraries(libxml2 ${ZLIB_LIBRARIES})
target_include_directories(libxml2 PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/linux_x86_64/include)
target_include_directories(libxml2 PUBLIC ${LIBXML2_SOURCE_DIR}/include)
target_include_directories(libxml2 PRIVATE ${ZLIB_INCLUDE_DIR}/include)
Subproject commit 20c1d877773b6a672f1bbfe3290dfea42a117ed5
Subproject commit fe5505e56c27b6ecb0dcbc40c49dc2caf4e9637f
......@@ -36,6 +36,7 @@ public:
const ColumnPtr & getNestedColumn() const override;
const ColumnPtr & getNestedNotNullableColumn() const override { return column_holder; }
bool nestedColumnIsNullable() const override { return is_nullable; }
size_t uniqueInsert(const Field & x) override;
size_t uniqueInsertFrom(const IColumn & src, size_t n) override;
......
......@@ -18,6 +18,8 @@ public:
/// The same as getNestedColumn, but removes null map if nested column is nullable.
virtual const ColumnPtr & getNestedNotNullableColumn() const = 0;
virtual bool nestedColumnIsNullable() const = 0;
/// Returns an array with StringRefHash calculated for each row of the getNestedNotNullableColumn() column.
/// Returns nullptr if the nested column doesn't contain strings. Otherwise calculates the hashes (if they weren't calculated yet).
/// Uses a thread-safe cache.
......
......@@ -10,16 +10,17 @@ template
typename Cell,
typename Hash = DefaultHash<Key>,
typename Grower = TwoLevelHashTableGrower<>,
typename Allocator = HashTableAllocator
typename Allocator = HashTableAllocator,
template <typename ...> typename ImplTable = HashMapTable
>
class TwoLevelHashMapTable : public TwoLevelHashTable<Key, Cell, Hash, Grower, Allocator, HashMapTable<Key, Cell, Hash, Grower, Allocator>>
class TwoLevelHashMapTable : public TwoLevelHashTable<Key, Cell, Hash, Grower, Allocator, ImplTable<Key, Cell, Hash, Grower, Allocator>>
{
public:
using key_type = Key;
using mapped_type = typename Cell::Mapped;
using value_type = typename Cell::value_type;
using TwoLevelHashTable<Key, Cell, Hash, Grower, Allocator, HashMapTable<Key, Cell, Hash, Grower, Allocator>>::TwoLevelHashTable;
using TwoLevelHashTable<Key, Cell, Hash, Grower, Allocator, ImplTable<Key, Cell, Hash, Grower, Allocator>>::TwoLevelHashTable;
mapped_type & ALWAYS_INLINE operator[](Key x)
{
......@@ -41,9 +42,10 @@ template
typename Mapped,
typename Hash = DefaultHash<Key>,
typename Grower = TwoLevelHashTableGrower<>,
typename Allocator = HashTableAllocator
typename Allocator = HashTableAllocator,
template <typename ...> typename ImplTable = HashMapTable
>
using TwoLevelHashMap = TwoLevelHashMapTable<Key, HashMapCell<Key, Mapped, Hash>, Hash, Grower, Allocator>;
using TwoLevelHashMap = TwoLevelHashMapTable<Key, HashMapCell<Key, Mapped, Hash>, Hash, Grower, Allocator, ImplTable>;
template
......@@ -52,6 +54,7 @@ template
typename Mapped,
typename Hash = DefaultHash<Key>,
typename Grower = TwoLevelHashTableGrower<>,
typename Allocator = HashTableAllocator
typename Allocator = HashTableAllocator,
template <typename ...> typename ImplTable = HashMapTable
>
using TwoLevelHashMapWithSavedHash = TwoLevelHashMapTable<Key, HashMapCellWithSavedHash<Key, Mapped, Hash>, Hash, Grower, Allocator>;
using TwoLevelHashMapWithSavedHash = TwoLevelHashMapTable<Key, HashMapCellWithSavedHash<Key, Mapped, Hash>, Hash, Grower, Allocator, ImplTable>;
......@@ -154,7 +154,10 @@ Block NativeBlockInputStream::readImpl()
column.column = std::move(read_column);
if (server_revision && server_revision < DBMS_MIN_REVISION_WITH_LOW_CARDINALITY_TYPE)
{
column.column = recursiveLowCardinalityConversion(column.column, column.type, header.getByPosition(i).type);
column.type = header.getByPosition(i).type;
}
res.insert(std::move(column));
......
......@@ -453,6 +453,27 @@ AggregatedDataVariants::Type Aggregator::chooseAggregationMethod()
return AggregatedDataVariants::Type::nullable_keys256;
}
if (has_low_cardinality && params.keys_size == 1)
{
if (types_removed_nullable[0]->isValueRepresentedByNumber())
{
size_t size_of_field = types_removed_nullable[0]->getSizeOfValueInMemory();
if (size_of_field == 1)
return AggregatedDataVariants::Type::low_cardinality_key8;
if (size_of_field == 2)
return AggregatedDataVariants::Type::low_cardinality_key16;
if (size_of_field == 4)
return AggregatedDataVariants::Type::low_cardinality_key32;
if (size_of_field == 8)
return AggregatedDataVariants::Type::low_cardinality_key64;
}
else if (isString(types_removed_nullable[0]))
return AggregatedDataVariants::Type::low_cardinality_key_string;
else if (isFixedString(types_removed_nullable[0]))
return AggregatedDataVariants::Type::low_cardinality_key_fixed_string;
}
/// Fallback case.
return AggregatedDataVariants::Type::serialized;
}
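For illustration, a query that would exercise these new variants; the table and column names below are hypothetical, and at the time the `LowCardinality` type was still gated behind the `allow_experimental_low_cardinality_type` setting:

```sql
CREATE TABLE events
(
    browser LowCardinality(Nullable(String)),  -- dictionary-encoded key; NULL rows use the dedicated null-key slot
    hits UInt64
) ENGINE = Memory;

-- A single LowCardinality string key selects the low_cardinality_key_string variant.
SELECT browser, sum(hits) FROM events GROUP BY browser;
```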
......@@ -1139,12 +1160,10 @@ void Aggregator::convertToBlockImpl(
convertToBlockImplFinal(method, data, key_columns, final_aggregate_columns);
else
convertToBlockImplNotFinal(method, data, key_columns, aggregate_columns);
/// In order to release memory early.
data.clearAndShrink();
}
template <typename Method, typename Table>
void NO_INLINE Aggregator::convertToBlockImplFinal(
Method & method,
......@@ -1152,6 +1171,19 @@ void NO_INLINE Aggregator::convertToBlockImplFinal(
MutableColumns & key_columns,
MutableColumns & final_aggregate_columns) const
{
if constexpr (Method::low_cardinality_optimization)
{
if (data.hasNullKeyData())
{
key_columns[0]->insert(Field()); /// Null
for (size_t i = 0; i < params.aggregates_size; ++i)
aggregate_functions[i]->insertResultInto(
data.getNullKeyData() + offsets_of_aggregate_states[i],
*final_aggregate_columns[i]);
}
}
for (const auto & value : data)
{
method.insertKeyIntoColumns(value, key_columns, key_sizes);
......@@ -1172,6 +1204,17 @@ void NO_INLINE Aggregator::convertToBlockImplNotFinal(
MutableColumns & key_columns,
AggregateColumnsData & aggregate_columns) const
{
if constexpr (Method::low_cardinality_optimization)
{
if (data.hasNullKeyData())
{
key_columns[0]->insert(Field()); /// Null
for (size_t i = 0; i < params.aggregates_size; ++i)
aggregate_columns[i]->push_back(data.getNullKeyData() + offsets_of_aggregate_states[i]);
}
}
for (auto & value : data)
{
method.insertKeyIntoColumns(value, key_columns, key_sizes);
......@@ -1470,12 +1513,50 @@ BlocksList Aggregator::convertToBlocks(AggregatedDataVariants & data_variants, b
}
template <typename Method, typename Table>
void NO_INLINE Aggregator::mergeDataNullKey(
Table & table_dst,
Table & table_src,
Arena * arena) const
{
if constexpr (Method::low_cardinality_optimization)
{
if (table_src.hasNullKeyData())
{
if (!table_dst.hasNullKeyData())
{
table_dst.hasNullKeyData() = true;
table_dst.getNullKeyData() = table_src.getNullKeyData();
}
else
{
for (size_t i = 0; i < params.aggregates_size; ++i)
aggregate_functions[i]->merge(
table_dst.getNullKeyData() + offsets_of_aggregate_states[i],
table_src.getNullKeyData() + offsets_of_aggregate_states[i],
arena);
for (size_t i = 0; i < params.aggregates_size; ++i)
aggregate_functions[i]->destroy(
table_src.getNullKeyData() + offsets_of_aggregate_states[i]);
}
table_src.hasNullKeyData() = false;
table_src.getNullKeyData() = nullptr;
}
}
}
template <typename Method, typename Table>
void NO_INLINE Aggregator::mergeDataImpl(
Table & table_dst,
Table & table_src,
Arena * arena) const
{
if constexpr (Method::low_cardinality_optimization)
mergeDataNullKey<Method, Table>(table_dst, table_src, arena);
for (auto it = table_src.begin(), end = table_src.end(); it != end; ++it)
{
typename Table::iterator res_it;
......@@ -1513,6 +1594,10 @@ void NO_INLINE Aggregator::mergeDataNoMoreKeysImpl(
Table & table_src,
Arena * arena) const
{
/// Note: creates data for the NULL key if it does not exist.
if constexpr (Method::low_cardinality_optimization)
mergeDataNullKey<Method, Table>(table_dst, table_src, arena);
for (auto it = table_src.begin(), end = table_src.end(); it != end; ++it)
{
typename Table::iterator res_it = table_dst.find(it->first, it.getHash());
......@@ -1543,6 +1628,10 @@ void NO_INLINE Aggregator::mergeDataOnlyExistingKeysImpl(
Table & table_src,
Arena * arena) const
{
/// Note: creates data for the NULL key if it does not exist.
if constexpr (Method::low_cardinality_optimization)
mergeDataNullKey<Method, Table>(table_dst, table_src, arena);
for (auto it = table_src.begin(); it != table_src.end(); ++it)
{
decltype(it) res_it = table_dst.find(it->first, it.getHash());
......@@ -2341,6 +2430,15 @@ void NO_INLINE Aggregator::convertBlockToTwoLevelImpl(
/// For every row.
for (size_t i = 0; i < rows; ++i)
{
if constexpr (Method::low_cardinality_optimization)
{
if (state.isNullAt(i))
{
selector[i] = 0;
continue;
}
}
/// Obtain a key. Calculate bucket number from it.
typename Method::Key key = state.getKey(key_columns, params.keys_size, i, key_sizes, keys, *pool);
......
......@@ -88,6 +88,56 @@ using AggregatedDataWithStringKeyHash64 = HashMapWithSavedHash<StringRef, Aggreg
using AggregatedDataWithKeys128Hash64 = HashMap<UInt128, AggregateDataPtr, UInt128Hash>;
using AggregatedDataWithKeys256Hash64 = HashMap<UInt256, AggregateDataPtr, UInt256Hash>;
template <typename Base>
struct AggregationDataWithNullKey : public Base
{
using Base::Base;
bool & hasNullKeyData() { return has_null_key_data; }
AggregateDataPtr & getNullKeyData() { return null_key_data; }
bool hasNullKeyData() const { return has_null_key_data; }
const AggregateDataPtr & getNullKeyData() const { return null_key_data; }
private:
bool has_null_key_data = false;
AggregateDataPtr null_key_data = nullptr;
};
template <typename Base>
struct AggregationDataWithNullKeyTwoLevel : public Base
{
using Base::Base;
using Base::impls;
template <typename Other>
explicit AggregationDataWithNullKeyTwoLevel(const Other & other) : Base(other)
{
impls[0].hasNullKeyData() = other.hasNullKeyData();
impls[0].getNullKeyData() = other.getNullKeyData();
}
bool & hasNullKeyData() { return impls[0].hasNullKeyData(); }
AggregateDataPtr & getNullKeyData() { return impls[0].getNullKeyData(); }
bool hasNullKeyData() const { return impls[0].hasNullKeyData(); }
const AggregateDataPtr & getNullKeyData() const { return impls[0].getNullKeyData(); }
};
template <typename ... Types>
using HashTableWithNullKey = AggregationDataWithNullKey<HashMapTable<Types ...>>;
using AggregatedDataWithNullableUInt8Key = AggregationDataWithNullKey<AggregatedDataWithUInt8Key>;
using AggregatedDataWithNullableUInt16Key = AggregationDataWithNullKey<AggregatedDataWithUInt16Key>;
using AggregatedDataWithNullableUInt64Key = AggregationDataWithNullKey<AggregatedDataWithUInt64Key>;
using AggregatedDataWithNullableStringKey = AggregationDataWithNullKey<AggregatedDataWithStringKey>;
using AggregatedDataWithNullableUInt64KeyTwoLevel = AggregationDataWithNullKeyTwoLevel<
TwoLevelHashMap<UInt64, AggregateDataPtr, HashCRC32<UInt64>,
TwoLevelHashTableGrower<>, HashTableAllocator, HashTableWithNullKey>>;
using AggregatedDataWithNullableStringKeyTwoLevel = AggregationDataWithNullKeyTwoLevel<
TwoLevelHashMapWithSavedHash<StringRef, AggregateDataPtr, DefaultHash<StringRef>,
TwoLevelHashTableGrower<>, HashTableAllocator, HashTableWithNullKey>>;
/// Cache which can be used by aggregations method's states. Object is shared in all threads.
struct AggregationStateCache
{
......@@ -403,8 +453,10 @@ struct AggregationMethodSingleLowCardinalityColumn : public SingleColumnMethod
ColumnPtr dictionary_holder;
/// Cache AggregateDataPtr for current column in order to decrease the number of hash table usages.
PaddedPODArray<AggregateDataPtr> aggregate_data;
PaddedPODArray<AggregateDataPtr> * aggregate_data_cache;
PaddedPODArray<AggregateDataPtr> aggregate_data_cache;
/// If initialized column is nullable.
bool is_nullable = false;
void init(ColumnRawPtrs &)
{
......@@ -429,7 +481,8 @@ struct AggregationMethodSingleLowCardinalityColumn : public SingleColumnMethod
+ demangle(typeid(cached_val).name()), ErrorCodes::LOGICAL_ERROR);
}
auto * dict = column->getDictionary().getNestedColumn().get();
auto * dict = column->getDictionary().getNestedNotNullableColumn().get();
is_nullable = column->getDictionary().nestedColumnIsNullable();
key = {dict};
bool is_shared_dict = column->isSharedDictionary();
......@@ -463,8 +516,7 @@ struct AggregationMethodSingleLowCardinalityColumn : public SingleColumnMethod
}
AggregateDataPtr default_data = nullptr;
aggregate_data.assign(key[0]->size(), default_data);
aggregate_data_cache = &aggregate_data;
aggregate_data_cache.assign(key[0]->size(), default_data);
size_of_index_type = column->getSizeOfIndexType();
positions = column->getIndexesPtr().get();
......@@ -507,10 +559,18 @@ struct AggregationMethodSingleLowCardinalityColumn : public SingleColumnMethod
Arena & pool)
{
size_t row = getIndexAt(i);
if ((*aggregate_data_cache)[row])
if (is_nullable && row == 0)
{
inserted = !data.hasNullKeyData();
data.hasNullKeyData() = true;
return &data.getNullKeyData();
}
if (aggregate_data_cache[row])
{
inserted = false;
return &(*aggregate_data_cache)[row];
return &aggregate_data_cache[row];
}
else
{
......@@ -527,23 +587,35 @@ struct AggregationMethodSingleLowCardinalityColumn : public SingleColumnMethod
if (inserted)
Base::onNewKey(*it, keys_size, keys, pool);
else
(*aggregate_data_cache)[row] = Base::getAggregateData(it->second);
aggregate_data_cache[row] = Base::getAggregateData(it->second);
return &Base::getAggregateData(it->second);
}
}
ALWAYS_INLINE bool isNullAt(size_t i)
{
if (!is_nullable)
return false;
return getIndexAt(i) == 0;
}
ALWAYS_INLINE void cacheAggregateData(size_t i, AggregateDataPtr data)
{
size_t row = getIndexAt(i);
(*aggregate_data_cache)[row] = data;
aggregate_data_cache[row] = data;
}
template <typename D>
ALWAYS_INLINE AggregateDataPtr * findFromRow(D & data, size_t i)
{
size_t row = getIndexAt(i);
if (!(*aggregate_data_cache)[row])
if (is_nullable && row == 0)
return data.hasNullKeyData() ? &data.getNullKeyData() : nullptr;
if (!aggregate_data_cache[row])
{
ColumnRawPtrs key_columns;
Sizes key_sizes;
......@@ -558,9 +630,9 @@ struct AggregationMethodSingleLowCardinalityColumn : public SingleColumnMethod
it = data.find(key);
if (it != data.end())
(*aggregate_data_cache)[row] = Base::getAggregateData(it->second);
aggregate_data_cache[row] = Base::getAggregateData(it->second);
}
return &(*aggregate_data_cache)[row];
return &aggregate_data_cache[row];
}
};
......@@ -971,17 +1043,17 @@ struct AggregatedDataVariants : private boost::noncopyable
std::unique_ptr<AggregationMethodKeysFixed<AggregatedDataWithKeys256TwoLevel, true>> nullable_keys256_two_level;
/// Support for low cardinality.
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodOneNumber<UInt8, AggregatedDataWithUInt8Key>>> low_cardinality_key8;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodOneNumber<UInt16, AggregatedDataWithUInt16Key>>> low_cardinality_key16;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodOneNumber<UInt32, AggregatedDataWithUInt64Key>>> low_cardinality_key32;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodOneNumber<UInt64, AggregatedDataWithUInt64Key>>> low_cardinality_key64;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodString<AggregatedDataWithStringKey>>> low_cardinality_key_string;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodFixedString<AggregatedDataWithStringKey>>> low_cardinality_key_fixed_string;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodOneNumber<UInt32, AggregatedDataWithUInt64KeyTwoLevel>>> low_cardinality_key32_two_level;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodOneNumber<UInt64, AggregatedDataWithUInt64KeyTwoLevel>>> low_cardinality_key64_two_level;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodString<AggregatedDataWithStringKeyTwoLevel>>> low_cardinality_key_string_two_level;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodFixedString<AggregatedDataWithStringKeyTwoLevel>>> low_cardinality_key_fixed_string_two_level;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodOneNumber<UInt8, AggregatedDataWithNullableUInt8Key>>> low_cardinality_key8;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodOneNumber<UInt16, AggregatedDataWithNullableUInt16Key>>> low_cardinality_key16;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodOneNumber<UInt32, AggregatedDataWithNullableUInt64Key>>> low_cardinality_key32;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodOneNumber<UInt64, AggregatedDataWithNullableUInt64Key>>> low_cardinality_key64;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodString<AggregatedDataWithNullableStringKey>>> low_cardinality_key_string;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodFixedString<AggregatedDataWithNullableStringKey>>> low_cardinality_key_fixed_string;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodOneNumber<UInt32, AggregatedDataWithNullableUInt64KeyTwoLevel>>> low_cardinality_key32_two_level;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodOneNumber<UInt64, AggregatedDataWithNullableUInt64KeyTwoLevel>>> low_cardinality_key64_two_level;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodString<AggregatedDataWithNullableStringKeyTwoLevel>>> low_cardinality_key_string_two_level;
std::unique_ptr<AggregationMethodSingleLowCardinalityColumn<AggregationMethodFixedString<AggregatedDataWithNullableStringKeyTwoLevel>>> low_cardinality_key_fixed_string_two_level;
std::unique_ptr<AggregationMethodKeysFixed<AggregatedDataWithKeys128, false, true>> low_cardinality_keys128;
std::unique_ptr<AggregationMethodKeysFixed<AggregatedDataWithKeys256, false, true>> low_cardinality_keys256;
......@@ -1580,6 +1652,13 @@ public:
Arena * arena) const;
protected:
/// Merge NULL key data from hash table `src` into `dst`.
template <typename Method, typename Table>
void mergeDataNullKey(
Table & table_dst,
Table & table_src,
Arena * arena) const;
/// Merge data from hash table `src` into `dst`.
template <typename Method, typename Table>
void mergeDataImpl(
......
......@@ -590,7 +590,7 @@ void ExpressionActions::checkLimits(Block & block) const
{
std::stringstream list_of_non_const_columns;
for (size_t i = 0, size = block.columns(); i < size; ++i)
if (!block.safeGetByPosition(i).column->isColumnConst())
if (block.safeGetByPosition(i).column && !block.safeGetByPosition(i).column->isColumnConst())
list_of_non_const_columns << "\n" << block.safeGetByPosition(i).name;
throw Exception("Too many temporary non-const columns:" + list_of_non_const_columns.str()
......
......@@ -17,7 +17,7 @@ function thread1()
function thread2()
{
seq 1 1000 | sed -r -e 's/.+/SELECT count() FROM test.buffer;/' | ${CLICKHOUSE_CLIENT} --multiquery --server_logs_file='/dev/null' --ignore-error 2>&1 | grep -vP '^0$|^10$|^Received exception|^Code: 60'
seq 1 1000 | sed -r -e 's/.+/SELECT count() FROM test.buffer;/' | ${CLICKHOUSE_CLIENT} --multiquery --server_logs_file='/dev/null' --ignore-error 2>&1 | grep -vP '^0$|^10$|^Received exception|^Code: 60|^Code: 218'
}
thread1 &
......
......@@ -24,6 +24,8 @@ RUN apt-get update \
/tmp/* \
&& apt-get clean
RUN mkdir /docker-entrypoint-initdb.d
COPY docker_related_config.xml /etc/clickhouse-server/config.d/
COPY entrypoint.sh /entrypoint.sh
ADD https://github.com/tianon/gosu/releases/download/1.10/gosu-amd64 /bin/gosu
......
......@@ -33,6 +33,22 @@ ClickHouse configuration represented with a file "config.xml" ([documentation](h
$ docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 -v /path/to/your/config.xml:/etc/clickhouse-server/config.xml yandex/clickhouse-server
```
## How to extend this image
If you would like to do additional initialization in an image derived from this one, add one or more `*.sql`, `*.sql.gz`, or `*.sh` scripts under `/docker-entrypoint-initdb.d`. After the entrypoint calls `initdb` it will run any `*.sql` files, run any executable `*.sh` scripts, and source any non-executable `*.sh` scripts found in that directory to do further initialization before starting the service.
For example, to add an additional user and database, add the following to `/docker-entrypoint-initdb.d/init-db.sh`:
```bash
#!/bin/bash
set -e
clickhouse client -n <<-EOSQL
CREATE DATABASE docker;
CREATE TABLE docker.docker (x Int32) ENGINE = Log;
EOSQL
```
## License
View [license information](https://github.com/yandex/ClickHouse/blob/master/LICENSE) for the software contained in this image.
......@@ -30,6 +30,36 @@ chown -R $USER:$GROUP \
"$TMP_DIR" \
"$USER_PATH"
if [ -n "$(ls /docker-entrypoint-initdb.d/)" ]; then
gosu clickhouse /usr/bin/clickhouse-server --config-file=$CLICKHOUSE_CONFIG &
pid="$!"
sleep 1
clickhouseclient=( clickhouse client --multiquery )
echo
for f in /docker-entrypoint-initdb.d/*; do
case "$f" in
*.sh)
if [ -x "$f" ]; then
echo "$0: running $f"
"$f"
else
echo "$0: sourcing $f"
. "$f"
fi
;;
*.sql) echo "$0: running $f"; cat "$f" | "${clickhouseclient[@]}" ; echo ;;
*.sql.gz) echo "$0: running $f"; gunzip -c "$f" | "${clickhouseclient[@]}"; echo ;;
*) echo "$0: ignoring $f" ;;
esac
echo
done
if ! kill -s TERM "$pid" || ! wait "$pid"; then
echo >&2 'ClickHouse init process failed.'
exit 1
fi
fi
# if no args passed to `docker run` or first argument start with `--`, then the user is passing clickhouse-server arguments
if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]; then
......
......@@ -22,9 +22,8 @@ cd ClickHouse
# How to Build ClickHouse for Development
Build should work on Ubuntu Linux.
The following tutorial is based on the Ubuntu Linux system.
With appropriate changes, it should also work on any other Linux distribution.
The build process is not intended to work on Mac OS X.
Only x86_64 with SSE 4.2 is supported. Support for AArch64 is experimental.
To test for SSE 4.2, do
......
......@@ -39,6 +39,7 @@ You can also download and install packages manually from here: <https://repo.yan
Yandex does not run ClickHouse on `rpm`-based Linux distributions, and `rpm` packages are not as thoroughly tested, so use them at your own risk. That said, many other companies run them successfully in production without any major issues.
For CentOS, RHEL or Fedora there are the following options:
* Packages from <https://repo.yandex.ru/clickhouse/rpm/stable/x86_64/> are generated from official `deb` packages by Yandex and have byte-identical binaries.
* Packages from <https://github.com/Altinity/clickhouse-rpm-install> are built by independent company Altinity, but are used widely without any complaints.
* Or you can use Docker (see below).
......
# Visual Interfaces from Third-party Developers
## Tabix
## Open-Source
### Tabix
Web interface for ClickHouse in the [Tabix](https://github.com/tabixio/tabix) project.
Main features:
Features:
- Works with ClickHouse directly from the browser, without the need to install additional software.
- Query editor with syntax highlighting.
......@@ -14,11 +16,11 @@ Main features:
[Tabix documentation](https://tabix.io/doc/).
## HouseOps
### HouseOps
[HouseOps](https://github.com/HouseOps/HouseOps) is a UI/IDE for OSX, Linux and Windows.
Main features:
Features:
- Query builder with syntax highlighting. View the response in a table or JSON view.
- Export query results as CSV or JSON.
......@@ -36,25 +38,27 @@ The following features are planned for development:
- Cluster management.
- Monitoring replicated and Kafka tables.
## DBeaver
## Commercial
### DBeaver
[DBeaver](https://dbeaver.io/) - universal desktop database client with ClickHouse support.
Key features:
Features:
- Query development with syntax highlight.
- Table preview.
- Autocompletion.
## DataGrip
### DataGrip
[DataGrip](https://www.jetbrains.com/datagrip/) - Database IDE from JetBrains with dedicated support for ClickHouse. The same is embedded into other IntelliJ-based tools: PyCharm, IntelliJIDEA, GoLand, PhpStorm etc.
[DataGrip](https://www.jetbrains.com/datagrip/) is a database IDE from JetBrains with dedicated support for ClickHouse. It is also embedded into other IntelliJ-based tools: PyCharm, IntelliJ IDEA, GoLand, PhpStorm and others.
Features:
- Very fast code completion.
- Clickhouse synthax highlighting.
- Specific Clickhouse features support in SQL, i.e. nested columns, table engines.
- ClickHouse syntax highlighting.
- Support for features specific to ClickHouse, for example nested columns, table engines.
- Data Editor.
- Refactorings.
- Search and Navigation.
......
......@@ -3,6 +3,11 @@
!!! warning "Disclaimer"
Yandex does **not** maintain the libraries listed below and has not done any extensive testing to ensure their quality.
- Relational database management systems
- [MySQL](https://www.mysql.com)
- [ProxySQL](https://github.com/sysown/proxysql/wiki/ClickHouse-Support)
- [PostgreSQL](https://www.postgresql.org)
- [infi.clickhouse_fdw](https://github.com/Infinidat/infi.clickhouse_fdw) (uses [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm))
- Python
- [SQLAlchemy](https://www.sqlalchemy.org)
- [sqlalchemy-clickhouse](https://github.com/cloudflare/sqlalchemy-clickhouse) (uses [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm))
......@@ -22,4 +27,4 @@
- [clickhouse_ecto](https://github.com/appodeal/clickhouse_ecto)
[Original article](https://clickhouse.yandex/docs/en/interfaces/third-party/integrations/) <!--hide-->
\ No newline at end of file
[Original article](https://clickhouse.yandex/docs/en/interfaces/third-party/integrations/) <!--hide-->
......@@ -14,7 +14,7 @@ Some column-oriented DBMSs (InfiniDB CE and MonetDB) do not use data compression
## Disk Storage of Data
Mving a data physically sorted by primary key makes it possible to extract data for it's specific values or value ranges with low latency, less than few dozen milliseconds. some column-oriented DBMSs (such as SAP HANA and Google PowerDrill) can only work in RAM. This approach encourages the allocation of a larger hardware budget than is actually necessary for real-time analysis. ClickHouse is designed to work on regular hard drives, which means the cost per GB of data storage is low, but SSD and additional RAM are also fully used if available.
Keeping data physically sorted by primary key makes it possible to extract data for its specific values or value ranges with low latency, less than a few dozen milliseconds. Some column-oriented DBMSs (such as SAP HANA and Google PowerDrill) can only work in RAM. This approach encourages the allocation of a larger hardware budget than is actually necessary for real-time analysis. ClickHouse is designed to work on regular hard drives, which means the cost per GB of data storage is low, but SSD and additional RAM are also fully used if available.
## Parallel Processing on Multiple Cores
......
......@@ -54,8 +54,6 @@ There are several processing stages:
Only the first stage takes time. If there is a failure at this stage, the data is not changed.
If there is a failure during one of the successive stages, data can be restored manually. The exception is if the old files were deleted from the file system but the data for the new files did not get written to the disk and was lost.
There is no support for changing the column type in arrays and nested data structures.
The `ALTER` query lets you create and delete separate elements (columns) in nested data structures, but not whole nested data structures. To add a nested data structure, you can add columns with a name like `name.nested_name` and the type `Array(T)`. A nested data structure is equivalent to multiple array columns with a name that has the same prefix before the dot.
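For illustration, the element-wise form this implies; the table and nested names here are hypothetical:

```sql
-- `goals.price` behaves as an Array(Int64) column sharing the `goals.` prefix
-- with the structure's other element arrays.
ALTER TABLE visits ADD COLUMN goals.price Array(Int64);
```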
There is no support for deleting columns in the primary key or the sampling key (columns that are in the `ENGINE` expression). Changing the type for columns that are included in the primary key is only possible if this change does not cause the data to be modified (for example, it is allowed to add values to an Enum or change a type with `DateTime` to `UInt32`).
......
......@@ -125,10 +125,6 @@ CREATE [MATERIALIZED] VIEW [IF NOT EXISTS] [db.]table_name [TO[db.]name] [ENGINE
Creates a view. There are two types of views: normal and MATERIALIZED.
When creating a materialized view, you must specify ENGINE – the table engine for storing data.
A materialized view works as follows: when inserting data to the table specified in SELECT, part of the inserted data is converted by this SELECT query, and the result is inserted in the view.
Normal views don't store any data, but just perform a read from another table. In other words, a normal view is nothing more than a saved query. When reading from a view, this saved query is used as a subquery in the FROM clause.
As an example, assume you've created a view:
......
......@@ -48,7 +48,7 @@ The FINAL modifier can be used only for a SELECT from a CollapsingMergeTree tabl
The SAMPLE clause allows for approximated query processing. Approximated query processing is only supported by MergeTree\* type tables, and only if the sampling expression was specified during table creation (see the section "MergeTree engine").
`SAMPLE` has the `format SAMPLE k`, where `k` is a decimal number from 0 to 1, or `SAMPLE n`, where 'n' is a sufficiently large integer.
`SAMPLE` has the format `SAMPLE k`, where `k` is a decimal number from 0 to 1, or `SAMPLE n`, where 'n' is a sufficiently large integer.
In the first case, the query will be executed on 'k' percent of data. For example, `SAMPLE 0.1` runs the query on 10% of data.
In the second case, the query will be executed on a sample of no more than 'n' rows. For example, `SAMPLE 10000000` runs the query on a maximum of 10,000,000 rows.
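For illustration (assuming the `hits` table was created with a `SAMPLE BY` expression):

```sql
SELECT count() FROM hits SAMPLE 0.1;       -- processes roughly 10% of the data
SELECT count() FROM hits SAMPLE 10000000;  -- processes at most about 10,000,000 rows
```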
......@@ -437,18 +437,20 @@ Only one `JOIN` can be specified in a query (on a single level). To run multiple
Each time a query is run with the same `JOIN`, the subquery is run again – the result is not cached. To avoid this, use the special 'Join' table engine, which is a prepared array for joining that is always in RAM. For more information, see the section "Table engines, Join".
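For illustration, a sketch of such a prepared join table; the names are hypothetical, and the strictness, join kind, and key fixed in the engine must match the queries that use it:

```sql
CREATE TABLE campaign_names (CampaignID UInt32, Name String)
ENGINE = Join(ANY, LEFT, CampaignID);
```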
In some cases, it is more efficient to use `IN` instead of `JOIN`.
Among the various types of `JOIN`, the most efficient is ANY `LEFT JOIN`, then `ANY INNER JOIN`. The least efficient are `ALL LEFT JOIN` and `ALL INNER JOIN`.
Among the various types of `JOIN`, the most efficient is `ANY LEFT JOIN`, then `ANY INNER JOIN`. The least efficient are `ALL LEFT JOIN` and `ALL INNER JOIN`.
If you need a `JOIN` for joining with dimension tables (these are relatively small tables that contain dimension properties, such as names for advertising campaigns), a `JOIN` might not be very convenient due to the bulky syntax and the fact that the right table is re-accessed for every query. For such cases, there is an "external dictionaries" feature that you should use instead of `JOIN`. For more information, see the section [External dictionaries](dicts/external_dicts.md#dicts-external_dicts).
<a name="query_language-queries-where"></a>
### WHERE Clause
#### NULL processing
The JOIN behavior is affected by the [join_use_nulls](../operations/settings/settings.md#settings-join_use_nulls) setting. With `join_use_nulls=1,` `JOIN` works like in standard SQL.
The JOIN behavior is affected by the [join_use_nulls](../operations/settings/settings.md#settings-join_use_nulls) setting. With `join_use_nulls=1`, `JOIN` works like in standard SQL.
If the JOIN keys are [Nullable](../data_types/nullable.md#data_types-nullable) fields, the rows where at least one of the keys has the value [NULL](syntax.md#null-literal) are not joined.
<a name="query_language-queries-where"></a>
### WHERE Clause
If there is a WHERE clause, it must contain an expression with the UInt8 type. This is usually an expression with comparison and logical operators.
This expression will be used for filtering data before all other transformations.
......@@ -696,6 +698,8 @@ The result will be the same as if GROUP BY were specified across all the fields
DISTINCT is not supported if SELECT has at least one array column.
`DISTINCT` works with [NULL](syntax.md#null-literal) as if `NULL` were a specific value, and `NULL=NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` only occur once.
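For example:

```sql
SELECT DISTINCT x FROM (SELECT arrayJoin([1, NULL, 1, NULL]) AS x);
-- returns two rows, 1 and NULL; the combination with NULL occurs only once
```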
### LIMIT Clause
LIMIT m allows you to select the first 'm' rows from the result.
......@@ -705,8 +709,6 @@ LIMIT n, m allows you to select the first 'm' rows from the result after skippin
If there isn't an ORDER BY clause that explicitly sorts results, the result may be arbitrary and nondeterministic.
`DISTINCT` works with [NULL](syntax.md#null-literal) as if `NULL` were a specific value, and `NULL=NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` only occur once.
### UNION ALL Clause
You can use UNION ALL to combine any number of queries. Example:
......@@ -852,7 +854,7 @@ FROM t_null
#### Distributed Subqueries
There are two options for IN-s with subqueries (similar to JOINs): normal `IN` / ` OIN` and `IN GLOBAL` / `GLOBAL JOIN`. They differ in how they are run for distributed query processing.
There are two options for IN-s with subqueries (similar to JOINs): normal `IN` / `JOIN` and `GLOBAL IN` / `GLOBAL JOIN`. They differ in how they are run for distributed query processing.
!!! attention
Remember that the algorithms described below may work differently depending on the [settings](../operations/settings/settings.md#settings-distributed_product_mode) `distributed_product_mode` setting.
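For illustration, a sketch with hypothetical table names, where `distributed_table` is a Distributed table over `local_table`:

```sql
-- Normal IN: every shard runs the subquery itself, over its own local data.
SELECT uniq(UserID) FROM distributed_table
WHERE UserID IN (SELECT UserID FROM local_table WHERE CounterID = 34);

-- GLOBAL IN: the initiator runs the subquery once and ships the temporary
-- result table to every shard together with the query.
SELECT uniq(UserID) FROM distributed_table
WHERE UserID GLOBAL IN (SELECT UserID FROM distributed_table WHERE CounterID = 34);
```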
......
......@@ -2,11 +2,14 @@
# Visual Interfaces from Third-Party Developers
## Tabix
## Open-Source
### Tabix
Web interface for ClickHouse in the [Tabix](https://github.com/tabixio/tabix) project.
### Features:
Features:
- Works with ClickHouse directly from the browser, without the need to install additional software.
- Query editor with syntax highlighting.
- Auto-completion for commands.
......@@ -16,11 +19,12 @@ Web interface for ClickHouse in the [Tabix](https://github
[Tabix documentation](https://tabix.io/doc/).
## HouseOps
### HouseOps
[HouseOps](https://github.com/HouseOps/HouseOps) is desktop software for Linux, OSX, and Windows.
### Features:
Features:
- Query builder with syntax highlighting. View results as a table or a JSON object.
- Export results as CSV or a JSON object.
- Process list with descriptions, record mode, and the ability to kill processes.
......@@ -35,5 +39,30 @@ Web interface for ClickHouse in the [Tabix](https://github
- Monitoring of Kafka and replicated tables (coming soon);
- And many other features for you.
## Commercial
### DBeaver
[DBeaver](https://dbeaver.io/) - a desktop database client with ClickHouse support.
Features:
- Query development with syntax highlighting.
- Table preview.
- Auto-completion.
### DataGrip
[DataGrip](https://www.jetbrains.com/datagrip/) is a database IDE from JetBrains with dedicated support for ClickHouse. It is also embedded into other IntelliJ-based tools: PyCharm, IntelliJ IDEA, GoLand, PhpStorm, and others.
Features:
- Very fast code completion.
- ClickHouse syntax highlighting.
- Support for ClickHouse-specific features, for example nested columns and table engines.
- Data editor.
- Refactorings.
- Search and navigation.
</div>
[Original article](https://clickhouse.yandex/docs/fa/interfaces/third-party_gui/) <!--hide-->
......@@ -5,12 +5,17 @@
!!! warning "سلب مسئولیت"
Yandex نه حفظ کتابخانه ها در زیر ذکر شده و نشده انجام هر آزمایش های گسترده ای برای اطمینان از کیفیت آنها.
- سیستم های مدیریت پایگاه داده رابطه ای
- [MySQL](https://www.mysql.com)
- [ProxySQL](https://github.com/sysown/proxysql/wiki/ClickHouse-Support)
- [PostgreSQL](https://www.postgresql.org)
- [infi.clickhouse_fdw](https://github.com/Infinidat/infi.clickhouse_fdw) (استفاده می کند [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm))
- Python
- [SQLAlchemy](https://www.sqlalchemy.org)
- [sqlalchemy-clickhouse](https://github.com/cloudflare/sqlalchemy-clickhouse) (uses [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm))
- [sqlalchemy-clickhouse](https://github.com/cloudflare/sqlalchemy-clickhouse) (استفاده می کند [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm))
- Java
- [Hadoop](http://hadoop.apache.org)
- [clickhouse-hdfs-loader](https://github.com/jaykelin/clickhouse-hdfs-loader) (uses [JDBC](../jdbc.md))
- [clickhouse-hdfs-loader](https://github.com/jaykelin/clickhouse-hdfs-loader) (استفاده می کند [JDBC](../jdbc.md))
- Scala
- [Akka](https://akka.io)
- [clickhouse-scala-client](https://github.com/crobox/clickhouse-scala-client)
......
......@@ -25,7 +25,7 @@
## Internal Representation
Internally, data is represented as signed integers of the corresponding bit width. The actual ranges that can be stored in memory are somewhat larger than the declared ones. The declared Decimal ranges are checked only when a number is entered from its string representation.
Since modern CPUs do not support 128-bit numbers, operations on Decimal128 are emulated in software, so Decimal128 works significantly slower than Decimal32/Decimal64.
## Operations and Result Types
......
......@@ -4,7 +4,7 @@
ClickHouse can run on any Linux, FreeBSD, or Mac OS X with the x86\_64 CPU architecture.
Pre-built releases are usually compiled with the SSE 4.2 instruction set, so a CPU that supports it is effectively part of the system requirements. To check whether the current processor supports SSE 4.2:
```bash
$ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
......@@ -39,6 +39,7 @@ sudo apt-get install clickhouse-client clickhouse-server
Yandex does not use ClickHouse on `rpm`-based Linux distributions, and `rpm` packages are less thoroughly tested. So use them at your own risk; nevertheless, many other companies run them successfully in production without any serious problems.
For CentOS, RHEL, and Fedora the following options are available:
* Packages from <https://repo.yandex.ru/clickhouse/rpm/stable/x86_64/> are generated from the official `deb` packages by Yandex and contain exactly the same binary.
* Packages from <https://github.com/Altinity/clickhouse-rpm-install> are built by the independent company Altinity and are widely used without any complaints.
* Alternatively, you can use Docker (see below).
......
# Visual Interfaces from Third-Party Developers
## Tabix
## Open-Source
### Tabix
Web interface for ClickHouse in the [Tabix](https://github.com/tabixio/tabix) project.
......@@ -14,7 +16,7 @@
[Tabix documentation](https://tabix.io/doc/).
## HouseOps
### HouseOps
[HouseOps](https://github.com/HouseOps/HouseOps) is a UI/IDE for OSX, Linux, and Windows.
......@@ -39,7 +41,9 @@
- Cluster management;
- Monitoring of replicated and Kafka tables.
## DBeaver
## Commercial
### DBeaver
[DBeaver](https://dbeaver.io/) - a universal desktop database client with ClickHouse support.
......@@ -49,4 +53,17 @@
- Table preview;
- Command auto-completion.
### DataGrip
[DataGrip](https://www.jetbrains.com/datagrip/) is a database IDE from JetBrains with dedicated support for ClickHouse. It is also embedded into other IntelliJ-based tools: PyCharm, IntelliJ IDEA, GoLand, PhpStorm, and others.
Key features:
- Very fast code completion.
- Syntax highlighting for the ClickHouse SQL dialect.
- Support for ClickHouse-specific features, such as nested columns and table engines.
- Data editor.
- Refactorings.
- Search and navigation.
[Original article](https://clickhouse.yandex/docs/ru/interfaces/third-party_gui/) <!--hide-->
......@@ -3,12 +3,17 @@
!!! warning "Disclaimer"
Яндекс не поддерживает перечисленные ниже библиотеки и не проводит тщательного тестирования для проверки их качества.
- Реляционные системы управления базами данных
- [MySQL](https://www.mysql.com)
- [ProxySQL](https://github.com/sysown/proxysql/wiki/ClickHouse-Support)
- [PostgreSQL](https://www.postgresql.org)
- [infi.clickhouse_fdw](https://github.com/Infinidat/infi.clickhouse_fdw) (использует [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm))
- Python
- [SQLAlchemy](https://www.sqlalchemy.org)
- [sqlalchemy-clickhouse](https://github.com/cloudflare/sqlalchemy-clickhouse) (uses [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm))
- [sqlalchemy-clickhouse](https://github.com/cloudflare/sqlalchemy-clickhouse) (использует [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm))
- Java
- [Hadoop](http://hadoop.apache.org)
- [clickhouse-hdfs-loader](https://github.com/jaykelin/clickhouse-hdfs-loader) (uses [JDBC](../jdbc.md))
- [clickhouse-hdfs-loader](https://github.com/jaykelin/clickhouse-hdfs-loader) (использует [JDBC](../jdbc.md))
- Scala
- [Akka](https://akka.io)
- [clickhouse-scala-client](https://github.com/crobox/clickhouse-scala-client)
......
......@@ -141,7 +141,7 @@ default_expression String - expression for the default value
- marks_size (UInt64) - Size of the file with marks.
- rows (UInt64) - Number of rows.
- bytes (UInt64) - Number of bytes in compressed form.
- modification_time (DateTime) - Modification time of the directory containing the part. Usually corresponds to the time the part was created.|
- modification_time (DateTime) - Modification time of the directory containing the part. Usually corresponds to the time the part was created.
- remove_time (DateTime) - Time when the part became inactive.
- refcount (UInt32) - The number of places where the part is used. A value greater than 2 indicates that the part is involved in queries or merges.
- min_date (Date) - Minimum value of the date key in the part.
......
......@@ -14,7 +14,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = MergeTree()
) ENGINE = SummingMergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
......
......@@ -714,6 +714,8 @@ WHERE and HAVING differ in that WHERE is executed
`DISTINCT` works with [NULL](syntax.md#null-literal) as if `NULL` were a specific value, and `NULL=NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` occur only once.
### LIMIT Clause
LIMIT m allows you to select the first m rows from the result.
......@@ -723,8 +725,6 @@ n and m must be non-negative integers
If there is no ORDER BY clause that unambiguously sorts the result, the result may be arbitrary and nondeterministic.
`DISTINCT` works with [NULL](syntax.md#null-literal) as if `NULL` were a specific value, and `NULL=NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` occur only once.
### UNION ALL Clause
Any number of queries can be combined using `UNION ALL`. Example:
......
......@@ -2,20 +2,46 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import markdown.inlinepatterns
import markdown.extensions
import markdown.util
class NofollowMixin(object):
def handleMatch(self, m):
try:
el = super(NofollowMixin, self).handleMatch(m)
except IndexError:
return
if el is not None:
href = el.get('href') or ''
if href.startswith('http') and not href.startswith('https://clickhouse.yandex'):
el.set('rel', 'external nofollow')
return el
class NofollowAutolinkPattern(NofollowMixin, markdown.inlinepatterns.AutolinkPattern):
pass
class NofollowLinkPattern(NofollowMixin, markdown.inlinepatterns.LinkPattern):
pass
class ClickHousePreprocessor(markdown.util.Processor):
def run(self, lines):
for line in lines:
if '<!--hide-->' not in line:
yield line
class ClickHouseMarkdown(markdown.extensions.Extension):
def extendMarkdown(self, md, md_globals):
md.preprocessors['clickhouse'] = ClickHousePreprocessor()
md.inlinePatterns['link'] = NofollowLinkPattern(markdown.inlinepatterns.LINK_RE, md)
md.inlinePatterns['autolink'] = NofollowAutolinkPattern(markdown.inlinepatterns.AUTOLINK_RE, md)
def makeExtension(**kwargs):
......
......@@ -136,10 +136,10 @@
<article class="md-content__inner md-typeset">
{% block content %}
{% if config.extra.single_page %}
<a href="{{ config.repo_url }}tree/master/docs" title="{{ lang.t('edit.link.title') }}" class="md-icon md-content__icon">&#xE3C9;</a>
<a href="{{ config.repo_url }}tree/master/docs" title="{{ lang.t('edit.link.title') }}" class="md-icon md-content__icon" rel="external nofollow">&#xE3C9;</a>
{% else %}
{% if page.edit_url %}
<a href="{{ page.edit_url }}" title="{{ lang.t('edit.link.title') }}" class="md-icon md-content__icon">&#xE3C9;</a>
<a href="{{ page.edit_url }}" title="{{ lang.t('edit.link.title') }}" class="md-icon md-content__icon" rel="external nofollow">&#xE3C9;</a>
{% endif %}
{% endif %}
{% if not "\x3ch1" in page.content %}
......@@ -155,7 +155,7 @@
<h2 id="__source">{{ lang.t("meta.source") }}</h2>
{% set path = page.meta.path | default([""]) %}
{% set file = page.meta.source %}
<a href="{{ [config.repo_url, path, file] | join('/') }}" title="{{ file }}" class="md-source-file">
<a href="{{ [config.repo_url, path, file] | join('/') }}" title="{{ file }}" class="md-source-file" rel="external nofollow">
{{ file }}
</a>
{% endif %}
......
......@@ -40,7 +40,7 @@
</label>
{% endif %}
<a href="{{ base_url }}/{{ nav_item.url }}" title="{{ nav_item.title }}" class="md-nav__link md-nav__link--active">
{{ nav_item.title }}
<strong>{{ nav_item.title }}</strong>
</a>
{% if toc_ | first is defined %}
{% include "partials/toc.html" %}
......
../../en/development/build.md
\ No newline at end of file
# How to Build a ClickHouse Release Package
## Install Git and Pbuilder
```bash
sudo apt-get update
sudo apt-get install git pbuilder debhelper lsb-release fakeroot sudo debian-archive-keyring debian-keyring
```
## Check Out ClickHouse Sources
```bash
git clone --recursive --branch stable https://github.com/yandex/ClickHouse.git
cd ClickHouse
```
## Run the Release Script
```bash
./release
```
# How to Build ClickHouse for Development
The following tutorial is an example of building on Ubuntu Linux.
With appropriate changes, it should also work on any other Linux distribution.
Only x86_64 with SSE 4.2 is supported. Support for AArch64 is experimental.
To test whether your CPU supports SSE 4.2, run:
```bash
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
```
## Install Git and CMake
```bash
sudo apt-get install git cmake ninja-build
```
Or cmake3 instead of cmake on older systems.
## Install GCC 7
There are several ways to do this.
### Install from a PPA Package
```bash
sudo apt-get install software-properties-common
sudo apt-add-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-7 g++-7
```
### Install GCC from Sources
See [ci/build-gcc-from-sources.sh](https://github.com/yandex/ClickHouse/blob/master/ci/build-gcc-from-sources.sh)
## Use GCC 7 for Builds
```bash
export CC=gcc-7
export CXX=g++-7
```
## Install Required Libraries
```bash
sudo apt-get install libicu-dev libreadline-dev
```
## Check Out ClickHouse Sources
```bash
git clone --recursive git@github.com:yandex/ClickHouse.git
# or: git clone --recursive https://github.com/yandex/ClickHouse.git
cd ClickHouse
```
For the latest stable version, switch to the `stable` branch.
## Build ClickHouse
```bash
mkdir build
cd build
cmake ..
ninja
cd ..
```
To create an executable, run `ninja clickhouse`.
This will create the `dbms/programs/clickhouse` executable, which can be used with the `client` or `server` argument.
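A sketch of running the result (paths relative to the source tree; the config path is an assumption):

```bash
# Start the server in the foreground using the default config...
./dbms/programs/clickhouse server --config-file=./dbms/programs/server/config.xml
# ...then, from another terminal, connect with the client.
./dbms/programs/clickhouse client
```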
[Original article](https://clickhouse.yandex/docs/en/development/build/) <!--hide-->
../../en/development/build_osx.md
\ No newline at end of file
# How to Build ClickHouse on Mac OS X
The build should work on Mac OS X 10.12. If you are using an earlier version of the OS, you can try using `Gentoo Prefix` and `clang sl` in the instructions.
## Install Homebrew
```bash
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
```
## Install Required Compilers, Tools, and Libraries
```bash
brew install cmake ninja gcc icu4c mariadb-connector-c openssl libtool gettext readline
```
## Check Out ClickHouse Sources
```bash
git clone --recursive git@github.com:yandex/ClickHouse.git
# or: git clone --recursive https://github.com/yandex/ClickHouse.git
cd ClickHouse
```
For the latest stable version, switch to the `stable` branch.
## Build ClickHouse
```bash
mkdir build
cd build
cmake .. -DCMAKE_CXX_COMPILER=`which g++-8` -DCMAKE_C_COMPILER=`which gcc-8`
ninja
cd ..
```
## Caveats
If you intend to run clickhouse-server, make sure to increase the system's maximum number of open files.
!!! note
    You'll need to use sudo.
To do this, create the following file:
/Library/LaunchDaemons/limit.maxfiles.plist:
``` xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>limit.maxfiles</string>
<key>ProgramArguments</key>
<array>
<string>launchctl</string>
<string>limit</string>
<string>maxfiles</string>
<string>524288</string>
<string>524288</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>ServiceIPC</key>
<false/>
</dict>
</plist>
```
Execute the following command:
``` bash
$ sudo chown root:wheel /Library/LaunchDaemons/limit.maxfiles.plist
```
Then reboot.
To check if it's working, you can use the `ulimit -n` command.
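Alternatively, instead of a full reboot, the daemon can usually be loaded right away; this is standard launchctl usage rather than part of the original instructions:

```bash
# Load the LaunchDaemon immediately and mark it enabled across reboots.
sudo launchctl load -w /Library/LaunchDaemons/limit.maxfiles.plist
```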
[Original article](https://clickhouse.yandex/docs/en/development/build_osx/) <!--hide-->
../../en/development/index.md
\ No newline at end of file
# ClickHouse Development
[Original article](https://clickhouse.yandex/docs/en/development/) <!--hide-->
../../en/development/style.md
\ No newline at end of file
This diff is collapsed.
# Visual Interfaces from Third-Party Developers
## Open-Source
### Tabix
Web interface for ClickHouse in the [Tabix](https://github.com/tabixio/tabix) project.
......@@ -15,7 +17,7 @@ ClickHouse Web 界面 [Tabix](https://github.com/tabixio/tabix).
[Tabix documentation](https://tabix.io/doc/).
### HouseOps
[HouseOps](https://github.com/HouseOps/HouseOps) is an interactive UI/IDE that runs on OSX, Linux, and Windows.
......@@ -36,5 +38,29 @@ ClickHouse Web 界面 [Tabix](https://github.com/tabixio/tabix).
- Cluster management
- Monitoring replicated tables and tables with the Kafka engine
## Commercial
### DBeaver
[DBeaver](https://dbeaver.io/) - a universal desktop database client with ClickHouse support.

Features:

- Query development with syntax highlighting.
- Table preview.
- Autocompletion.

### DataGrip
[DataGrip](https://www.jetbrains.com/datagrip/) is a database IDE from JetBrains with dedicated support for ClickHouse. It is also embedded in other IntelliJ-based tools: PyCharm, IntelliJ IDEA, GoLand, PhpStorm, and others.

Features:

- Very fast code completion.
- ClickHouse syntax highlighting.
- Support for features specific to ClickHouse, such as nested columns and table engines.
- Data editor.
- Refactorings.
- Search and navigation.

[Original article](https://clickhouse.yandex/docs/zh/interfaces/third-party_gui/) <!--hide-->
# Integration Libraries from Third-Party Developers

!!! warning "Disclaimer"
    Yandex does not maintain the libraries listed below and hasn't done any extensive testing to ensure their quality.
- Relational database management systems
    - [MySQL](https://www.mysql.com)
        - [ProxySQL](https://github.com/sysown/proxysql/wiki/ClickHouse-Support)
    - [PostgreSQL](https://www.postgresql.org)
        - [infi.clickhouse_fdw](https://github.com/Infinidat/infi.clickhouse_fdw) (uses [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm))
- Python
    - [SQLAlchemy](https://www.sqlalchemy.org)
        - [sqlalchemy-clickhouse](https://github.com/cloudflare/sqlalchemy-clickhouse) (uses [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm))
- Java
    - [Hadoop](http://hadoop.apache.org)
        - [clickhouse-hdfs-loader](https://github.com/jaykelin/clickhouse-hdfs-loader) (uses [JDBC](../jdbc.md))
- Scala
    - [Akka](https://akka.io)
        - [clickhouse-scala-client](https://github.com/crobox/clickhouse-scala-client)
......
## CREATE DATABASE
Creates the `db_name` database.

``` sql
CREATE DATABASE [IF NOT EXISTS] db_name
```

A database is just a directory for tables. If `IF NOT EXISTS` is included in the query, the query won't return an error if the database already exists.

<a name="query_language-queries-create_table"></a>
## CREATE TABLE
The `CREATE TABLE` query can have several forms.

``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = engine
```

Creates a table named `table_name` in the `db` database, or in the current database if `db` is not set, with the structure given in brackets and the `engine` engine. The structure of the table is a list of column descriptions. If the engine supports indexes, they are indicated as parameters for the table engine.
In the simplest case, a column description is `name type`. Example: `RegionID UInt32`.
Expressions can also be defined for default values (see below).
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name AS [db2.]name2 [ENGINE = engine]
```

Creates a table with the same structure as another table (`db2.name2`). You can specify a different engine for the table. If the engine is not specified, the same engine is used as for `db2.name2`.
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name ENGINE = engine AS SELECT ...
```

Creates a table with a structure like the result of the `SELECT` query, with the `engine` engine, and fills it with data from the `SELECT`.

In all of these cases, if `IF NOT EXISTS` is specified, the query won't return an error if the table already exists; in that case, the query does nothing.

There can be other clauses after `ENGINE` in the query. See the detailed documentation on creating tables in the descriptions of [table engines](../operations/table_engines/index.md#table_engines).
### Default Values

The column description can specify an expression for a default value in one of the following ways: `DEFAULT expr`, `MATERIALIZED expr`, `ALIAS expr`.
Example: `URLDomain String DEFAULT domain(URL)`.

If no default expression is defined, the defaults are derived from the column type: zero for numbers, the empty string for strings, an empty array for arrays, `0000-00-00` for dates, and `0000-00-00 00:00:00` for dates with time. NULL is not supported as a default for ordinary types.

If a default expression is defined, the column type is optional. If the type is not specified explicitly, the type of the default expression is used. Example: for `EventDate DEFAULT toDate(EventTime)`, the `Date` type will be used for the `EventDate` column.

If both the data type and a default expression are specified explicitly, the expression is cast to the specified type using type conversion functions. Example: `Hits UInt32 DEFAULT 0` means the same thing as `Hits UInt32 DEFAULT toUInt32(0)`.

Default expressions may be arbitrary expressions over table constants and other columns. When creating or changing the table structure, the system checks that the expressions do not contain loops. For INSERT, it checks that the expressions are resolvable, that is, that all columns they are calculated from have been passed.

`DEFAULT expr`

A normal default value. If the INSERT query does not specify the corresponding column, it is filled in by computing this expression.

`MATERIALIZED expr`

A materialized expression. Such a column cannot be listed in an INSERT, because it is always calculated.
For an INSERT without a list of columns, these columns are not considered.
Also, this column is not substituted when an asterisk is used in a SELECT query, so that a dump obtained with `SELECT *` can always be inserted back into the table.

`ALIAS expr`

An alias. Such a column is not stored in the table at all.
Its values cannot be inserted into the table, and it is not substituted when an asterisk is used in a SELECT query.
It can be used in SELECTs if the alias is expanded during query parsing.

When ALTER is used to add new columns, old data for these columns is not rewritten. Instead, when reading old data that has no value for a new column, the expression is computed on the fly. However, if the expression requires other columns, those columns are additionally read, but only for the blocks of data that need them.

If you add a new column to a table and later change its default expression, the values used for old data will change (for data where values were not stored on disk). Note that during background merges, data for columns that are missing in one of the merging parts is computed and written into the merged part.

It is not possible to set default values for elements of nested data structures.
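Putting the three forms together, a sketch (the table name and column set are invented for illustration):

``` sql
CREATE TABLE example_defaults
(
    EventTime DateTime,
    EventDate Date DEFAULT toDate(EventTime),    -- filled in when omitted from INSERT
    URL String,
    URLDomain String MATERIALIZED domain(URL),   -- always computed, never INSERTed
    ShortURL String ALIAS substring(URL, 1, 16)  -- not stored, computed on SELECT
) ENGINE = MergeTree() ORDER BY (EventDate, EventTime)
```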
### Temporary Tables

ClickHouse supports temporary tables, which have the following characteristics:

- Temporary tables disappear when the session ends, including if the connection is lost.
- A temporary table can only use the Memory table engine.
- A database can't be specified for a temporary table: it is created outside of databases.
- If a temporary table has the same name as another table and a query specifies the table name without specifying the database, the temporary table is used.
- For distributed query processing, temporary tables used in a query are passed to remote servers.

A temporary table can be created with the following syntax:

``` sql
CREATE TEMPORARY TABLE [IF NOT EXISTS] table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
)
```

In most cases, temporary tables are not created manually, but only when external data is used for a query, or for distributed `(GLOBAL) IN`. For more information, see the appropriate sections.

## Distributed DDL Queries (ON CLUSTER Clause)

The `CREATE`, `DROP`, `ALTER`, and `RENAME` queries support distributed execution on a cluster.
For example, the following query creates the `all_hits` `Distributed` table on each host in `cluster`:

``` sql
CREATE TABLE IF NOT EXISTS all_hits ON CLUSTER cluster (p Date, i Int32) ENGINE = Distributed(cluster, default, hits)
```

In order to run these queries correctly, each host must have the same cluster definition (to simplify syncing configs, you can use substitutions from ZooKeeper). The hosts must also connect to the ZooKeeper servers.
The query is eventually executed on every host in the cluster, even if some hosts are currently unavailable, and the execution order of queries within a single host is guaranteed.
`ALTER` queries are not yet supported for replicated tables.

## CREATE VIEW

``` sql
CREATE [MATERIALIZED] VIEW [IF NOT EXISTS] [db.]table_name [TO[db.]name] [ENGINE = engine] [POPULATE] AS SELECT ...
```

Creates a view. There are two types of views: normal and MATERIALIZED.

Normal views don't store any data; they just perform a read from another table. In other words, a normal view is nothing more than a saved query. When reading from a view, this saved query is used as a subquery in the FROM clause.
As an example, assume you've created a view:

``` sql
CREATE VIEW view AS SELECT ...
```
and written a query:

``` sql
SELECT a, b, c FROM view
```
This query is fully equivalent to using the subquery:

``` sql
SELECT a, b, c FROM (SELECT ...)
```
Materialized views store data transformed by the corresponding SELECT query.
When creating a materialized view, you must specify an ENGINE: the table engine for storing the data.

A materialized view works as follows: when data is inserted into the table specified in its SELECT, the inserted block is transformed by that SELECT query, and the result is inserted into the view.

If you specify POPULATE, the existing data of the source table is inserted into the view when it is created, as if running `CREATE TABLE ... AS SELECT ...`. Otherwise, the view contains only the data inserted into the table after the view was created. Using POPULATE is not recommended, because data inserted into the table during view creation will not end up in it.

The `SELECT` query can contain `DISTINCT`, `GROUP BY`, `ORDER BY`, `LIMIT`, and so on. Note that the corresponding transformations are performed independently on each block of inserted data. For example, with `GROUP BY`, data is aggregated during insertion, but only within a single block of inserted data; the data is not aggregated any further. The exception is engines that perform data aggregation on their own, such as `SummingMergeTree`.

Executing `ALTER` on materialized views is not fully supported yet, so it may be inconvenient. If the materialized view uses the `TO [db.]name` construction, you can `DETACH` the view, run `ALTER` on the target table, and then `ATTACH` the previously detached view.

Views look the same as normal tables. For example, they are listed in the result of `SHOW TABLES`.

There is no separate query for deleting views. To delete a view, use `DROP TABLE`.
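As a sketch of how these pieces combine (the source table `hits` and the view name are invented for the example), a materialized view that pre-aggregates per-day counts might look like:

``` sql
CREATE MATERIALIZED VIEW hits_daily
ENGINE = SummingMergeTree() ORDER BY EventDate
AS SELECT
    EventDate,
    count() AS hits  -- aggregated per inserted block; summed further by the engine on merges
FROM hits
GROUP BY EventDate
```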
[Original article](https://clickhouse.yandex/docs/en/query_language/create/) <!--hide-->
......@@ -2,67 +2,70 @@
## INSERT
The INSERT query adds data to a table.

Basic query format:
``` sql
INSERT INTO [db.]table [(c1, c2, c3)] VALUES (v11, v12, v13), (v21, v22, v23), ...
```
You can specify a list of columns to insert in the query, such as `[(c1, c2, c3)]`. Columns that exist in the table but are missing from the list are filled as follows:

- If a `DEFAULT` expression is defined for the column, the value is computed from it.
- If there is no `DEFAULT` expression, zeros or empty strings are used.
If [strict_insert_defaults=1](../operations/settings/settings.md#settings-strict_insert_defaults), columns that don't have a `DEFAULT` defined must be listed in the query.
Data can be passed to the INSERT in any [format](../interfaces/formats.md#formats) supported by ClickHouse. The format must be specified explicitly in the query:
``` sql
INSERT INTO [db.]table [(c1, c2, c3)] FORMAT format_name data_set
```
For example, the following query format is identical to the basic version of `INSERT ... VALUES`:

``` sql
INSERT INTO [db.]table [(c1, c2, c3)] FORMAT Values (v11, v12, v13), (v21, v22, v23), ...
```
ClickHouse removes all spaces and one line feed (if there is one) before the data. When forming a query, we recommend putting the data on a new line after the query operators (this matters if the data begins with spaces).
Example:

``` sql
INSERT INTO t FORMAT TabSeparated
11 Hello, world!
22 Qwerty
```
You can send the query and the data separately using the command-line client or the HTTP interface. For more information, see the section "[Interfaces](../interfaces/index.md#interfaces)".
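For instance, a sketch of sending data over the HTTP interface (the file name and table are placeholders; assumes the default HTTP port 8123):

```bash
# INSERT the contents of data.tsv into table t using the TabSeparated format.
cat data.tsv | curl 'http://localhost:8123/?query=INSERT%20INTO%20t%20FORMAT%20TabSeparated' --data-binary @-
```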
### Inserting the Results of `SELECT`

<a name="queries-insert-select"></a>

``` sql
INSERT INTO [db.]table [(c1, c2, c3)] SELECT ...
```
Columns are mapped according to their position in the SELECT clause, although their names in the SELECT expression and the table for INSERT may differ. If necessary, types are cast.
None of the data formats except Values allows setting values to expressions such as `now()`, `1 + 2`, and so on. The Values format allows limited use of expressions, but this is not recommended, because inefficient code is used to execute them.
Other queries for modifying data parts are not supported: `UPDATE`, `DELETE`, `REPLACE`, `MERGE`, `UPSERT`, `INSERT UPDATE`.
However, you can delete old data using `ALTER TABLE ... DROP PARTITION`.
### Performance Considerations

`INSERT` sorts the input data by primary key and splits it into partitions by month. If you insert data for mixed months, it can significantly reduce the performance of the `INSERT` query. To avoid this:
- Add data in fairly large batches, such as 100,000 rows at a time.
- Group data by month before uploading it to ClickHouse.
Performance will not decrease if:
- Data is added in real time.
- You upload data that is usually sorted by time.
[Original article](https://clickhouse.yandex/docs/en/query_language/insert_into/) <!--hide-->
../../en/query_language/select.md
\ No newline at end of file
This diff is collapsed.
......@@ -64,7 +64,7 @@ do
shift
elif [[ $1 == '--fast' ]]; then
# Wrong but fast pbuilder mode: create base package with all depends
EXTRAPACKAGES="$EXTRAPACKAGES debhelper cmake ninja-build gcc-7 g++-7 libc6-dev libicu-dev libreadline-dev psmisc bash expect python python-lxml python-termcolor python-requests curl perl sudo openssl netcat-openbsd uuid xml2 krb5 gsasl"
EXTRAPACKAGES="$EXTRAPACKAGES debhelper cmake ninja-build gcc-7 g++-7 libc6-dev libicu-dev libreadline-dev psmisc bash expect python python-lxml python-termcolor python-requests curl perl sudo openssl netcat-openbsd"
shift
else
echo "Unknown option $1"
......
......@@ -409,6 +409,7 @@ img {
}
#index_ul {
padding-bottom: 30px;
padding-left: 0;
margin: 0 0 30px -16px;
font-size: 90%;
......
......@@ -3,6 +3,7 @@
<head>
<meta charset="utf-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<meta name="viewport" content="width=device-width,initial-scale=1">
<title>ClickHouse — open source distributed column-oriented DBMS</title>
......