提交 ea5480fd 编写于 作者: R root

merge

Committer: maqroll <loteroc@gmail.com>
......@@ -103,3 +103,6 @@
[submodule "contrib/orc"]
path = contrib/orc
url = https://github.com/apache/orc
[submodule "contrib/sparsehash-c11"]
path = contrib/sparsehash-c11
url = https://github.com/sparsehash/sparsehash-c11.git
......@@ -372,7 +372,7 @@ It allows to set commit mode: after every batch of messages is handled, or after
* Renamed functions `leastSqr` to `simpleLinearRegression`, `LinearRegression` to `linearRegression`, `LogisticRegression` to `logisticRegression`. [#5391](https://github.com/yandex/ClickHouse/pull/5391) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
### Performance Improvements
* Paralellize processing of parts of non-replicated MergeTree tables in ALTER MODIFY query. [#4639](https://github.com/yandex/ClickHouse/pull/4639) ([Ivan Kush](https://github.com/IvanKush))
* Parallelize processing of parts of non-replicated MergeTree tables in ALTER MODIFY query. [#4639](https://github.com/yandex/ClickHouse/pull/4639) ([Ivan Kush](https://github.com/IvanKush))
* Optimizations in regular expressions extraction. [#5193](https://github.com/yandex/ClickHouse/pull/5193) [#5191](https://github.com/yandex/ClickHouse/pull/5191) ([Danila Kutenin](https://github.com/danlark1))
* Do not add right join key column to join result if it's used only in join on section. [#5260](https://github.com/yandex/ClickHouse/pull/5260) ([Artem Zuikov](https://github.com/4ertus2))
* Freeze the Kafka buffer after first empty response. It avoids multiple invokations of `ReadBuffer::next()` for empty result in some row-parsing streams. [#5283](https://github.com/yandex/ClickHouse/pull/5283) ([Ivan](https://github.com/abyss7))
......@@ -599,7 +599,7 @@ lee](https://github.com/neverlee))
* Fix error `Unknown log entry type: 0` after `OPTIMIZE TABLE FINAL` query. [#4683](https://github.com/yandex/ClickHouse/pull/4683) ([Amos Bird](https://github.com/amosbird))
* Wrong arguments to `hasAny` or `hasAll` functions may lead to segfault. [#4698](https://github.com/yandex/ClickHouse/pull/4698) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Deadlock may happen while executing `DROP DATABASE dictionary` query. [#4701](https://github.com/yandex/ClickHouse/pull/4701) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix undefinied behavior in `median` and `quantile` functions. [#4702](https://github.com/yandex/ClickHouse/pull/4702) ([hcz](https://github.com/hczhcz))
* Fix undefined behavior in `median` and `quantile` functions. [#4702](https://github.com/yandex/ClickHouse/pull/4702) ([hcz](https://github.com/hczhcz))
* Fix compression level detection when `network_compression_method` in lowercase. Broken in v19.1. [#4706](https://github.com/yandex/ClickHouse/pull/4706) ([proller](https://github.com/proller))
* Fixed ignorance of `<timezone>UTC</timezone>` setting (fixes issue [#4658](https://github.com/yandex/ClickHouse/issues/4658)). [#4718](https://github.com/yandex/ClickHouse/pull/4718) ([proller](https://github.com/proller))
* Fix `histogram` function behaviour with `Distributed` tables. [#4741](https://github.com/yandex/ClickHouse/pull/4741) ([olegkv](https://github.com/olegkv))
......@@ -668,7 +668,7 @@ lee](https://github.com/neverlee))
* Fix error `Unknown log entry type: 0` after `OPTIMIZE TABLE FINAL` query. [#4683](https://github.com/yandex/ClickHouse/pull/4683) ([Amos Bird](https://github.com/amosbird))
* Wrong arguments to `hasAny` or `hasAll` functions may lead to segfault. [#4698](https://github.com/yandex/ClickHouse/pull/4698) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Deadlock may happen while executing `DROP DATABASE dictionary` query. [#4701](https://github.com/yandex/ClickHouse/pull/4701) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix undefinied behavior in `median` and `quantile` functions. [#4702](https://github.com/yandex/ClickHouse/pull/4702) ([hcz](https://github.com/hczhcz))
* Fix undefined behavior in `median` and `quantile` functions. [#4702](https://github.com/yandex/ClickHouse/pull/4702) ([hcz](https://github.com/hczhcz))
* Fix compression level detection when `network_compression_method` in lowercase. Broken in v19.1. [#4706](https://github.com/yandex/ClickHouse/pull/4706) ([proller](https://github.com/proller))
* Fixed ignorance of `<timezone>UTC</timezone>` setting (fixes issue [#4658](https://github.com/yandex/ClickHouse/issues/4658)). [#4718](https://github.com/yandex/ClickHouse/pull/4718) ([proller](https://github.com/proller))
* Fix `histogram` function behaviour with `Distributed` tables. [#4741](https://github.com/yandex/ClickHouse/pull/4741) ([olegkv](https://github.com/olegkv))
......@@ -850,7 +850,7 @@ lee](https://github.com/neverlee))
* Added aggregate function `entropy` which computes Shannon entropy. [#4238](https://github.com/yandex/ClickHouse/pull/4238) ([Quid37](https://github.com/Quid37))
* Added ability to send queries `INSERT INTO tbl VALUES (....` to server without splitting on `query` and `data` parts. [#4301](https://github.com/yandex/ClickHouse/pull/4301) ([alesapin](https://github.com/alesapin))
* Generic implementation of `arrayWithConstant` function was added. [#4322](https://github.com/yandex/ClickHouse/pull/4322) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Implented `NOT BETWEEN` comparison operator. [#4228](https://github.com/yandex/ClickHouse/pull/4228) ([Dmitry Naumov](https://github.com/nezed))
* Implemented `NOT BETWEEN` comparison operator. [#4228](https://github.com/yandex/ClickHouse/pull/4228) ([Dmitry Naumov](https://github.com/nezed))
* Implement `sumMapFiltered` in order to be able to limit the number of keys for which values will be summed by `sumMap`. [#4129](https://github.com/yandex/ClickHouse/pull/4129) ([Léo Ercolanelli](https://github.com/ercolanelli-leo))
* Added support of `Nullable` types in `mysql` table function. [#4198](https://github.com/yandex/ClickHouse/pull/4198) ([Emmanuel Donin de Rosière](https://github.com/edonin))
* Support for arbitrary constant expressions in `LIMIT` clause. [#4246](https://github.com/yandex/ClickHouse/pull/4246) ([k3box](https://github.com/k3box))
......@@ -925,7 +925,7 @@ lee](https://github.com/neverlee))
* Added keyword `INDEX` in `CREATE TABLE` query. A column with name `index` must be quoted with backticks or double quotes: `` `index` ``. [#4143](https://github.com/yandex/ClickHouse/pull/4143) ([Nikita Vasilev](https://github.com/nikvas0))
* `sumMap` now promote result type instead of overflow. The old `sumMap` behavior can be obtained by using `sumMapWithOverflow` function. [#4151](https://github.com/yandex/ClickHouse/pull/4151) ([Léo Ercolanelli](https://github.com/ercolanelli-leo))
### Performance Impovements
### Performance Improvements
* `std::sort` replaced by `pdqsort` for queries without `LIMIT`. [#4236](https://github.com/yandex/ClickHouse/pull/4236) ([Evgenii Pravda](https://github.com/kvinty))
* Now server reuse threads from global thread pool. This affects performance in some corner cases. [#4150](https://github.com/yandex/ClickHouse/pull/4150) ([alexey-milovidov](https://github.com/alexey-milovidov))
......
......@@ -150,11 +150,32 @@ endif ()
if (LINKER_NAME)
message(STATUS "Using linker: ${LINKER_NAME} (selected from: LLD_PATH=${LLD_PATH}; GOLD_PATH=${GOLD_PATH}; COMPILER_POSTFIX=${COMPILER_POSTFIX})")
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fuse-ld=${LINKER_NAME}")
set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -fuse-ld=${LINKER_NAME}")
endif ()
# Make sure the final executable has symbols exported
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -rdynamic")
option (ADD_GDB_INDEX_FOR_GOLD "Set to add .gdb-index to resulting binaries for gold linker. NOOP if lld is used." 0)
if (NOT CMAKE_BUILD_TYPE_UC STREQUAL "RELEASE")
if (LINKER_NAME STREQUAL "lld")
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--gdb-index")
set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--gdb-index")
message (STATUS "Adding .gdb-index via --gdb-index linker option.")
# we use another tool for gdb-index, because gold linker removes section .debug_aranges, which used inside clickhouse stacktraces
# http://sourceware-org.1504.n7.nabble.com/gold-No-debug-aranges-section-when-linking-with-gdb-index-td540965.html#a556932
elseif (LINKER_NAME STREQUAL "gold" AND ADD_GDB_INDEX_FOR_GOLD)
find_program (GDB_ADD_INDEX_EXE NAMES "gdb-add-index" DOC "Path to gdb-add-index executable")
if (NOT GDB_ADD_INDEX_EXE)
set (USE_GDB_ADD_INDEX 0)
message (WARNING "Cannot add gdb index to binaries, because gold linker is used, but gdb-add-index executable not found.")
else()
set (USE_GDB_ADD_INDEX 1)
message (STATUS "gdb-add-index found: ${GDB_ADD_INDEX_EXE}")
endif()
endif ()
endif()
cmake_host_system_information(RESULT AVAILABLE_PHYSICAL_MEMORY QUERY AVAILABLE_PHYSICAL_MEMORY) # Not available under freebsd
if(NOT AVAILABLE_PHYSICAL_MEMORY OR AVAILABLE_PHYSICAL_MEMORY GREATER 8000)
option(COMPILER_PIPE "-pipe compiler option [less /tmp usage, more ram usage]" ON)
......@@ -229,7 +250,8 @@ endif ()
# Make this extra-checks for correct library dependencies.
if (NOT SANITIZE)
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--no-undefined")
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--no-undefined")
set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--no-undefined")
endif ()
include(cmake/dbms_glob_sources.cmake)
......@@ -288,18 +310,6 @@ else ()
set (CLICKHOUSE_ETC_DIR "${CMAKE_INSTALL_PREFIX}/etc")
endif ()
option (UNBUNDLED "Try find all libraries in system. We recommend to avoid this mode for production builds, because we cannot guarantee exact versions and variants of libraries your system has installed. This mode exists for enthusiastic developers who search for trouble. Also it is useful for maintainers of OS packages." OFF)
if (UNBUNDLED)
set(NOT_UNBUNDLED 0)
else ()
set(NOT_UNBUNDLED 1)
endif ()
# Using system libs can cause lot of warnings in includes.
if (UNBUNDLED OR NOT (OS_LINUX OR APPLE) OR ARCH_32)
option (NO_WERROR "Disable -Werror compiler option" ON)
endif ()
message (STATUS "Building for: ${CMAKE_SYSTEM} ${CMAKE_SYSTEM_PROCESSOR} ${CMAKE_LIBRARY_ARCHITECTURE} ; USE_STATIC_LIBRARIES=${USE_STATIC_LIBRARIES} MAKE_STATIC_LIBRARIES=${MAKE_STATIC_LIBRARIES} SPLIT_SHARED=${SPLIT_SHARED_LIBRARIES} UNBUNDLED=${UNBUNDLED} CCACHE=${CCACHE_FOUND} ${CCACHE_VERSION}")
include(GNUInstallDirs)
......
......@@ -12,8 +12,7 @@
# https://youtrack.jetbrains.com/issue/CPP-2659
# https://youtrack.jetbrains.com/issue/CPP-870
string(TOLOWER "${CMAKE_COMMAND}" CMAKE_COMMAND_LOWER)
if (NOT ${CMAKE_COMMAND_LOWER} MATCHES "clion")
if (NOT DEFINED ENV{CLION_IDE})
find_program(NINJA_PATH ninja)
if (NINJA_PATH)
set(CMAKE_GENERATOR "Ninja" CACHE INTERNAL "" FORCE)
......
......@@ -13,9 +13,11 @@ ClickHouse is an open-source column-oriented database management system that all
* You can also [fill this form](https://forms.yandex.com/surveys/meet-yandex-clickhouse-team/) to meet Yandex ClickHouse team in person.
## Upcoming Events
* [ClickHouse Meetup in Munich](https://www.meetup.com/ClickHouse-Meetup-Munich/events/264185199/) on September 17.
* [ClickHouse Meetup in Paris](https://www.eventbrite.com/e/clickhouse-paris-meetup-2019-registration-68493270215) on October 3.
* [ClickHouse Meetup in Hong Kong](https://www.meetup.com/Hong-Kong-Machine-Learning-Meetup/events/263580542/) on October 17.
* [ClickHouse Meetup in Shenzhen](https://www.huodongxing.com/event/3483759917300) on October 20.
* [ClickHouse Meetup in Shanghai](https://www.huodongxing.com/event/4483760336000) on October 27.
* [ClickHouse Meetup in Tokyo](https://clickhouse.connpass.com/event/147001/) on November 14.
* [ClickHouse Meetup in Istanbul](https://www.eventbrite.com/e/clickhouse-meetup-istanbul-create-blazing-fast-experiences-w-clickhouse-tickets-73101120419) on November 19.
* [ClickHouse Meetup in Ankara](https://www.eventbrite.com/e/clickhouse-meetup-ankara-create-blazing-fast-experiences-w-clickhouse-tickets-73100530655) on November 21.
option (USE_INTERNAL_SPARCEHASH_LIBRARY "Set to FALSE to use system sparsehash library instead of bundled" ${NOT_UNBUNDLED})
option (USE_INTERNAL_SPARSEHASH_LIBRARY "Set to FALSE to use system sparsehash library instead of bundled" ${NOT_UNBUNDLED})
if (NOT USE_INTERNAL_SPARCEHASH_LIBRARY)
find_path (SPARCEHASH_INCLUDE_DIR NAMES sparsehash/sparse_hash_map PATHS ${SPARCEHASH_INCLUDE_PATHS})
if (NOT USE_INTERNAL_SPARSEHASH_LIBRARY)
find_path (SPARSEHASH_INCLUDE_DIR NAMES sparsehash/sparse_hash_map PATHS ${SPARSEHASH_INCLUDE_PATHS})
endif ()
if (SPARCEHASH_INCLUDE_DIR)
if (SPARSEHASH_INCLUDE_DIR)
else ()
set (USE_INTERNAL_SPARCEHASH_LIBRARY 1)
set (SPARCEHASH_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/libsparsehash")
set (USE_INTERNAL_SPARSEHASH_LIBRARY 1)
set (SPARSEHASH_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/sparsehash-c11")
endif ()
message (STATUS "Using sparsehash: ${SPARCEHASH_INCLUDE_DIR}")
message (STATUS "Using sparsehash: ${SPARSEHASH_INCLUDE_DIR}")
google-sparsehash@googlegroups.com
Copyright (c) 2005, Google Inc.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
== 23 Ferbruary 2012 ==
A backwards incompatibility arose from flattening the include headers
structure for the <google> folder.
This is now fixed in 2.0.2. You only need to upgrade if you had previously
included files from the <google/sparsehash> folder.
== 1 February 2012 ==
A minor bug related to the namespace switch from google to sparsehash
stopped the build from working when perftools is also installed.
This is now fixed in 2.0.1. You only need to upgrade if you have perftools
installed.
== 31 January 2012 ==
I've just released sparsehash 2.0.
The `google-sparsehash` project has been renamed to `sparsehash`. I
(csilvers) am stepping down as maintainer, to be replaced by the team
of Donovan Hide and Geoff Pike. Welcome to the team, Donovan and
Geoff! Donovan has been an active contributor to sparsehash bug
reports and discussions in the past, and Geoff has been closely
involved with sparsehash inside Google (in addition to writing the
[http://code.google.com/p/cityhash CityHash hash function]). The two
of them together should be a formidable force. For good.
I bumped the major version number up to 2 to reflect the new community
ownership of the project. All the
[http://sparsehash.googlecode.com/svn/tags/sparsehash-2.0/ChangeLog changes]
are related to the renaming.
The only functional change from sparsehash 1.12 is that I've renamed
the `google/` include-directory to be `sparsehash/` instead. New code
should `#include <sparsehash/sparse_hash_map>`/etc. I've kept the old
names around as forwarding headers to the new, so `#include
<google/sparse_hash_map>` will continue to work.
Note that the classes and functions remain in the `google` C++
namespace (I didn't change that to `sparsehash` as well); I think
that's a trickier transition, and can happen in a future release.
=== 18 January 2011 ===
The `google-sparsehash` Google Code page has been renamed to
`sparsehash`, in preparation for the project being renamed to
`sparsehash`. In the coming weeks, I'll be stepping down as
maintainer for the sparsehash project, and as part of that Google is
relinquishing ownership of the project; it will now be entirely
community run. The name change reflects that shift.
=== 20 December 2011 ===
I've just released sparsehash 1.12. This release features improved
I/O (serialization) support. Support is finally added to serialize
and unserialize `dense_hash_map`/`set`, paralleling the existing code
for `sparse_hash_map`/`set`. In addition, the serialization API has
gotten simpler, with a single `serialize()` method to write to disk,
and an `unserialize()` method to read from disk. Finally, support has
gotten more generic, with built-in support for both C `FILE*`s and C++
streams, and an extension mechanism to support arbitrary sources and
sinks.
There are also more minor changes, including minor bugfixes, an
improved deleted-key test, and a minor addition to the `sparsetable`
API. See the [http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.12/ChangeLog ChangeLog]
for full details.
=== 23 June 2011 ===
I've just released sparsehash 1.11. The major user-visible change is
that the default behavior is improved -- using the hash_map/set is
faster -- for hashtables where the key is a pointer. We now notice
that case and ignore the low 2-3 bits (which are almost always 0 for
pointers) when hashing.
Another user-visible change is we've removed the tests for whether the
STL (vector, pair, etc) is defined in the 'std' namespace. gcc 2.95
is the most recent compiler I know of to put STL types and functions
in the global namespace. If you need to use such an old compiler, do
not update to the latest sparsehash release.
We've also changed the internal tools we use to integrate
Googler-supplied patches to sparsehash into the opensource release.
These new tools should result in more frequent updates with better
change descriptions. They will also result in future ChangeLog
entries being much more verbose (for better or for worse).
A full list of changes is described in
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.11/ChangeLog ChangeLog].
=== 21 January 2011 ===
I've just released sparsehash 1.10. This fixes a performance
regression in sparsehash 1.8, where sparse_hash_map would copy
hashtable keys by value even when the key was explicitly a reference.
It also fixes compiler warnings from MSVC 10, which uses some c++0x
features that did not interact well with sparsehash.
There is no reason to upgrade unless you use references for your
hashtable keys, or compile with MSVC 10. A full list of changes is
described in
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.10/ChangeLog ChangeLog].
=== 24 September 2010 ===
I've just released sparsehash 1.9. This fixes a size regression in
sparsehash 1.8, where the new allocator would take up space in
`sparse_hash_map`, doubling the sparse_hash_map overhead (from 1-2
bits per bucket to 3 or so). All users are encouraged to upgrade.
This change also marks enums as being Plain Old Data, which can speed
up hashtables with enum keys and/or values. A full list of changes is
described in
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.9/ChangeLog ChangeLog].
=== 29 July 2010 ===
I've just released sparsehash 1.8. This includes improved support for
`Allocator`, including supporting the allocator constructor arg and
`get_allocator()` access method.
To work around a bug in gcc 4.0.x, I've renamed the static variables
`HT_OCCUPANCY_FLT` and `HT_SHRINK_FLT` to `HT_OCCUPANCY_PCT` and
`HT_SHRINK_PCT`, and changed their type from float to int. This
should not be a user-visible change, since these variables are only
used in the internal hashtable classes (sparsehash clients should use
`max_load_factor()` and `min_load_factor()` instead of modifying these
static variables), but if you do access these constants, you will need
to change your code.
Internally, the biggest change is a revamp of the test suite. It now
has more complete coverage, and a more capable timing tester. There
are other, more minor changes as well. A full list of changes is
described in the
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.8/ChangeLog ChangeLog].
=== 31 March 2010 ===
I've just released sparsehash 1.7. The major news here is the
addition of `Allocator` support. Previously, these hashtable classes
would just ignore the `Allocator` template parameter. They now
respect it, and even inherit `size_type`, `pointer`, etc. from the
allocator class. By default, they use a special allocator we provide
that uses libc `malloc` and `free` to allocate. The hash classes
notice when this special allocator is being used, and use `realloc`
when it can. This means that the default allocator is significantly
faster than custom allocators are likely to be (since realloc-like
functionality is not supported by STL allocators).
There are a few more minor changes as well. A full list of changes is
described in the
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.7/ChangeLog ChangeLog].
=== 11 January 2010 ===
I've just released sparsehash 1.6. The API has widened a bit with the
addition of `deleted_key()` and `empty_key()`, which let you query
what values these keys have. A few rather obscure bugs have been
fixed (such as an error when copying one hashtable into another when
the empty_keys differ). A full list of changes is described in the
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.6/ChangeLog ChangeLog].
=== 9 May 2009 ===
I've just released sparsehash 1.5.1. Hot on the heels of sparsehash
1.5, this release fixes a longstanding bug in the sparsehash code,
where `equal_range` would always return an empty range. It now works
as documented. All sparsehash users are encouraged to upgrade.
=== 7 May 2009 ===
I've just released sparsehash 1.5. This release introduces tr1
compatibility: I've added `rehash`, `begin(i)`, and other methods that
are expected to be part of the `unordered_map` API once `tr1` in
introduced. This allows `sparse_hash_map`, `dense_hash_map`,
`sparse_hash_set`, and `dense_hash_set` to be (almost) drop-in
replacements for `unordered_map` and `unordered_set`.
There is no need to upgrade unless you need this functionality, or
need one of the other, more minor, changes described in the
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.5/ChangeLog ChangeLog].
This directory contains several hash-map implementations, similar in
API to SGI's hash_map class, but with different performance
characteristics. sparse_hash_map uses very little space overhead, 1-2
bits per entry. dense_hash_map is very fast, particularly on lookup.
(sparse_hash_set and dense_hash_set are the set versions of these
routines.) On the other hand, these classes have requirements that
may not make them appropriate for all applications.
All these implementation use a hashtable with internal quadratic
probing. This method is space-efficient -- there is no pointer
overhead -- and time-efficient for good hash functions.
COMPILING
---------
To compile test applications with these classes, run ./configure
followed by make. To install these header files on your system, run
'make install'. (On Windows, the instructions are different; see
README_windows.txt.) See INSTALL for more details.
This code should work on any modern C++ system. It has been tested on
Linux (Ubuntu, Fedora, RedHat, Debian), Solaris 10 x86, FreeBSD 6.0,
OS X 10.3 and 10.4, and Windows under both VC++7 and VC++8.
USING
-----
See the html files in the doc directory for small example programs
that use these classes. It's enough to just include the header file:
#include <sparsehash/sparse_hash_map> // or sparse_hash_set, dense_hash_map, ...
google::sparse_hash_set<int, int> number_mapper;
and use the class the way you would other hash-map implementations.
(Though see "API" below for caveats.)
By default (you can change it via a flag to ./configure), these hash
implementations are defined in the google namespace.
API
---
The API for sparse_hash_map, dense_hash_map, sparse_hash_set, and
dense_hash_set, are a superset of the API of SGI's hash_map class.
See doc/sparse_hash_map.html, et al., for more information about the
API.
The usage of these classes differ from SGI's hash_map, and other
hashtable implementations, in the following major ways:
1) dense_hash_map requires you to set aside one key value as the
'empty bucket' value, set via the set_empty_key() method. This
*MUST* be called before you can use the dense_hash_map. It is
illegal to insert any elements into a dense_hash_map whose key is
equal to the empty-key.
2) For both dense_hash_map and sparse_hash_map, if you wish to delete
elements from the hashtable, you must set aside a key value as the
'deleted bucket' value, set via the set_deleted_key() method. If
your hash-map is insert-only, there is no need to call this
method. If you call set_deleted_key(), it is illegal to insert any
elements into a dense_hash_map or sparse_hash_map whose key is
equal to the deleted-key.
3) These hash-map implementation support I/O. See below.
There are also some smaller differences:
1) The constructor takes an optional argument that specifies the
number of elements you expect to insert into the hashtable. This
differs from SGI's hash_map implementation, which takes an optional
number of buckets.
2) erase() does not immediately reclaim memory. As a consequence,
erase() does not invalidate any iterators, making loops like this
correct:
for (it = ht.begin(); it != ht.end(); ++it)
if (...) ht.erase(it);
As another consequence, a series of erase() calls can leave your
hashtable using more memory than it needs to. The hashtable will
automatically compact at the next call to insert(), but to
manually compact a hashtable, you can call
ht.resize(0)
I/O
---
In addition to the normal hash-map operations, sparse_hash_map can
read and write hashtables to disk. (dense_hash_map also has the API,
but it has not yet been implemented, and writes will always fail.)
In the simplest case, writing a hashtable is as easy as calling two
methods on the hashtable:
ht.write_metadata(fp);
ht.write_nopointer_data(fp);
Reading in this data is equally simple:
google::sparse_hash_map<...> ht;
ht.read_metadata(fp);
ht.read_nopointer_data(fp);
The above is sufficient if the key and value do not contain any
pointers: they are basic C types or agglomorations of basic C types.
If the key and/or value do contain pointers, you can still store the
hashtable by replacing write_nopointer_data() with a custom writing
routine. See sparse_hash_map.html et al. for more information.
SPARSETABLE
-----------
In addition to the hash-map and hash-set classes, this package also
provides sparsetable.h, an array implementation that uses space
proportional to the number of elements in the array, rather than the
maximum element index. It uses very little space overhead: 1 bit per
entry. See doc/sparsetable.html for the API.
RESOURCE USAGE
--------------
* sparse_hash_map has memory overhead of about 2 bits per hash-map
entry.
* dense_hash_map has a factor of 2-3 memory overhead: if your
hashtable data takes X bytes, dense_hash_map will use 3X-4X memory
total.
Hashtables tend to double in size when resizing, creating an
additional 50% space overhead. dense_hash_map does in fact have a
significant "high water mark" memory use requirement.
sparse_hash_map, however, is written to need very little space
overhead when resizing: only a few bits per hashtable entry.
PERFORMANCE
-----------
You can compile and run the included file time_hash_map.cc to examine
the performance of sparse_hash_map, dense_hash_map, and your native
hash_map implementation on your system. One test against the
SGI hash_map implementation gave the following timing information for
a simple find() call:
SGI hash_map: 22 ns
dense_hash_map: 13 ns
sparse_hash_map: 117 ns
SGI map: 113 ns
See doc/performance.html for more detailed charts on resource usage
and performance data.
---
16 March 2005
(Last updated: 12 September 2010)
// Copyright (c) 2005, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ----
//
// This is just a very thin wrapper over densehashtable.h, just
// like sgi stl's stl_hash_map is a very thin wrapper over
// stl_hashtable. The major thing we define is operator[], because
// we have a concept of a data_type which stl_hashtable doesn't
// (it only has a key and a value).
//
// NOTE: this is exactly like sparse_hash_map.h, with the word
// "sparse" replaced by "dense", except for the addition of
// set_empty_key().
//
// YOU MUST CALL SET_EMPTY_KEY() IMMEDIATELY AFTER CONSTRUCTION.
//
// Otherwise your program will die in mysterious ways. (Note if you
// use the constructor that takes an InputIterator range, you pass in
// the empty key in the constructor, rather than after. As a result,
// this constructor differs from the standard STL version.)
//
// In other respects, we adhere mostly to the STL semantics for
// hash-map. One important exception is that insert() may invalidate
// iterators entirely -- STL semantics are that insert() may reorder
// iterators, but they all still refer to something valid in the
// hashtable. Not so for us. Likewise, insert() may invalidate
// pointers into the hashtable. (Whether insert invalidates iterators
// and pointers depends on whether it results in a hashtable resize).
// On the plus side, delete() doesn't invalidate iterators or pointers
// at all, or even change the ordering of elements.
//
// Here are a few "power user" tips:
//
// 1) set_deleted_key():
// If you want to use erase() you *must* call set_deleted_key(),
// in addition to set_empty_key(), after construction.
// The deleted and empty keys must differ.
//
// 2) resize(0):
// When an item is deleted, its memory isn't freed right
// away. This allows you to iterate over a hashtable,
// and call erase(), without invalidating the iterator.
// To force the memory to be freed, call resize(0).
// For tr1 compatibility, this can also be called as rehash(0).
//
// 3) min_load_factor(0.0)
// Setting the minimum load factor to 0.0 guarantees that
// the hash table will never shrink.
//
// Roughly speaking:
// (1) dense_hash_map: fastest, uses the most memory unless entries are small
// (2) sparse_hash_map: slowest, uses the least memory
// (3) hash_map / unordered_map (STL): in the middle
//
// Typically I use sparse_hash_map when I care about space and/or when
// I need to save the hashtable on disk. I use hash_map otherwise. I
// don't personally use dense_hash_set ever; some people use it for
// small sets with lots of lookups.
//
// - dense_hash_map has, typically, about 78% memory overhead (if your
// data takes up X bytes, the hash_map uses .78X more bytes in overhead).
// - sparse_hash_map has about 4 bits overhead per entry.
// - sparse_hash_map can be 3-7 times slower than the others for lookup and,
// especially, inserts. See time_hash_map.cc for details.
//
// See /usr/(local/)?doc/sparsehash-*/dense_hash_map.html
// for information about how to use this class.
#ifndef _DENSE_HASH_MAP_H_
#define _DENSE_HASH_MAP_H_
#include <sparsehash/internal/sparseconfig.h>
#include <algorithm> // needed by stl_alloc
#include <functional> // for equal_to<>, select1st<>, etc
#include <memory> // for alloc
#include <utility> // for pair<>
#include <sparsehash/internal/densehashtable.h> // IWYU pragma: export
#include <sparsehash/internal/libc_allocator_with_realloc.h>
#include HASH_FUN_H // for hash<>
_START_GOOGLE_NAMESPACE_
template <class Key, class T,
class HashFcn = SPARSEHASH_HASH<Key>, // defined in sparseconfig.h
class EqualKey = std::equal_to<Key>,
class Alloc = libc_allocator_with_realloc<std::pair<const Key, T> > >
class dense_hash_map {
private:
// Apparently select1st is not stl-standard, so we define our own
struct SelectKey {
typedef const Key& result_type;
const Key& operator()(const std::pair<const Key, T>& p) const {
return p.first;
}
};
struct SetKey {
void operator()(std::pair<const Key, T>* value, const Key& new_key) const {
*const_cast<Key*>(&value->first) = new_key;
// It would be nice to clear the rest of value here as well, in
// case it's taking up a lot of memory. We do this by clearing
// the value. This assumes T has a zero-arg constructor!
value->second = T();
}
};
// For operator[].
struct DefaultValue {
std::pair<const Key, T> operator()(const Key& key) {
return std::make_pair(key, T());
}
};
// The actual data
typedef dense_hashtable<std::pair<const Key, T>, Key, HashFcn, SelectKey,
SetKey, EqualKey, Alloc> ht;
ht rep;
public:
typedef typename ht::key_type key_type;
typedef T data_type;
typedef T mapped_type;
typedef typename ht::value_type value_type;
typedef typename ht::hasher hasher;
typedef typename ht::key_equal key_equal;
typedef Alloc allocator_type;
typedef typename ht::size_type size_type;
typedef typename ht::difference_type difference_type;
typedef typename ht::pointer pointer;
typedef typename ht::const_pointer const_pointer;
typedef typename ht::reference reference;
typedef typename ht::const_reference const_reference;
typedef typename ht::iterator iterator;
typedef typename ht::const_iterator const_iterator;
typedef typename ht::local_iterator local_iterator;
typedef typename ht::const_local_iterator const_local_iterator;
// Iterator functions
iterator begin() { return rep.begin(); }
iterator end() { return rep.end(); }
const_iterator begin() const { return rep.begin(); }
const_iterator end() const { return rep.end(); }
// These come from tr1's unordered_map. For us, a bucket has 0 or 1 elements.
local_iterator begin(size_type i) { return rep.begin(i); }
local_iterator end(size_type i) { return rep.end(i); }
const_local_iterator begin(size_type i) const { return rep.begin(i); }
const_local_iterator end(size_type i) const { return rep.end(i); }
// Accessor functions
allocator_type get_allocator() const { return rep.get_allocator(); }
hasher hash_funct() const { return rep.hash_funct(); }
hasher hash_function() const { return hash_funct(); }
key_equal key_eq() const { return rep.key_eq(); }
// Constructors
explicit dense_hash_map(size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, SelectKey(), SetKey(), alloc) {
}
template <class InputIterator>
dense_hash_map(InputIterator f, InputIterator l,
const key_type& empty_key_val,
size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, SelectKey(), SetKey(), alloc) {
set_empty_key(empty_key_val);
rep.insert(f, l);
}
// We use the default copy constructor
// We use the default operator=()
// We use the default destructor
void clear() { rep.clear(); }
// This clears the hash map without resizing it down to the minimum
// bucket count, but rather keeps the number of buckets constant
void clear_no_resize() { rep.clear_no_resize(); }
void swap(dense_hash_map& hs) { rep.swap(hs.rep); }
// Functions concerning size
size_type size() const { return rep.size(); }
size_type max_size() const { return rep.max_size(); }
bool empty() const { return rep.empty(); }
size_type bucket_count() const { return rep.bucket_count(); }
size_type max_bucket_count() const { return rep.max_bucket_count(); }
// These are tr1 methods. bucket() is the bucket the key is or would be in.
size_type bucket_size(size_type i) const { return rep.bucket_size(i); }
size_type bucket(const key_type& key) const { return rep.bucket(key); }
float load_factor() const {
return size() * 1.0f / bucket_count();
}
float max_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return grow;
}
void max_load_factor(float new_grow) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(shrink, new_grow);
}
// These aren't tr1 methods but perhaps ought to be.
float min_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return shrink;
}
void min_load_factor(float new_shrink) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(new_shrink, grow);
}
// Deprecated; use min_load_factor() or max_load_factor() instead.
void set_resizing_parameters(float shrink, float grow) {
rep.set_resizing_parameters(shrink, grow);
}
void resize(size_type hint) { rep.resize(hint); }
void rehash(size_type hint) { resize(hint); } // the tr1 name
// Lookup routines
iterator find(const key_type& key) { return rep.find(key); }
const_iterator find(const key_type& key) const { return rep.find(key); }
data_type& operator[](const key_type& key) { // This is our value-add!
// If key is in the hashtable, returns find(key)->second,
// otherwise returns insert(value_type(key, T()).first->second.
// Note it does not create an empty T unless the find fails.
return rep.template find_or_insert<DefaultValue>(key).second;
}
size_type count(const key_type& key) const { return rep.count(key); }
std::pair<iterator, iterator> equal_range(const key_type& key) {
return rep.equal_range(key);
}
std::pair<const_iterator, const_iterator> equal_range(const key_type& key)
const {
return rep.equal_range(key);
}
// Insertion routines
std::pair<iterator, bool> insert(const value_type& obj) {
return rep.insert(obj);
}
template <class InputIterator> void insert(InputIterator f, InputIterator l) {
rep.insert(f, l);
}
void insert(const_iterator f, const_iterator l) {
rep.insert(f, l);
}
// Required for std::insert_iterator; the passed-in iterator is ignored.
iterator insert(iterator, const value_type& obj) {
return insert(obj).first;
}
// Deletion and empty routines
// THESE ARE NON-STANDARD! I make you specify an "impossible" key
// value to identify deleted and empty buckets. You can change the
// deleted key as time goes on, or get rid of it entirely to be insert-only.
void set_empty_key(const key_type& key) { // YOU MUST CALL THIS!
rep.set_empty_key(value_type(key, data_type())); // rep wants a value
}
key_type empty_key() const {
return rep.empty_key().first; // rep returns a value
}
void set_deleted_key(const key_type& key) { rep.set_deleted_key(key); }
void clear_deleted_key() { rep.clear_deleted_key(); }
key_type deleted_key() const { return rep.deleted_key(); }
// These are standard
size_type erase(const key_type& key) { return rep.erase(key); }
void erase(iterator it) { rep.erase(it); }
void erase(iterator f, iterator l) { rep.erase(f, l); }
// Comparison
bool operator==(const dense_hash_map& hs) const { return rep == hs.rep; }
bool operator!=(const dense_hash_map& hs) const { return rep != hs.rep; }
// I/O -- this is an add-on for writing hash map to disk
//
// For maximum flexibility, this does not assume a particular
// file type (though it will probably be a FILE *). We just pass
// the fp through to rep.
// If your keys and values are simple enough, you can pass this
// serializer to serialize()/unserialize(). "Simple enough" means
// value_type is a POD type that contains no pointers. Note,
// however, we don't try to normalize endianness.
typedef typename ht::NopointerSerializer NopointerSerializer;
// serializer: a class providing operator()(OUTPUT*, const value_type&)
// (writing value_type to OUTPUT). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an ostream*/subclass_of_ostream*, OR a
// pointer to a class providing size_t Write(const void*, size_t),
// which writes a buffer into a stream (which fp presumably
// owns) and returns the number of bytes successfully written.
// Note basic_ostream<not_char> is not currently supported.
template <typename ValueSerializer, typename OUTPUT>
bool serialize(ValueSerializer serializer, OUTPUT* fp) {
return rep.serialize(serializer, fp);
}
// serializer: a functor providing operator()(INPUT*, value_type*)
// (reading from INPUT and into value_type). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an istream*/subclass_of_istream*, OR a
// pointer to a class providing size_t Read(void*, size_t),
// which reads into a buffer from a stream (which fp presumably
// owns) and returns the number of bytes successfully read.
// Note basic_istream<not_char> is not currently supported.
// NOTE: Since value_type is std::pair<const Key, T>, ValueSerializer
// may need to do a const cast in order to fill in the key.
template <typename ValueSerializer, typename INPUT>
bool unserialize(ValueSerializer serializer, INPUT* fp) {
return rep.unserialize(serializer, fp);
}
};
// We need a global swap as well
template <class Key, class T, class HashFcn, class EqualKey, class Alloc>
inline void swap(dense_hash_map<Key, T, HashFcn, EqualKey, Alloc>& hm1,
dense_hash_map<Key, T, HashFcn, EqualKey, Alloc>& hm2) {
hm1.swap(hm2);
}
_END_GOOGLE_NAMESPACE_
#endif /* _DENSE_HASH_MAP_H_ */
// Copyright (c) 2005, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ---
//
// This is just a very thin wrapper over densehashtable.h, just
// like sgi stl's stl_hash_set is a very thin wrapper over
// stl_hashtable. The major thing we define is operator[], because
// we have a concept of a data_type which stl_hashtable doesn't
// (it only has a key and a value).
//
// This is more different from dense_hash_map than you might think,
// because all iterators for sets are const (you obviously can't
// change the key, and for sets there is no value).
//
// NOTE: this is exactly like sparse_hash_set.h, with the word
// "sparse" replaced by "dense", except for the addition of
// set_empty_key().
//
// YOU MUST CALL SET_EMPTY_KEY() IMMEDIATELY AFTER CONSTRUCTION.
//
// Otherwise your program will die in mysterious ways. (Note if you
// use the constructor that takes an InputIterator range, you pass in
// the empty key in the constructor, rather than after. As a result,
// this constructor differs from the standard STL version.)
//
// In other respects, we adhere mostly to the STL semantics for
// hash-map. One important exception is that insert() may invalidate
// iterators entirely -- STL semantics are that insert() may reorder
// iterators, but they all still refer to something valid in the
// hashtable. Not so for us. Likewise, insert() may invalidate
// pointers into the hashtable. (Whether insert invalidates iterators
// and pointers depends on whether it results in a hashtable resize).
// On the plus side, delete() doesn't invalidate iterators or pointers
// at all, or even change the ordering of elements.
//
// Here are a few "power user" tips:
//
// 1) set_deleted_key():
// If you want to use erase() you must call set_deleted_key(),
// in addition to set_empty_key(), after construction.
// The deleted and empty keys must differ.
//
// 2) resize(0):
// When an item is deleted, its memory isn't freed right
// away. This allows you to iterate over a hashtable,
// and call erase(), without invalidating the iterator.
// To force the memory to be freed, call resize(0).
// For tr1 compatibility, this can also be called as rehash(0).
//
// 3) min_load_factor(0.0)
// Setting the minimum load factor to 0.0 guarantees that
// the hash table will never shrink.
//
// Roughly speaking:
// (1) dense_hash_set: fastest, uses the most memory unless entries are small
// (2) sparse_hash_set: slowest, uses the least memory
// (3) hash_set / unordered_set (STL): in the middle
//
// Typically I use sparse_hash_set when I care about space and/or when
// I need to save the hashtable on disk. I use hash_set otherwise. I
// don't personally use dense_hash_set ever; some people use it for
// small sets with lots of lookups.
//
// - dense_hash_set has, typically, about 78% memory overhead (if your
// data takes up X bytes, the hash_set uses .78X more bytes in overhead).
// - sparse_hash_set has about 4 bits overhead per entry.
// - sparse_hash_set can be 3-7 times slower than the others for lookup and,
// especially, inserts. See time_hash_map.cc for details.
//
// See /usr/(local/)?doc/sparsehash-*/dense_hash_set.html
// for information about how to use this class.
#ifndef _DENSE_HASH_SET_H_
#define _DENSE_HASH_SET_H_
#include <sparsehash/internal/sparseconfig.h>
#include <algorithm> // needed by stl_alloc
#include <functional> // for equal_to<>, select1st<>, etc
#include <memory> // for alloc
#include <utility> // for pair<>
#include <sparsehash/internal/densehashtable.h> // IWYU pragma: export
#include <sparsehash/internal/libc_allocator_with_realloc.h>
#include HASH_FUN_H // for hash<>
_START_GOOGLE_NAMESPACE_
template <class Value,
class HashFcn = SPARSEHASH_HASH<Value>, // defined in sparseconfig.h
class EqualKey = std::equal_to<Value>,
class Alloc = libc_allocator_with_realloc<Value> >
class dense_hash_set {
private:
// Apparently identity is not stl-standard, so we define our own
struct Identity {
typedef const Value& result_type;
const Value& operator()(const Value& v) const { return v; }
};
struct SetKey {
void operator()(Value* value, const Value& new_key) const {
*value = new_key;
}
};
// The actual data
typedef dense_hashtable<Value, Value, HashFcn, Identity, SetKey,
EqualKey, Alloc> ht;
ht rep;
public:
typedef typename ht::key_type key_type;
typedef typename ht::value_type value_type;
typedef typename ht::hasher hasher;
typedef typename ht::key_equal key_equal;
typedef Alloc allocator_type;
typedef typename ht::size_type size_type;
typedef typename ht::difference_type difference_type;
typedef typename ht::const_pointer pointer;
typedef typename ht::const_pointer const_pointer;
typedef typename ht::const_reference reference;
typedef typename ht::const_reference const_reference;
typedef typename ht::const_iterator iterator;
typedef typename ht::const_iterator const_iterator;
typedef typename ht::const_local_iterator local_iterator;
typedef typename ht::const_local_iterator const_local_iterator;
// Iterator functions -- recall all iterators are const
iterator begin() const { return rep.begin(); }
iterator end() const { return rep.end(); }
// These come from tr1's unordered_set. For us, a bucket has 0 or 1 elements.
local_iterator begin(size_type i) const { return rep.begin(i); }
local_iterator end(size_type i) const { return rep.end(i); }
// Accessor functions
allocator_type get_allocator() const { return rep.get_allocator(); }
hasher hash_funct() const { return rep.hash_funct(); }
hasher hash_function() const { return hash_funct(); } // tr1 name
key_equal key_eq() const { return rep.key_eq(); }
// Constructors
explicit dense_hash_set(size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, Identity(), SetKey(), alloc) {
}
template <class InputIterator>
dense_hash_set(InputIterator f, InputIterator l,
const key_type& empty_key_val,
size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, Identity(), SetKey(), alloc) {
set_empty_key(empty_key_val);
rep.insert(f, l);
}
// We use the default copy constructor
// We use the default operator=()
// We use the default destructor
void clear() { rep.clear(); }
// This clears the hash set without resizing it down to the minimum
// bucket count, but rather keeps the number of buckets constant
void clear_no_resize() { rep.clear_no_resize(); }
void swap(dense_hash_set& hs) { rep.swap(hs.rep); }
// Functions concerning size
size_type size() const { return rep.size(); }
size_type max_size() const { return rep.max_size(); }
bool empty() const { return rep.empty(); }
size_type bucket_count() const { return rep.bucket_count(); }
size_type max_bucket_count() const { return rep.max_bucket_count(); }
// These are tr1 methods. bucket() is the bucket the key is or would be in.
size_type bucket_size(size_type i) const { return rep.bucket_size(i); }
size_type bucket(const key_type& key) const { return rep.bucket(key); }
float load_factor() const {
return size() * 1.0f / bucket_count();
}
float max_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return grow;
}
void max_load_factor(float new_grow) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(shrink, new_grow);
}
// These aren't tr1 methods but perhaps ought to be.
float min_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return shrink;
}
void min_load_factor(float new_shrink) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(new_shrink, grow);
}
// Deprecated; use min_load_factor() or max_load_factor() instead.
void set_resizing_parameters(float shrink, float grow) {
rep.set_resizing_parameters(shrink, grow);
}
void resize(size_type hint) { rep.resize(hint); }
void rehash(size_type hint) { resize(hint); } // the tr1 name
// Lookup routines
iterator find(const key_type& key) const { return rep.find(key); }
size_type count(const key_type& key) const { return rep.count(key); }
std::pair<iterator, iterator> equal_range(const key_type& key) const {
return rep.equal_range(key);
}
// Insertion routines
std::pair<iterator, bool> insert(const value_type& obj) {
std::pair<typename ht::iterator, bool> p = rep.insert(obj);
return std::pair<iterator, bool>(p.first, p.second); // const to non-const
}
template <class InputIterator> void insert(InputIterator f, InputIterator l) {
rep.insert(f, l);
}
void insert(const_iterator f, const_iterator l) {
rep.insert(f, l);
}
// Required for std::insert_iterator; the passed-in iterator is ignored.
iterator insert(iterator, const value_type& obj) {
return insert(obj).first;
}
// Deletion and empty routines
// THESE ARE NON-STANDARD! I make you specify an "impossible" key
// value to identify deleted and empty buckets. You can change the
// deleted key as time goes on, or get rid of it entirely to be insert-only.
void set_empty_key(const key_type& key) { rep.set_empty_key(key); }
key_type empty_key() const { return rep.empty_key(); }
void set_deleted_key(const key_type& key) { rep.set_deleted_key(key); }
void clear_deleted_key() { rep.clear_deleted_key(); }
key_type deleted_key() const { return rep.deleted_key(); }
// These are standard
size_type erase(const key_type& key) { return rep.erase(key); }
void erase(iterator it) { rep.erase(it); }
void erase(iterator f, iterator l) { rep.erase(f, l); }
// Comparison
bool operator==(const dense_hash_set& hs) const { return rep == hs.rep; }
bool operator!=(const dense_hash_set& hs) const { return rep != hs.rep; }
// I/O -- this is an add-on for writing metainformation to disk
//
// For maximum flexibility, this does not assume a particular
// file type (though it will probably be a FILE *). We just pass
// the fp through to rep.
// If your keys and values are simple enough, you can pass this
// serializer to serialize()/unserialize(). "Simple enough" means
// value_type is a POD type that contains no pointers. Note,
// however, we don't try to normalize endianness.
typedef typename ht::NopointerSerializer NopointerSerializer;
// serializer: a class providing operator()(OUTPUT*, const value_type&)
// (writing value_type to OUTPUT). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an ostream*/subclass_of_ostream*, OR a
// pointer to a class providing size_t Write(const void*, size_t),
// which writes a buffer into a stream (which fp presumably
// owns) and returns the number of bytes successfully written.
// Note basic_ostream<not_char> is not currently supported.
template <typename ValueSerializer, typename OUTPUT>
bool serialize(ValueSerializer serializer, OUTPUT* fp) {
return rep.serialize(serializer, fp);
}
// serializer: a functor providing operator()(INPUT*, value_type*)
// (reading from INPUT and into value_type). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an istream*/subclass_of_istream*, OR a
// pointer to a class providing size_t Read(void*, size_t),
// which reads into a buffer from a stream (which fp presumably
// owns) and returns the number of bytes successfully read.
// Note basic_istream<not_char> is not currently supported.
template <typename ValueSerializer, typename INPUT>
bool unserialize(ValueSerializer serializer, INPUT* fp) {
return rep.unserialize(serializer, fp);
}
};
template <class Val, class HashFcn, class EqualKey, class Alloc>
inline void swap(dense_hash_set<Val, HashFcn, EqualKey, Alloc>& hs1,
dense_hash_set<Val, HashFcn, EqualKey, Alloc>& hs2) {
hs1.swap(hs2);
}
_END_GOOGLE_NAMESPACE_
#endif /* _DENSE_HASH_SET_H_ */
// Copyright (c) 2010, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ---
//
// Provides classes shared by both sparse and dense hashtable.
//
// sh_hashtable_settings has parameters for growing and shrinking
// a hashtable. It also packages zero-size functor (ie. hasher).
//
// Other functions and classes provide common code for serializing
// and deserializing hashtables to a stream (such as a FILE*).
#ifndef UTIL_GTL_HASHTABLE_COMMON_H_
#define UTIL_GTL_HASHTABLE_COMMON_H_
#include <sparsehash/internal/sparseconfig.h>
#include <assert.h>
#include <stdio.h>
#include <stddef.h> // for size_t
#include <iosfwd>
#include <stdexcept> // For length_error
_START_GOOGLE_NAMESPACE_
template <bool> struct SparsehashCompileAssert { };
#define SPARSEHASH_COMPILE_ASSERT(expr, msg) \
static_assert(expr, #msg)
namespace sparsehash_internal {
// Adaptor methods for reading/writing data from an INPUT or OUPTUT
// variable passed to serialize() or unserialize(). For now we
// have implemented INPUT/OUTPUT for FILE*, istream*/ostream* (note
// they are pointers, unlike typical use), or else a pointer to
// something that supports a Read()/Write() method.
//
// For technical reasons, we implement read_data/write_data in two
// stages. The actual work is done in *_data_internal, which takes
// the stream argument twice: once as a template type, and once with
// normal type information. (We only use the second version.) We do
// this because of how C++ picks what function overload to use. If we
// implemented this the naive way:
// bool read_data(istream* is, const void* data, size_t length);
// template<typename T> read_data(T* fp, const void* data, size_t length);
// C++ would prefer the second version for every stream type except
// istream. However, we want C++ to prefer the first version for
// streams that are *subclasses* of istream, such as istringstream.
// This is not possible given the way template types are resolved. So
// we split the stream argument in two, one of which is templated and
// one of which is not. The specialized functions (like the istream
// version above) ignore the template arg and use the second, 'type'
// arg, getting subclass matching as normal. The 'catch-all'
// functions (the second version above) use the template arg to deduce
// the type, and use a second, void* arg to achieve the desired
// 'catch-all' semantics.
// ----- low-level I/O for FILE* ----
template<typename Ignored>
inline bool read_data_internal(Ignored*, FILE* fp,
void* data, size_t length) {
return fread(data, length, 1, fp) == 1;
}
template<typename Ignored>
inline bool write_data_internal(Ignored*, FILE* fp,
const void* data, size_t length) {
return fwrite(data, length, 1, fp) == 1;
}
// ----- low-level I/O for iostream ----
// We want the caller to be responsible for #including <iostream>, not
// us, because iostream is a big header! According to the standard,
// it's only legal to delay the instantiation the way we want to if
// the istream/ostream is a template type. So we jump through hoops.
template<typename ISTREAM>
inline bool read_data_internal_for_istream(ISTREAM* fp,
void* data, size_t length) {
return fp->read(reinterpret_cast<char*>(data), length).good();
}
template<typename Ignored>
inline bool read_data_internal(Ignored*, std::istream* fp,
void* data, size_t length) {
return read_data_internal_for_istream(fp, data, length);
}
template<typename OSTREAM>
inline bool write_data_internal_for_ostream(OSTREAM* fp,
const void* data, size_t length) {
return fp->write(reinterpret_cast<const char*>(data), length).good();
}
template<typename Ignored>
inline bool write_data_internal(Ignored*, std::ostream* fp,
const void* data, size_t length) {
return write_data_internal_for_ostream(fp, data, length);
}
// ----- low-level I/O for custom streams ----
// The INPUT type needs to support a Read() method that takes a
// buffer and a length and returns the number of bytes read.
template <typename INPUT>
inline bool read_data_internal(INPUT* fp, void*,
void* data, size_t length) {
return static_cast<size_t>(fp->Read(data, length)) == length;
}
// The OUTPUT type needs to support a Write() operation that takes
// a buffer and a length and returns the number of bytes written.
template <typename OUTPUT>
inline bool write_data_internal(OUTPUT* fp, void*,
const void* data, size_t length) {
return static_cast<size_t>(fp->Write(data, length)) == length;
}
// ----- low-level I/O: the public API ----
template <typename INPUT>
inline bool read_data(INPUT* fp, void* data, size_t length) {
return read_data_internal(fp, fp, data, length);
}
template <typename OUTPUT>
inline bool write_data(OUTPUT* fp, const void* data, size_t length) {
return write_data_internal(fp, fp, data, length);
}
// Uses read_data() and write_data() to read/write an integer.
// length is the number of bytes to read/write (which may differ
// from sizeof(IntType), allowing us to save on a 32-bit system
// and load on a 64-bit system). Excess bytes are taken to be 0.
// INPUT and OUTPUT must match legal inputs to read/write_data (above).
template <typename INPUT, typename IntType>
bool read_bigendian_number(INPUT* fp, IntType* value, size_t length) {
*value = 0;
unsigned char byte;
// We require IntType to be unsigned or else the shifting gets all screwy.
SPARSEHASH_COMPILE_ASSERT(static_cast<IntType>(-1) > static_cast<IntType>(0),
serializing_int_requires_an_unsigned_type);
for (size_t i = 0; i < length; ++i) {
if (!read_data(fp, &byte, sizeof(byte))) return false;
*value |= static_cast<IntType>(byte) << ((length - 1 - i) * 8);
}
return true;
}
template <typename OUTPUT, typename IntType>
bool write_bigendian_number(OUTPUT* fp, IntType value, size_t length) {
unsigned char byte;
// We require IntType to be unsigned or else the shifting gets all screwy.
SPARSEHASH_COMPILE_ASSERT(static_cast<IntType>(-1) > static_cast<IntType>(0),
serializing_int_requires_an_unsigned_type);
for (size_t i = 0; i < length; ++i) {
byte = (sizeof(value) <= length-1 - i)
? 0 : static_cast<unsigned char>((value >> ((length-1 - i) * 8)) & 255);
if (!write_data(fp, &byte, sizeof(byte))) return false;
}
return true;
}
// If your keys and values are simple enough, you can pass this
// serializer to serialize()/unserialize(). "Simple enough" means
// value_type is a POD type that contains no pointers. Note,
// however, we don't try to normalize endianness.
// This is the type used for NopointerSerializer.
template <typename value_type> struct pod_serializer {
template <typename INPUT>
bool operator()(INPUT* fp, value_type* value) const {
return read_data(fp, value, sizeof(*value));
}
template <typename OUTPUT>
bool operator()(OUTPUT* fp, const value_type& value) const {
return write_data(fp, &value, sizeof(value));
}
};
// Settings contains parameters for growing and shrinking the table.
// It also packages zero-size functor (ie. hasher).
//
// It does some munging of the hash value in cases where we think
// (fear) the original hash function might not be very good. In
// particular, the default hash of pointers is the identity hash,
// so probably all the low bits are 0. We identify when we think
// we're hashing a pointer, and chop off the low bits. Note this
// isn't perfect: even when the key is a pointer, we can't tell
// for sure that the hash is the identity hash. If it's not, this
// is needless work (and possibly, though not likely, harmful).
template<typename Key, typename HashFunc,
typename SizeType, int HT_MIN_BUCKETS>
class sh_hashtable_settings : public HashFunc {
public:
typedef Key key_type;
typedef HashFunc hasher;
typedef SizeType size_type;
public:
sh_hashtable_settings(const hasher& hf,
const float ht_occupancy_flt,
const float ht_empty_flt)
: hasher(hf),
enlarge_threshold_(0),
shrink_threshold_(0),
consider_shrink_(false),
use_empty_(false),
use_deleted_(false),
num_ht_copies_(0) {
set_enlarge_factor(ht_occupancy_flt);
set_shrink_factor(ht_empty_flt);
}
size_type hash(const key_type& v) const {
// We munge the hash value when we don't trust hasher::operator().
return hash_munger<Key>::MungedHash(hasher::operator()(v));
}
float enlarge_factor() const {
return enlarge_factor_;
}
void set_enlarge_factor(float f) {
enlarge_factor_ = f;
}
float shrink_factor() const {
return shrink_factor_;
}
void set_shrink_factor(float f) {
shrink_factor_ = f;
}
size_type enlarge_threshold() const {
return enlarge_threshold_;
}
void set_enlarge_threshold(size_type t) {
enlarge_threshold_ = t;
}
size_type shrink_threshold() const {
return shrink_threshold_;
}
void set_shrink_threshold(size_type t) {
shrink_threshold_ = t;
}
size_type enlarge_size(size_type x) const {
return static_cast<size_type>(x * enlarge_factor_);
}
size_type shrink_size(size_type x) const {
return static_cast<size_type>(x * shrink_factor_);
}
bool consider_shrink() const {
return consider_shrink_;
}
void set_consider_shrink(bool t) {
consider_shrink_ = t;
}
bool use_empty() const {
return use_empty_;
}
void set_use_empty(bool t) {
use_empty_ = t;
}
bool use_deleted() const {
return use_deleted_;
}
void set_use_deleted(bool t) {
use_deleted_ = t;
}
size_type num_ht_copies() const {
return static_cast<size_type>(num_ht_copies_);
}
void inc_num_ht_copies() {
++num_ht_copies_;
}
// Reset the enlarge and shrink thresholds
void reset_thresholds(size_type num_buckets) {
set_enlarge_threshold(enlarge_size(num_buckets));
set_shrink_threshold(shrink_size(num_buckets));
// whatever caused us to reset already considered
set_consider_shrink(false);
}
// Caller is resposible for calling reset_threshold right after
// set_resizing_parameters.
void set_resizing_parameters(float shrink, float grow) {
assert(shrink >= 0.0);
assert(grow <= 1.0);
if (shrink > grow/2.0f)
shrink = grow / 2.0f; // otherwise we thrash hashtable size
set_shrink_factor(shrink);
set_enlarge_factor(grow);
}
// This is the smallest size a hashtable can be without being too crowded
// If you like, you can give a min #buckets as well as a min #elts
size_type min_buckets(size_type num_elts, size_type min_buckets_wanted) {
float enlarge = enlarge_factor();
size_type sz = HT_MIN_BUCKETS; // min buckets allowed
while ( sz < min_buckets_wanted ||
num_elts >= static_cast<size_type>(sz * enlarge) ) {
// This just prevents overflowing size_type, since sz can exceed
// max_size() here.
if (static_cast<size_type>(sz * 2) < sz) {
throw std::length_error("resize overflow"); // protect against overflow
}
sz *= 2;
}
return sz;
}
private:
template<class HashKey> class hash_munger {
public:
static size_t MungedHash(size_t hash) {
return hash;
}
};
// This matches when the hashtable key is a pointer.
template<class HashKey> class hash_munger<HashKey*> {
public:
static size_t MungedHash(size_t hash) {
// TODO(csilvers): consider rotating instead:
// static const int shift = (sizeof(void *) == 4) ? 2 : 3;
// return (hash << (sizeof(hash) * 8) - shift)) | (hash >> shift);
// This matters if we ever change sparse/dense_hash_* to compare
// hashes before comparing actual values. It's speedy on x86.
return hash / sizeof(void*); // get rid of known-0 bits
}
};
size_type enlarge_threshold_; // table.size() * enlarge_factor
size_type shrink_threshold_; // table.size() * shrink_factor
float enlarge_factor_; // how full before resize
float shrink_factor_; // how empty before resize
// consider_shrink=true if we should try to shrink before next insert
bool consider_shrink_;
bool use_empty_; // used only by densehashtable, not sparsehashtable
bool use_deleted_; // false until delkey has been set
// num_ht_copies is a counter incremented every Copy/Move
unsigned int num_ht_copies_;
};
} // namespace sparsehash_internal
#undef SPARSEHASH_COMPILE_ASSERT
_END_GOOGLE_NAMESPACE_
#endif // UTIL_GTL_HASHTABLE_COMMON_H_
// Copyright (c) 2010, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ---
#ifndef UTIL_GTL_LIBC_ALLOCATOR_WITH_REALLOC_H_
#define UTIL_GTL_LIBC_ALLOCATOR_WITH_REALLOC_H_
#include <sparsehash/internal/sparseconfig.h>
#include <stdlib.h> // for malloc/realloc/free
#include <stddef.h> // for ptrdiff_t
#include <new> // for placement new
_START_GOOGLE_NAMESPACE_
template<class T>
class libc_allocator_with_realloc {
public:
typedef T value_type;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
typedef T* pointer;
typedef const T* const_pointer;
typedef T& reference;
typedef const T& const_reference;
libc_allocator_with_realloc() {}
libc_allocator_with_realloc(const libc_allocator_with_realloc&) {}
~libc_allocator_with_realloc() {}
pointer address(reference r) const { return &r; }
const_pointer address(const_reference r) const { return &r; }
pointer allocate(size_type n, const_pointer = 0) {
return static_cast<pointer>(malloc(n * sizeof(value_type)));
}
void deallocate(pointer p, size_type) {
free(p);
}
pointer reallocate(pointer p, size_type n) {
return static_cast<pointer>(realloc(p, n * sizeof(value_type)));
}
size_type max_size() const {
return static_cast<size_type>(-1) / sizeof(value_type);
}
void construct(pointer p, const value_type& val) {
new(p) value_type(val);
}
void destroy(pointer p) { p->~value_type(); }
template <class U>
libc_allocator_with_realloc(const libc_allocator_with_realloc<U>&) {}
template<class U>
struct rebind {
typedef libc_allocator_with_realloc<U> other;
};
};
// libc_allocator_with_realloc<void> specialization.
template<>
class libc_allocator_with_realloc<void> {
public:
typedef void value_type;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
typedef void* pointer;
typedef const void* const_pointer;
template<class U>
struct rebind {
typedef libc_allocator_with_realloc<U> other;
};
};
template<class T>
inline bool operator==(const libc_allocator_with_realloc<T>&,
const libc_allocator_with_realloc<T>&) {
return true;
}
template<class T>
inline bool operator!=(const libc_allocator_with_realloc<T>&,
const libc_allocator_with_realloc<T>&) {
return false;
}
_END_GOOGLE_NAMESPACE_
#endif // UTIL_GTL_LIBC_ALLOCATOR_WITH_REALLOC_H_
/*
* NOTE: This file is for internal use only.
* Do not use these #defines in your own program!
*/
/* Namespace for Google classes */
#define GOOGLE_NAMESPACE ::google
/* the location of the header defining hash functions */
#define HASH_FUN_H <functional>
/* the namespace of the hash<> function */
#define HASH_NAMESPACE std
/* Define to 1 if you have the <inttypes.h> header file. */
#define HAVE_INTTYPES_H 1
/* Define to 1 if the system has the type `long long'. */
#define HAVE_LONG_LONG 1
/* Define to 1 if you have the `memcpy' function. */
#define HAVE_MEMCPY 1
/* Define to 1 if you have the <stdint.h> header file. */
#define HAVE_STDINT_H 1
/* Define to 1 if you have the <sys/types.h> header file. */
#define HAVE_SYS_TYPES_H 1
/* Define to 1 if the system has the type `uint16_t'. */
#define HAVE_UINT16_T 1
/* Define to 1 if the system has the type `u_int16_t'. */
#define HAVE_U_INT16_T 1
/* Define to 1 if the system has the type `__uint16'. */
/* #undef HAVE___UINT16 */
/* The system-provided hash function including the namespace. */
#define SPARSEHASH_HASH HASH_NAMESPACE::hash
/* Stops putting the code inside the Google namespace */
#define _END_GOOGLE_NAMESPACE_ }
/* Puts following code inside the Google namespace */
#define _START_GOOGLE_NAMESPACE_ namespace google {
// Copyright (c) 2005, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ---
//
// This is just a very thin wrapper over sparsehashtable.h, just
// like sgi stl's stl_hash_map is a very thin wrapper over
// stl_hashtable. The major thing we define is operator[], because
// we have a concept of a data_type which stl_hashtable doesn't
// (it only has a key and a value).
//
// We adhere mostly to the STL semantics for hash-map. One important
// exception is that insert() may invalidate iterators entirely -- STL
// semantics are that insert() may reorder iterators, but they all
// still refer to something valid in the hashtable. Not so for us.
// Likewise, insert() may invalidate pointers into the hashtable.
// (Whether insert invalidates iterators and pointers depends on
// whether it results in a hashtable resize). On the plus side,
// delete() doesn't invalidate iterators or pointers at all, or even
// change the ordering of elements.
//
// Here are a few "power user" tips:
//
// 1) set_deleted_key():
// Unlike STL's hash_map, if you want to use erase() you
// *must* call set_deleted_key() after construction.
//
// 2) resize(0):
// When an item is deleted, its memory isn't freed right
// away. This is what allows you to iterate over a hashtable
// and call erase() without invalidating the iterator.
// To force the memory to be freed, call resize(0).
// For tr1 compatibility, this can also be called as rehash(0).
//
// 3) min_load_factor(0.0)
// Setting the minimum load factor to 0.0 guarantees that
// the hash table will never shrink.
//
// Roughly speaking:
// (1) dense_hash_map: fastest, uses the most memory unless entries are small
// (2) sparse_hash_map: slowest, uses the least memory
// (3) hash_map / unordered_map (STL): in the middle
//
// Typically I use sparse_hash_map when I care about space and/or when
// I need to save the hashtable on disk. I use hash_map otherwise. I
// don't personally use dense_hash_map ever; some people use it for
// small maps with lots of lookups.
//
// - dense_hash_map has, typically, about 78% memory overhead (if your
// data takes up X bytes, the hash_map uses .78X more bytes in overhead).
// - sparse_hash_map has about 4 bits overhead per entry.
// - sparse_hash_map can be 3-7 times slower than the others for lookup and,
// especially, inserts. See time_hash_map.cc for details.
//
// See /usr/(local/)?doc/sparsehash-*/sparse_hash_map.html
// for information about how to use this class.
#ifndef _SPARSE_HASH_MAP_H_
#define _SPARSE_HASH_MAP_H_
#include <sparsehash/internal/sparseconfig.h>
#include <algorithm> // needed by stl_alloc
#include <functional> // for equal_to<>, select1st<>, etc
#include <memory> // for alloc
#include <utility> // for pair<>
#include <sparsehash/internal/libc_allocator_with_realloc.h>
#include <sparsehash/internal/sparsehashtable.h> // IWYU pragma: export
#include HASH_FUN_H // for hash<>
_START_GOOGLE_NAMESPACE_
template <class Key, class T,
class HashFcn = SPARSEHASH_HASH<Key>, // defined in sparseconfig.h
class EqualKey = std::equal_to<Key>,
class Alloc = libc_allocator_with_realloc<std::pair<const Key, T> > >
class sparse_hash_map {
private:
// Apparently select1st is not stl-standard, so we define our own
struct SelectKey {
typedef const Key& result_type;
const Key& operator()(const std::pair<const Key, T>& p) const {
return p.first;
}
};
struct SetKey {
void operator()(std::pair<const Key, T>* value, const Key& new_key) const {
*const_cast<Key*>(&value->first) = new_key;
// It would be nice to clear the rest of value here as well, in
// case it's taking up a lot of memory. We do this by clearing
// the value. This assumes T has a zero-arg constructor!
value->second = T();
}
};
// For operator[].
struct DefaultValue {
std::pair<const Key, T> operator()(const Key& key) {
return std::make_pair(key, T());
}
};
// The actual data
typedef sparse_hashtable<std::pair<const Key, T>, Key, HashFcn, SelectKey,
SetKey, EqualKey, Alloc> ht;
ht rep;
public:
typedef typename ht::key_type key_type;
typedef T data_type;
typedef T mapped_type;
typedef typename ht::value_type value_type;
typedef typename ht::hasher hasher;
typedef typename ht::key_equal key_equal;
typedef Alloc allocator_type;
typedef typename ht::size_type size_type;
typedef typename ht::difference_type difference_type;
typedef typename ht::pointer pointer;
typedef typename ht::const_pointer const_pointer;
typedef typename ht::reference reference;
typedef typename ht::const_reference const_reference;
typedef typename ht::iterator iterator;
typedef typename ht::const_iterator const_iterator;
typedef typename ht::local_iterator local_iterator;
typedef typename ht::const_local_iterator const_local_iterator;
// Iterator functions
iterator begin() { return rep.begin(); }
iterator end() { return rep.end(); }
const_iterator begin() const { return rep.begin(); }
const_iterator end() const { return rep.end(); }
// These come from tr1's unordered_map. For us, a bucket has 0 or 1 elements.
local_iterator begin(size_type i) { return rep.begin(i); }
local_iterator end(size_type i) { return rep.end(i); }
const_local_iterator begin(size_type i) const { return rep.begin(i); }
const_local_iterator end(size_type i) const { return rep.end(i); }
// Accessor functions
allocator_type get_allocator() const { return rep.get_allocator(); }
hasher hash_funct() const { return rep.hash_funct(); }
hasher hash_function() const { return hash_funct(); }
key_equal key_eq() const { return rep.key_eq(); }
// Constructors
explicit sparse_hash_map(size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, SelectKey(), SetKey(), alloc) {
}
template <class InputIterator>
sparse_hash_map(InputIterator f, InputIterator l,
size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, SelectKey(), SetKey(), alloc) {
rep.insert(f, l);
}
// We use the default copy constructor
// We use the default operator=()
// We use the default destructor
void clear() { rep.clear(); }
void swap(sparse_hash_map& hs) { rep.swap(hs.rep); }
// Functions concerning size
size_type size() const { return rep.size(); }
size_type max_size() const { return rep.max_size(); }
bool empty() const { return rep.empty(); }
size_type bucket_count() const { return rep.bucket_count(); }
size_type max_bucket_count() const { return rep.max_bucket_count(); }
// These are tr1 methods. bucket() is the bucket the key is or would be in.
size_type bucket_size(size_type i) const { return rep.bucket_size(i); }
size_type bucket(const key_type& key) const { return rep.bucket(key); }
float load_factor() const {
return size() * 1.0f / bucket_count();
}
float max_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return grow;
}
void max_load_factor(float new_grow) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(shrink, new_grow);
}
// These aren't tr1 methods but perhaps ought to be.
float min_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return shrink;
}
void min_load_factor(float new_shrink) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(new_shrink, grow);
}
// Deprecated; use min_load_factor() or max_load_factor() instead.
void set_resizing_parameters(float shrink, float grow) {
rep.set_resizing_parameters(shrink, grow);
}
void resize(size_type hint) { rep.resize(hint); }
void rehash(size_type hint) { resize(hint); } // the tr1 name
// Lookup routines
iterator find(const key_type& key) { return rep.find(key); }
const_iterator find(const key_type& key) const { return rep.find(key); }
data_type& operator[](const key_type& key) { // This is our value-add!
// If key is in the hashtable, returns find(key)->second,
// otherwise returns insert(value_type(key, T()).first->second.
// Note it does not create an empty T unless the find fails.
return rep.template find_or_insert<DefaultValue>(key).second;
}
size_type count(const key_type& key) const { return rep.count(key); }
std::pair<iterator, iterator> equal_range(const key_type& key) {
return rep.equal_range(key);
}
std::pair<const_iterator, const_iterator> equal_range(const key_type& key)
const {
return rep.equal_range(key);
}
// Insertion routines
std::pair<iterator, bool> insert(const value_type& obj) {
return rep.insert(obj);
}
template <class InputIterator> void insert(InputIterator f, InputIterator l) {
rep.insert(f, l);
}
void insert(const_iterator f, const_iterator l) {
rep.insert(f, l);
}
// Required for std::insert_iterator; the passed-in iterator is ignored.
iterator insert(iterator, const value_type& obj) {
return insert(obj).first;
}
// Deletion routines
// THESE ARE NON-STANDARD! I make you specify an "impossible" key
// value to identify deleted buckets. You can change the key as
// time goes on, or get rid of it entirely to be insert-only.
void set_deleted_key(const key_type& key) {
rep.set_deleted_key(key);
}
void clear_deleted_key() { rep.clear_deleted_key(); }
key_type deleted_key() const { return rep.deleted_key(); }
// These are standard
size_type erase(const key_type& key) { return rep.erase(key); }
void erase(iterator it) { rep.erase(it); }
void erase(iterator f, iterator l) { rep.erase(f, l); }
// Comparison
bool operator==(const sparse_hash_map& hs) const { return rep == hs.rep; }
bool operator!=(const sparse_hash_map& hs) const { return rep != hs.rep; }
// I/O -- this is an add-on for writing metainformation to disk
//
// For maximum flexibility, this does not assume a particular
// file type (though it will probably be a FILE *). We just pass
// the fp through to rep.
// If your keys and values are simple enough, you can pass this
// serializer to serialize()/unserialize(). "Simple enough" means
// value_type is a POD type that contains no pointers. Note,
// however, we don't try to normalize endianness.
typedef typename ht::NopointerSerializer NopointerSerializer;
// serializer: a class providing operator()(OUTPUT*, const value_type&)
// (writing value_type to OUTPUT). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an ostream*/subclass_of_ostream*, OR a
// pointer to a class providing size_t Write(const void*, size_t),
// which writes a buffer into a stream (which fp presumably
// owns) and returns the number of bytes successfully written.
// Note basic_ostream<not_char> is not currently supported.
template <typename ValueSerializer, typename OUTPUT>
bool serialize(ValueSerializer serializer, OUTPUT* fp) {
return rep.serialize(serializer, fp);
}
// serializer: a functor providing operator()(INPUT*, value_type*)
// (reading from INPUT and into value_type). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an istream*/subclass_of_istream*, OR a
// pointer to a class providing size_t Read(void*, size_t),
// which reads into a buffer from a stream (which fp presumably
// owns) and returns the number of bytes successfully read.
// Note basic_istream<not_char> is not currently supported.
// NOTE: Since value_type is std::pair<const Key, T>, ValueSerializer
// may need to do a const cast in order to fill in the key.
// NOTE: if Key or T are not POD types, the serializer MUST use
// placement-new to initialize their values, rather than a normal
// equals-assignment or similar. (The value_type* passed into the
// serializer points to garbage memory.)
template <typename ValueSerializer, typename INPUT>
bool unserialize(ValueSerializer serializer, INPUT* fp) {
return rep.unserialize(serializer, fp);
}
// The four methods below are DEPRECATED.
// Use serialize() and unserialize() for new code.
template <typename OUTPUT>
bool write_metadata(OUTPUT *fp) { return rep.write_metadata(fp); }
template <typename INPUT>
bool read_metadata(INPUT *fp) { return rep.read_metadata(fp); }
template <typename OUTPUT>
bool write_nopointer_data(OUTPUT *fp) { return rep.write_nopointer_data(fp); }
template <typename INPUT>
bool read_nopointer_data(INPUT *fp) { return rep.read_nopointer_data(fp); }
};
// We need a global swap as well
template <class Key, class T, class HashFcn, class EqualKey, class Alloc>
inline void swap(sparse_hash_map<Key, T, HashFcn, EqualKey, Alloc>& hm1,
sparse_hash_map<Key, T, HashFcn, EqualKey, Alloc>& hm2) {
hm1.swap(hm2);
}
_END_GOOGLE_NAMESPACE_
#endif /* _SPARSE_HASH_MAP_H_ */
此差异已折叠。
// Copyright 2005 Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ----
//
// Template metaprogramming utility functions.
//
// This code is compiled directly on many platforms, including client
// platforms like Windows, Mac, and embedded systems. Before making
// any changes here, make sure that you're not breaking any platforms.
//
//
// The names choosen here reflect those used in tr1 and the boost::mpl
// library, there are similar operations used in the Loki library as
// well. I prefer the boost names for 2 reasons:
// 1. I think that portions of the Boost libraries are more likely to
// be included in the c++ standard.
// 2. It is not impossible that some of the boost libraries will be
// included in our own build in the future.
// Both of these outcomes means that we may be able to directly replace
// some of these with boost equivalents.
//
#ifndef BASE_TEMPLATE_UTIL_H_
#define BASE_TEMPLATE_UTIL_H_
#include <sparsehash/internal/sparseconfig.h>
_START_GOOGLE_NAMESPACE_
// Types small_ and big_ are guaranteed such that sizeof(small_) <
// sizeof(big_)
typedef char small_;
struct big_ {
char dummy[2];
};
// Identity metafunction.
template <class T>
struct identity_ {
typedef T type;
};
// integral_constant, defined in tr1, is a wrapper for an integer
// value. We don't really need this generality; we could get away
// with hardcoding the integer type to bool. We use the fully
// general integer_constant for compatibility with tr1.
template<class T, T v>
struct integral_constant {
static const T value = v;
typedef T value_type;
typedef integral_constant<T, v> type;
};
template <class T, T v> const T integral_constant<T, v>::value;
// Abbreviations: true_type and false_type are structs that represent boolean
// true and false values. Also define the boost::mpl versions of those names,
// true_ and false_.
typedef integral_constant<bool, true> true_type;
typedef integral_constant<bool, false> false_type;
typedef true_type true_;
typedef false_type false_;
// if_ is a templatized conditional statement.
// if_<cond, A, B> is a compile time evaluation of cond.
// if_<>::type contains A if cond is true, B otherwise.
template<bool cond, typename A, typename B>
struct if_{
typedef A type;
};
template<typename A, typename B>
struct if_<false, A, B> {
typedef B type;
};
// type_equals_ is a template type comparator, similar to Loki IsSameType.
// type_equals_<A, B>::value is true iff "A" is the same type as "B".
//
// New code should prefer base::is_same, defined in base/type_traits.h.
// It is functionally identical, but is_same is the standard spelling.
template<typename A, typename B>
struct type_equals_ : public false_ {
};
template<typename A>
struct type_equals_<A, A> : public true_ {
};
// and_ is a template && operator.
// and_<A, B>::value evaluates "A::value && B::value".
template<typename A, typename B>
struct and_ : public integral_constant<bool, (A::value && B::value)> {
};
// or_ is a template || operator.
// or_<A, B>::value evaluates "A::value || B::value".
template<typename A, typename B>
struct or_ : public integral_constant<bool, (A::value || B::value)> {
};
_END_GOOGLE_NAMESPACE_
#endif // BASE_TEMPLATE_UTIL_H_
Subproject commit cf0bffaa456f23bc4174462a789b90f8b6f5f42f
......@@ -151,6 +151,7 @@ macro(add_object_library name common_path)
add_glob(${name}_headers RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} ${common_path}/*.h)
add_glob(${name}_sources ${common_path}/*.cpp ${common_path}/*.c ${common_path}/*.h)
add_library(${name} SHARED ${${name}_sources} ${${name}_headers})
target_link_libraries (${name} PRIVATE -Wl,--unresolved-symbols=ignore-all)
endif ()
endmacro()
......@@ -175,6 +176,7 @@ add_object_library(clickhouse_processors_formats_impl src/Processors/Formats/Imp
add_object_library(clickhouse_processors_transforms src/Processors/Transforms)
add_object_library(clickhouse_processors_sources src/Processors/Sources)
if (MAKE_STATIC_LIBRARIES OR NOT SPLIT_SHARED_LIBRARIES)
add_library (dbms STATIC ${dbms_headers} ${dbms_sources})
set (all_modules dbms)
......@@ -393,7 +395,7 @@ if (OPENSSL_CRYPTO_LIBRARY)
endif ()
dbms_target_include_directories (SYSTEM BEFORE PRIVATE ${DIVIDE_INCLUDE_DIR})
dbms_target_include_directories (SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_DIR})
dbms_target_include_directories (SYSTEM BEFORE PRIVATE ${SPARSEHASH_INCLUDE_DIR})
if (USE_PROTOBUF)
dbms_target_link_libraries (PRIVATE ${Protobuf_LIBRARY})
......
......@@ -205,6 +205,9 @@ else ()
add_custom_target (clickhouse-bundle ALL DEPENDS ${CLICKHOUSE_BUNDLE})
if (USE_GDB_ADD_INDEX)
add_custom_command(TARGET clickhouse POST_BUILD COMMAND ${GDB_ADD_INDEX_EXE} clickhouse COMMENT "Adding .gdb-index to clickhouse" VERBATIM)
endif()
endif ()
if (TARGET clickhouse-server AND TARGET copy-headers)
......
......@@ -4,7 +4,11 @@ set(CLICKHOUSE_CLIENT_SOURCES
)
set(CLICKHOUSE_CLIENT_LINK PRIVATE clickhouse_common_config clickhouse_functions clickhouse_aggregate_functions clickhouse_common_io clickhouse_parsers string_utils ${LINE_EDITING_LIBS} ${Boost_PROGRAM_OPTIONS_LIBRARY})
set(CLICKHOUSE_CLIENT_INCLUDE SYSTEM PRIVATE ${READLINE_INCLUDE_DIR} PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/include)
set(CLICKHOUSE_CLIENT_INCLUDE PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/include)
if (READLINE_INCLUDE_DIR)
set(CLICKHOUSE_CLIENT_INCLUDE ${CLICKHOUSE_CLIENT_INCLUDE} SYSTEM PRIVATE ${READLINE_INCLUDE_DIR})
endif ()
include(CheckSymbolExists)
check_symbol_exists(readpassphrase readpassphrase.h HAVE_READPASSPHRASE)
......
set(CLICKHOUSE_COPIER_SOURCES ${CMAKE_CURRENT_SOURCE_DIR}/ClusterCopier.cpp)
set(CLICKHOUSE_COPIER_LINK PRIVATE clickhouse_common_zookeeper clickhouse_parsers clickhouse_functions clickhouse_table_functions clickhouse_aggregate_functions clickhouse_dictionaries string_utils PUBLIC daemon)
set(CLICKHOUSE_COPIER_LINK PRIVATE clickhouse_common_zookeeper clickhouse_parsers clickhouse_functions clickhouse_table_functions clickhouse_aggregate_functions clickhouse_dictionaries string_utils ${Poco_XML_LIBRARY} PUBLIC daemon)
set(CLICKHOUSE_COPIER_INCLUDE SYSTEM PRIVATE ${PCG_RANDOM_INCLUDE_DIR})
clickhouse_program_add(copier)
......@@ -42,6 +42,10 @@ set_target_properties(clickhouse-odbc-bridge PROPERTIES RUNTIME_OUTPUT_DIRECTORY
clickhouse_program_link_split_binary(odbc-bridge)
if (USE_GDB_ADD_INDEX)
add_custom_command(TARGET clickhouse-odbc-bridge POST_BUILD COMMAND ${GDB_ADD_INDEX_EXE} ../clickhouse-odbc-bridge COMMENT "Adding .gdb-index to clickhouse-odbc-bridge" VERBATIM)
endif()
install(TARGETS clickhouse-odbc-bridge RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
if(ENABLE_TESTS)
......
......@@ -31,9 +31,10 @@ void StopConditionsSet::loadFromConfig(const ConfigurationPtr & stop_conditions_
else if (key == "average_speed_not_changing_for_ms")
average_speed_not_changing_for_ms.value = stop_conditions_view->getUInt64(key);
else
throw Exception("Met unkown stop condition: " + key, ErrorCodes::LOGICAL_ERROR);
throw Exception("Met unknown stop condition: " + key, ErrorCodes::LOGICAL_ERROR);
++initialized_count;
}
++initialized_count;
}
void StopConditionsSet::reset()
......
#include <Common/config.h>
#if USE_POCO_NETSSL
#include "MySQLHandler.h"
#include <limits>
......@@ -301,3 +303,4 @@ void MySQLHandler::comQuery(ReadBuffer & payload)
}
}
#endif
#pragma once
#include <Common/config.h>
#if USE_POCO_NETSSL
#include <Poco/Net/TCPServerConnection.h>
#include <Poco/Net/SecureStreamSocket.h>
......@@ -56,3 +58,4 @@ private:
};
}
#endif
#include <Common/config.h>
#if USE_POCO_NETSSL
#include <Common/OpenSSLHelpers.h>
#include <Poco/Crypto/X509Certificate.h>
#include <Poco/Net/SSLManager.h>
......@@ -122,3 +124,4 @@ Poco::Net::TCPServerConnection * MySQLHandlerFactory::createConnection(const Poc
}
}
#endif
#pragma once
#include <Common/config.h>
#if USE_POCO_NETSSL
#include <Poco/Net/TCPServerConnectionFactory.h>
#include <atomic>
#include <openssl/rsa.h>
......@@ -37,3 +40,4 @@ public:
};
}
#endif
......@@ -88,6 +88,7 @@ namespace ErrorCodes
extern const int FAILED_TO_GETPWUID;
extern const int MISMATCHING_USERS_FOR_PROCESS_AND_DATA;
extern const int NETWORK_ERROR;
extern const int PATH_ACCESS_DENIED;
}
......@@ -270,6 +271,15 @@ int Server::main(const std::vector<std::string> & /*args*/)
Poco::File(path + "data/" + default_database).createDirectories();
Poco::File(path + "metadata/" + default_database).createDirectories();
/// Check that we have read and write access to all data paths
auto disk_selector = global_context->getDiskSelector();
for (const auto & [name, disk] : disk_selector.getDisksMap())
{
Poco::File disk_path(disk->getPath());
if (!disk_path.canRead() || !disk_path.canWrite())
throw Exception("There is no RW access to disk " + name + " (" + disk->getPath() + ")", ErrorCodes::PATH_ACCESS_DENIED);
}
StatusFile status{path + "status"};
SCOPE_EXIT({
......
......@@ -545,6 +545,9 @@ void TCPHandler::processOrdinaryQueryWithProcessors(size_t num_threads)
{
auto & pipeline = state.io.pipeline;
if (pipeline.getMaxThreads())
num_threads = pipeline.getMaxThreads();
/// Send header-block, to allow client to prepare output format for data to send.
{
auto & header = pipeline.getHeader();
......@@ -594,7 +597,15 @@ void TCPHandler::processOrdinaryQueryWithProcessors(size_t num_threads)
lazy_format->finish();
lazy_format->clearQueue();
pool.wait();
try
{
pool.wait();
}
catch (...)
{
/// If exception was thrown during pipeline execution, skip it while processing other exception.
}
pipeline = QueryPipeline()
);
......@@ -1000,8 +1011,7 @@ void TCPHandler::receiveUnexpectedData()
auto skip_block_in = std::make_shared<NativeBlockInputStream>(
*maybe_compressed_in,
last_block_in.header,
client_revision,
!connection_context.getSettingsRef().low_cardinality_allow_in_native_format);
client_revision);
Block skip_block = skip_block_in->read();
throw NetException("Unexpected packet Data received from client", ErrorCodes::UNEXPECTED_PACKET_FROM_CLIENT);
......@@ -1028,8 +1038,7 @@ void TCPHandler::initBlockInput()
state.block_in = std::make_shared<NativeBlockInputStream>(
*state.maybe_compressed_in,
header,
client_revision,
!connection_context.getSettingsRef().low_cardinality_allow_in_native_format);
client_revision);
}
}
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册