- 23 10月, 2021 1 次提交
-
-
由 Jonathan Albrecht 提交于
Summary: This PR adds support for building on s390x including updating travis CI. It uses the previous work in https://github.com/facebook/rocksdb/pull/6168 and adds some more changes to get all current tests (make check and jni tests) to pass. The tests were run with snappy, lz4, bzip2 and zstd all compiled in. There are a few pieces still needed to get the travis build working that I don't think I can do. adamretter is this something you could help with? 1. A prebuilt https://rocksdb-deps.s3-us-west-2.amazonaws.com/cmake/cmake-3.14.5-Linux-s390x.deb package 2. A https://hub.docker.com/r/evolvedbinary/rocksjava s390x image Not sure if there is more required for travis. Happy to help in any way I can. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8962 Reviewed By: mrambacher Differential Revision: D31802198 Pulled By: pdillinger fbshipit-source-id: 683511466fa6b505f85ba5a9964a268c6151f0c2
-
- 19 10月, 2021 1 次提交
-
-
由 Peter Dillinger 提交于
Summary: * New public header unique_id.h and function GetUniqueIdFromTableProperties which computes a universally unique identifier based on table properties of table files from recent RocksDB versions. * Generation of DB session IDs is refactored so that they are guaranteed unique in the lifetime of a process running RocksDB. (SemiStructuredUniqueIdGen, new test included.) Along with file numbers, this enables SST unique IDs to be guaranteed unique among SSTs generated in a single process, and "better than random" between processes. See https://github.com/pdillinger/unique_id * In addition to public API producing 'external' unique IDs, there is a function for producing 'internal' unique IDs, with functions for converting between the two. In short, the external ID is "safe" for things people might do with it, and the internal ID enables more "power user" features for the future. Specifically, the external ID goes through a hashing layer so that any subset of bits in the external ID can be used as a hash of the full ID, while also preserving uniqueness guarantees in the first 128 bits (bijective both on first 128 bits and on full 192 bits). Intended follow-up: * Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into the third 64-bit value of the unique ID.) * Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968) Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990 Test Plan: Unit tests added, and checking of unique ids in stress test. NOTE in stress test we do not generate nearly enough files to thoroughly stress uniqueness, but the test trims off pieces of the ID to check for uniqueness so that we can infer (with some assumptions) stronger properties in the aggregate. Reviewed By: zhichao-cao, mrambacher Differential Revision: D31582865 Pulled By: pdillinger fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
-
- 08 10月, 2021 1 次提交
-
-
由 Zhichao Cao 提交于
Summary: Background: Cache warming up will cause potential read performance degradation due to reading blocks from storage to the block cache. Since in production, the workload and access pattern to a certain DB is stable, it is a potential solution to dump out the blocks belonging to a certain DB to persist storage (e.g., to a file) and bulk-load the blocks to Secondary cache before the DB is relaunched. For example, when migrating a DB form host A to host B, it will take a short period of time, the access pattern to blocks in the block cache will not change much. It is efficient to dump out the blocks of certain DB, migrate to the destination host and insert them to the Secondary cache before we relaunch the DB. Design: we introduce the interface of CacheDumpWriter and CacheDumpRead for user to store the blocks dumped out from block cache. RocksDB will encode all the information and send the string to the writer. User can implement their own writer it they want. CacheDumper and CacheLoad are introduced to save the blocks and load the blocks respectively. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8912 Test Plan: add new tests to lru_cache_test and pass make check. Reviewed By: pdillinger Differential Revision: D31452871 Pulled By: zhichao-cao fbshipit-source-id: 11ab4f5d03e383f476947116361d54188d36ec48
-
- 28 9月, 2021 1 次提交
-
-
由 mrambacher 提交于
Make WalFilter, SstPartitionerFactory, FileChecksumGenFactory, and TableProperties Customizable (#8638) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8638 Reviewed By: zhichao-cao Differential Revision: D31024729 Pulled By: mrambacher fbshipit-source-id: 954c04ccab0b8dee64050a27aadf78ed119106c0
-
- 08 9月, 2021 1 次提交
-
-
由 Peter Dillinger 提交于
Summary: * Consolidate use of std::regex for testing to testharness.cc, to minimize Facebook linters constantly flagging uses in non-production code. * Improve syntax and error messages for asserting some string matches a regex in tests. * Add a public Regex wrapper class to encapsulate existing usage in ObjectRegistry. * Remove unnecessary include <regex> * Put warnings that use of Regex in production code could cause bad performance or stack overflow. Intended follow-up work: * Replace std::regex with another underlying implementation like RE2 * Improve ObjectRegistry interface in terms of possibly confusing literal string matching vs. regex and in terms of reporting invalid regex. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8740 Test Plan: tests updated, basic unit test for public Regex, and some manual testing of temporary changes to see example error messages: utilities/backupable/backupable_db_test.cc:917: Failure 000010_1162373755_138626.blob (child.name) does not match regex [0-9]+_[0-9]+_[0-9]+[.]blobHAHAHA (pattern) db/db_basic_test.cc:74: Failure R3SHSBA8C4U0CIMV2ZB0 (sid3) does not match regex [0-9A-Z]{20}HAHAHA Reviewed By: mrambacher Differential Revision: D30706246 Pulled By: pdillinger fbshipit-source-id: ba845e8f563ccad39bdb58f44f04e9da8f78c3fd
-
- 31 8月, 2021 1 次提交
-
-
由 Peter Dillinger 提交于
Summary: Env::GenerateUniqueId() works fine on Windows and on POSIX where /proc/sys/kernel/random/uuid exists. Our other implementation is flawed and easily produces collision in a new multi-threaded test. As we rely more heavily on DB session ID uniqueness, this becomes a serious issue. This change combines several individually suitable entropy sources for reliable generation of random unique IDs, with goal of uniqueness and portability, not cryptographic strength nor maximum speed. Specifically: * Moves code for getting UUIDs from the OS to port::GenerateRfcUuid rather than in Env implementation details. Callers are now told whether the operation fails or succeeds. * Adds an internal API GenerateRawUniqueId for generating high-quality 128-bit unique identifiers, by combining entropy from three "tracks": * Lots of info from default Env like time, process id, and hostname. * std::random_device * port::GenerateRfcUuid (when working) * Built-in implementations of Env::GenerateUniqueId() will now always produce an RFC 4122 UUID string, either from platform-specific API or by converting the output of GenerateRawUniqueId. DB session IDs now use GenerateRawUniqueId while DB IDs (not as critical) try to use port::GenerateRfcUuid but fall back on GenerateRawUniqueId with conversion to an RFC 4122 UUID. GenerateRawUniqueId is declared and defined under env/ rather than util/ or even port/ because of the Env dependency. Likely follow-up: enhance GenerateRawUniqueId to be faster after the first call and to guarantee uniqueness within the lifetime of a single process (imparting the same property onto DB session IDs). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8708 Test Plan: A new mini-stress test in env_test checks the various public and internal APIs for uniqueness, including each track of GenerateRawUniqueId individually. We can't hope to verify anywhere close to 128 bits of entropy, but it can at least detect flaws as bad as the old code. Serial execution of the new tests takes about 350 ms on my machine. Reviewed By: zhichao-cao, mrambacher Differential Revision: D30563780 Pulled By: pdillinger fbshipit-source-id: de4c9ff4b2f581cf784fcedb5f39f16e5185c364
-
- 25 8月, 2021 1 次提交
-
-
由 Hui Xiao 提交于
Summary: Context: To help cap various memory usage by a single limit of the block cache capacity, we charge the memory usage through inserting/releasing dummy entries in the block cache. CacheReservationManager is such a class (non thread-safe) responsible for inserting/removing dummy entries to reserve cache space for memory used by the class user. - Refactored the inner private class CacheRep of WriteBufferManager into public CacheReservationManager class for reusability such as for https://github.com/facebook/rocksdb/pull/8428 - Encapsulated implementation details of cache key generation and dummy entries insertion/release in cache reservation as discussed in https://github.com/facebook/rocksdb/pull/8506#discussion_r666550838 - Consolidated increase/decrease cache reservation into one API - UpdateCacheReservation. - Adjusted the previous dummy entry release algorithm in decreasing cache reservation to be loop-releasing dummy entries to stay symmetric to dummy entry insertion algorithm - Made the previous dummy entry release algorithm in delayed decrease mode more aggressive for better decreasing cache reservation when memory used is less likely to increase back. Previously, the algorithms only release 1 dummy entries when new_mem_used < 3/4 * cache_allocated_size_ and cache_allocated_size_ - kSizeDummyEntry > new_mem_used. Now, the algorithms loop-releases as many dummy entries as possible when new_mem_used < 3/4 * cache_allocated_size_. - Updated WriteBufferManager's test cases to adapt to changes on the release algorithm mentioned above and left comment for some test cases for clarity - Replaced the previous cache key prefix generation (utilizing object address related to the cache client) with one that utilizes Cache->NewID() to prevent cache-key collision among dummy entry clients sharing the same cache. The specific collision we are preventing happens when the object address is reused for a new cache-key prefix while the old cache-key using that same object address in its prefix still exists in the cache. This could happen due to that, under LRU cache policy, there is a possible delay in releasing a cache entry after the cache client object owning that cache entry get deallocated. In this case, the object address related to the cache client object can get reused for other client object to generate a new cache-key prefix. This prefix generation can be made obsolete after Peter's unification of all the code generating cache key, mentioned in https://github.com/facebook/rocksdb/pull/8506#discussion_r667265255 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8506 Test Plan: - Passing the added unit tests cache_reservation_manager_test.cc - Passing existing and adjusted write_buffer_manager_test.cc Reviewed By: ajkr Differential Revision: D29644135 Pulled By: hx235 fbshipit-source-id: 0fc93fbfe4a40bb41be85c314f8f2bafa8b741f7
-
- 19 8月, 2021 1 次提交
-
-
由 Merlin Mao 提交于
Summary: `Replayer::Execute()` can directly returns the result (e.g, request latency, DB::Get() return code, returned value, etc.) `Replayer::Replay()` reports the results via a callback function. New interface: `TraceRecordResult` in "rocksdb/trace_record_result.h". `DBTest2.TraceAndReplay` and `DBTest2.TraceAndManualReplay` are updated accordingly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8657 Reviewed By: ajkr Differential Revision: D30290216 Pulled By: autopear fbshipit-source-id: 3c8d4e6b180ec743de1a9d9dcaee86064c74f0d6
-
- 12 8月, 2021 1 次提交
-
-
由 Merlin Mao 提交于
Summary: New public interfaces: `TraceRecord` and `TraceRecord::Handler`, available in "rocksdb/trace_record.h". `Replayer`, available in `rocksdb/utilities/replayer.h`. User can use `DB::NewDefaultReplayer()` to create a Replayer to auto/manual replay a trace file. Unit tests: - `./db_test2 --gtest_filter="DBTest2.TraceAndReplay"`: Updated with the internal API changes. - `./db_test2 --gtest_filter="DBTest2.TraceAndManualReplay"`: New for manual replay. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8611 Reviewed By: ajkr Differential Revision: D30266329 Pulled By: autopear fbshipit-source-id: 1ecb3cbbedae0f6a67c18f0cc82e002b4d81b6f8
-
- 06 8月, 2021 1 次提交
-
-
由 mrambacher 提交于
Summary: - Changed MergeOperator, CompactionFilter, and CompactionFilterFactory into Customizable classes. - Added Options/Configurable/Object Registration for TTL and Cassandra variants - Changed the StringAppend MergeOperators to accept a string delimiter rather than a simple char. Made the delimiter into a configurable option - Added tests for new functionality Pull Request resolved: https://github.com/facebook/rocksdb/pull/8481 Reviewed By: zhichao-cao Differential Revision: D30136050 Pulled By: mrambacher fbshipit-source-id: 271d1772835935b6773abaf018ee71e42f9491af
-
- 09 7月, 2021 1 次提交
-
-
由 Jay Zhuang 提交于
Summary: Add google benchmark for microbench. Add ribbon_bench for benchmark ribbon filter vs. other filters. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8493 Test Plan: added test to CI To run the benchmark on devhost: Install benchmark: `$ sudo dnf install google-benchmark-devel` Build and run: `$ ROCKSDB_NO_FBCODE=1 DEBUG_LEVEL=0 make microbench` or with cmake: `$ mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_BENCHMARK=1 && make microbench` Reviewed By: pdillinger Differential Revision: D29589649 Pulled By: jay-zhuang fbshipit-source-id: 8fed13b562bef4472f161ecacec1ab6b18911dff
-
- 29 6月, 2021 1 次提交
-
-
由 Lucian Petrut 提交于
Summary: Allow using WindowsThread with Mingw Most Mingw builds require Posix threads in order to use std::thread. As per https://github.com/facebook/rocksdb/issues/7764, this is not always the case. That being considered, we're going to improve the Mingw thread model checks. Closes: https://github.com/facebook/rocksdb/issues/7764Signed-off-by: NLucian Petrut <lpetrut@cloudbasesolutions.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/8108 Reviewed By: jay-zhuang Differential Revision: D27365778 Pulled By: mrambacher fbshipit-source-id: 2c15b1f04ae90e1e3a25a33e86ceb779224a9529
-
- 24 6月, 2021 1 次提交
-
-
由 Levi Tamasi 提交于
Summary: Follow-up to https://github.com/facebook/rocksdb/issues/8426 . The patch adds a new kind of `InternalIterator` that wraps another one and passes each key-value encountered to `BlobGarbageMeter` as inflow. This iterator will be used as an input iterator for compactions when the input SSTs reference blob files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8443 Test Plan: `make check` Reviewed By: jay-zhuang Differential Revision: D29311987 Pulled By: ltamasi fbshipit-source-id: b4493b4c0c0c2e3c2ecc33c8969a5ef02de5d9d8
-
- 22 6月, 2021 1 次提交
-
-
由 Levi Tamasi 提交于
Summary: This is part of an alternative approach to https://github.com/facebook/rocksdb/issues/8316. Unlike that approach, this one relies on key-values getting processed one by one during compaction, and does not involve persistence. Specifically, the patch adds a class `BlobGarbageMeter` that can track the number and total size of blobs in a (sub)compaction's input and output on a per-blob file basis. This information can then be used to compute the amount of additional garbage generated by the compaction for any given blob file by subtracting the "outflow" from the "inflow." Note: this patch only adds `BlobGarbageMeter` and associated unit tests. I plan to hook up this class to the input and output of `CompactionIterator` in a subsequent PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8426 Test Plan: `make check` Reviewed By: jay-zhuang Differential Revision: D29242250 Pulled By: ltamasi fbshipit-source-id: 597e50ad556540e413a50e804ba15bc044d809bb
-
- 15 6月, 2021 1 次提交
-
-
由 Jay Zhuang 提交于
Summary: cmake test discovery may timeout especially on Windows platform. Increase it from default 5 seconds to 120 seconds. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8403 Test Plan: Run Windows build 10 times without issue Reviewed By: akankshamahajan15 Differential Revision: D29117455 Pulled By: jay-zhuang fbshipit-source-id: 74f71833432f016776a59e070b0f4e146968f81b
-
- 11 6月, 2021 1 次提交
-
-
由 Akanksha Mahajan 提交于
Summary: This PR add support for Merge operation in Integrated BlobDB with base values(i.e DB::Put). Merged values can be retrieved through DB::Get, DB::MultiGet, DB::GetMergeOperands and Iterator operation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8292 Test Plan: Add new unit tests Reviewed By: ltamasi Differential Revision: D28415896 Pulled By: akankshamahajan15 fbshipit-source-id: e9b3478bef51d2f214fb88c31ed3c8d2f4a531ff
-
- 10 6月, 2021 1 次提交
-
-
由 Levi Tamasi 提交于
Summary: Logically, subcompactions process a key range [start, end); however, the way this is currently implemented is that the `CompactionIterator` for any given subcompaction keeps processing key-values until it actually outputs a key that is out of range, which is then discarded. Instead of doing this, the patch introduces a new type of internal iterator called `ClippingIterator` which wraps another internal iterator and "clips" its range of key-values so that any KVs returned are strictly in the [start, end) interval. This does eliminate a (minor) inefficiency by stopping processing in subcompactions exactly at the limit; however, the main motivation is related to BlobDB: namely, we need this to be able to measure the amount of garbage generated by a subcompaction precisely and prevent off-by-one errors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8327 Test Plan: `make check` Reviewed By: siying Differential Revision: D28761541 Pulled By: ltamasi fbshipit-source-id: ee0e7229f04edabbc7bed5adb51771fbdc287f69
-
- 02 6月, 2021 1 次提交
-
-
由 Jay Zhuang 提交于
Summary: - Fix cmake build failure with gflags. - Add CI tests for both gflags 2.1 and 2.2. - Fix ctest config with gtest. - Add CI to run test with ctest. One benefit of ctest is it support timeout, it's set to 5min in our CI, so we will know which test is hang. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8324 Test Plan: CI pass Reviewed By: ajkr Differential Revision: D28762517 Pulled By: jay-zhuang fbshipit-source-id: 09063c5af5f9f33abfcdeb48593acbd9826cd199
-
- 22 5月, 2021 1 次提交
-
-
由 sdong 提交于
Summary: By default, try to build with liburing. For make, if ROCKSDB_USE_IO_URING is not set, treat as 1, which means RocksDB will try to build with liburing. For cmake, add WITH_LIBURING to control it, with default on. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8322 Test Plan: Build using cmake and make. Reviewed By: anand1976 Differential Revision: D28586498 fbshipit-source-id: cfd39159ab697f4b93a9293a59c07f839b1e7ed5
-
- 20 5月, 2021 3 次提交
-
-
由 Jay Zhuang 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8300 Reviewed By: ajkr Differential Revision: D28464726 Pulled By: jay-zhuang fbshipit-source-id: 49e9f4fb791808a6cbf39a7b1a331373f645fc5e
-
由 Peter Dillinger 提交于
Summary: This change gathers and publishes statistics about the kinds of items in block cache. This is especially important for profiling relative usage of cache by index vs. filter vs. data blocks. It works by iterating over the cache during periodic stats dump (InternalStats, stats_dump_period_sec) or on demand when DB::Get(Map)Property(kBlockCacheEntryStats), except that for efficiency and sharing among column families, saved data from the last scan is used when the data is not considered too old. The new information can be seen in info LOG, for example: Block cache LRUCache@0x7fca62229330 capacity: 95.37 MB collections: 8 last_copies: 0 last_secs: 0.00178 secs_since: 0 Block cache entry stats(count,size,portion): DataBlock(7092,28.24 MB,29.6136%) FilterBlock(215,867.90 KB,0.888728%) FilterMetaBlock(2,5.31 KB,0.00544%) IndexBlock(217,180.11 KB,0.184432%) WriteBuffer(1,256.00 KB,0.262144%) Misc(1,0.00 KB,0%) And also through DB::GetProperty and GetMapProperty (here using ldb just for demonstration): $ ./ldb --db=/dev/shm/dbbench/ get_property rocksdb.block-cache-entry-stats rocksdb.block-cache-entry-stats.bytes.data-block: 0 rocksdb.block-cache-entry-stats.bytes.deprecated-filter-block: 0 rocksdb.block-cache-entry-stats.bytes.filter-block: 0 rocksdb.block-cache-entry-stats.bytes.filter-meta-block: 0 rocksdb.block-cache-entry-stats.bytes.index-block: 178992 rocksdb.block-cache-entry-stats.bytes.misc: 0 rocksdb.block-cache-entry-stats.bytes.other-block: 0 rocksdb.block-cache-entry-stats.bytes.write-buffer: 0 rocksdb.block-cache-entry-stats.capacity: 8388608 rocksdb.block-cache-entry-stats.count.data-block: 0 rocksdb.block-cache-entry-stats.count.deprecated-filter-block: 0 rocksdb.block-cache-entry-stats.count.filter-block: 0 rocksdb.block-cache-entry-stats.count.filter-meta-block: 0 rocksdb.block-cache-entry-stats.count.index-block: 215 rocksdb.block-cache-entry-stats.count.misc: 1 rocksdb.block-cache-entry-stats.count.other-block: 0 rocksdb.block-cache-entry-stats.count.write-buffer: 0 rocksdb.block-cache-entry-stats.id: LRUCache@0x7f3636661290 rocksdb.block-cache-entry-stats.percent.data-block: 0.000000 rocksdb.block-cache-entry-stats.percent.deprecated-filter-block: 0.000000 rocksdb.block-cache-entry-stats.percent.filter-block: 0.000000 rocksdb.block-cache-entry-stats.percent.filter-meta-block: 0.000000 rocksdb.block-cache-entry-stats.percent.index-block: 2.133751 rocksdb.block-cache-entry-stats.percent.misc: 0.000000 rocksdb.block-cache-entry-stats.percent.other-block: 0.000000 rocksdb.block-cache-entry-stats.percent.write-buffer: 0.000000 rocksdb.block-cache-entry-stats.secs_for_last_collection: 0.000052 rocksdb.block-cache-entry-stats.secs_since_last_collection: 0 Solution detail - We need some way to flag what kind of blocks each entry belongs to, preferably without changing the Cache API. One of the complications is that Cache is a general interface that could have other users that don't adhere to whichever convention we decide on for keys and values. Or we would pay for an extra field in the Handle that would only be used for this purpose. This change uses a back-door approach, the deleter, to indicate the "role" of a Cache entry (in addition to the value type, implicitly). This has the added benefit of ensuring proper code origin whenever we recognize a particular role for a cache entry; if the entry came from some other part of the code, it will use an unrecognized deleter, which we simply attribute to the "Misc" role. An internal API makes for simple instantiation and automatic registration of Cache deleters for a given value type and "role". Another internal API, CacheEntryStatsCollector, solves the problem of caching the results of a scan and sharing them, to ensure scans are neither excessive nor redundant so as not to harm Cache performance. Because code is added to BlocklikeTraits, it is pulled out of block_based_table_reader.cc into its own file. This is a reformulation of https://github.com/facebook/rocksdb/issues/8276, without the type checking option (could still be added), and with actual stat gathering. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8297 Test Plan: manual testing with db_bench, and a couple of basic unit tests Reviewed By: ltamasi Differential Revision: D28488721 Pulled By: pdillinger fbshipit-source-id: 472f524a9691b5afb107934be2d41d84f2b129fb
-
由 anand76 提交于
Summary: This PR adds a ```-secondary_cache_uri``` option to the cache_bench and db_bench tools to allow the user to specify a custom secondary cache URI. The object registry is used to create an instance of the ```SecondaryCache``` object of the type specified in the URI. The main cache_bench code is packaged into a separate library, similar to db_bench. An example invocation of db_bench with a secondary cache URI - ```db_bench --env_uri=ws://ws.flash_sandbox.vll1_2/ -db=anand/nvm_cache_2 -use_existing_db=true -benchmarks=readrandom -num=30000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=67108864 -cache_index_and_filter_blocks=true -secondary_cache_uri='cachelibwrapper://filename=/home/anand76/nvm_cache/cache_file;size=2147483648;regionSize=16777216;admPolicy=random;admProbability=1.0;volatileSize=8388608;bktPower=20;lockPower=12' -partition_index_and_filters=true -duration=1800``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8312 Reviewed By: zhichao-cao Differential Revision: D28544325 Pulled By: anand1976 fbshipit-source-id: 8f209b9af900c459dc42daa7a610d5f00176eeed
-
- 13 5月, 2021 1 次提交
-
-
由 Jay Zhuang 提交于
Summary: And change the cmake build on macos with GFLAGS on to cover more cases. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8289 Reviewed By: zhichao-cao Differential Revision: D28372467 Pulled By: jay-zhuang fbshipit-source-id: ad7fbe523c3fb135ef5281adbaf2070ca5d0873d
-
- 22 4月, 2021 1 次提交
-
-
由 mrambacher 提交于
Summary: For some compilers/environments (e.g. Clang, riscv64), we need to link against -latomic. Check if this is a requirement and add the library to the third-party libs if it is. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8183 Reviewed By: pdillinger Differential Revision: D27773564 Pulled By: mrambacher fbshipit-source-id: 68e15d823144f83fb02221c7bf5b1e43323419bf
-
- 13 4月, 2021 1 次提交
-
-
由 Yanqin Jin 提交于
Summary: Before this PR, `get_iostats_context()` will silently return a nullptr if no thread_local support is detected. This can be the result of build_detect_platform's failure to compile the simple code snippet on certain platforms, as reported in https://github.com/facebook/mysql-5.6/issues/904. To be safe, we should fail the compilation if user does not opt out IOStatsContext and ROCKSDB_SUPPORT_THREAD_LOCAL is not defined. If RocksDB relies on c++11, can we just always use thread_local? It turns out there might be performance concerns (https://github.com/facebook/rocksdb/issues/5774), which is beyond the scope of this PR. We can revisit this later. Here, we stick to the original impl. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8117 Reviewed By: ajkr Differential Revision: D27356847 Pulled By: riversand963 fbshipit-source-id: f7d5776842277598d8341b955febb601946801ae
-
- 07 4月, 2021 1 次提交
-
-
由 Peter Dillinger 提交于
Summary: A current limitation of backups is that you don't know the exact database state of when the backup was taken. With this new feature, you can at least inspect the backup's DB state without restoring it by opening it as a read-only DB. Rather than add something like OpenAsReadOnlyDB to the BackupEngine API, which would inhibit opening stackable DB implementations read-only (if/when their APIs support it), we instead provide a DB name and Env that can be used to open as a read-only DB. Possible follow-up work: * Add a version of GetBackupInfo for a single backup. * Let CreateNewBackup return the BackupID of the newly-created backup. Implementation details: Refactored ChrootFileSystem to split off new base class RemapFileSystem, which allows more general remapping of files. We use this base class to implement BackupEngineImpl::RemapSharedFileSystem. To minimize API impact, I decided to just add these fields `name_for_open` and `env_for_open` to those set by GetBackupInfo when include_file_details=true. Creating the RemapSharedFileSystem adds a bit to the memory consumption, perhaps unnecessarily in some cases, but this has been mitigated by (a) only initialize the RemapSharedFileSystem lazily when GetBackupInfo with include_file_details=true is called, and (b) using the existing `shared_ptr<FileInfo>` objects to hold most of the mapping data. To enhance API safety, RemapSharedFileSystem is wrapped by new ReadOnlyFileSystem which rejects any attempts to write. This uncovered a couple of places in which DB::OpenForReadOnly would write to the filesystem, so I fixed these. Added a release note because this affects logging. Additional minor refactoring in backupable_db.cc to support the new functionality. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8142 Test Plan: new test (run with ASAN and UBSAN), added to stress test and ran it for a while with amplified backup_one_in Reviewed By: ajkr Differential Revision: D27535408 Pulled By: pdillinger fbshipit-source-id: 04666d310aa0261ef6b2385c43ca793ce1dfd148
-
- 24 3月, 2021 1 次提交
-
-
由 Yanqin Jin 提交于
Summary: As title. All core db implementations should stay in db_impl. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8082 Test Plan: make check Reviewed By: ajkr Differential Revision: D27211442 Pulled By: riversand963 fbshipit-source-id: e0953fde75064740e899aaff7989ff033b7f5232
-
- 16 3月, 2021 1 次提交
-
-
由 Yanqin Jin 提交于
Summary: As title. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8054 Test Plan: make check Reviewed By: mrambacher Differential Revision: D27017955 Pulled By: riversand963 fbshipit-source-id: 829497d507bc89afbe982f8a8cf3555e52fd7098
-
- 10 3月, 2021 3 次提交
-
-
由 Ed rodriguez 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7994 Reviewed By: jay-zhuang Differential Revision: D26904603 Pulled By: ajkr fbshipit-source-id: 0af92a51de895b40c7faaa4f0870b3f63279fe21
-
由 Peter Dillinger 提交于
Summary: Removed confusing, awkward, and undocumented internal API ReadOneLine and replaced with very simple LineFileReader. In refactoring backupable_db.cc, this has the side benefit of removing the arbitrary cap on the size of backup metadata files. Also added Status::MustCheck to make it easy to mark a Status as "must check." Using this, I can ensure that after LineFileReader::ReadLine returns false the caller checks GetStatus(). Also removed some excessive conditional compilation in status.h Pull Request resolved: https://github.com/facebook/rocksdb/pull/8026 Test Plan: added unit test, and running tests with ASSERT_STATUS_CHECKED Reviewed By: mrambacher Differential Revision: D26831687 Pulled By: pdillinger fbshipit-source-id: ef749c265a7a26bb13cd44f6f0f97db2955f6f0f
-
由 xinyuliu 提交于
Summary: Add ${ARTIFACT_SUFFIX} to benchmark tool names to enable differentiating jemalloc and non-jemalloc versions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8016 Reviewed By: jay-zhuang Differential Revision: D26907007 Pulled By: ajkr fbshipit-source-id: 78d3b3372b5454d52d5b663ea982135ea9cf7bf8
-
- 27 2月, 2021 1 次提交
-
-
由 Peter Dillinger 提交于
Summary: This change only affects non-schema-critical aspects of the production candidate Ribbon filter. Specifically, it refines choice of internal configuration parameters based on inputs. The changes are minor enough that the schema tests in bloom_test, some of which depend on this, are unaffected. There are also some minor optimizations and refactorings. This would be a schema change for "smash" Ribbon, to fix some known issues with small filters, but "smash" Ribbon is not accessible in public APIs. Unit test CompactnessAndBacktrackAndFpRate updated to test small and medium-large filters. Run with --thoroughness=100 or so for much better detection power (not appropriate for continuous regression testing). Homogenous Ribbon: This change adds internally a Ribbon filter variant we call Homogeneous Ribbon, in collaboration with Stefan Walzer. The expected "result" value for every key is zero, instead of computed from a hash. Entropy for queries not to be false positives comes from free variables ("overhead") in the solution structure, which are populated pseudorandomly. Construction is slightly faster for not tracking result values, and never fails. Instead, FP rate can jump up whenever and whereever entries are packed too tightly. For small structures, we can choose overhead to make this FP rate jump unlikely, as seen in updated unit test CompactnessAndBacktrackAndFpRate. Unlike standard Ribbon, Homogeneous Ribbon seems to scale to arbitrary number of keys when accepting an FP rate penalty for small pockets of high FP rate in the structure. For example, 64-bit ribbon with 8 solution columns and 10% allocated space overhead for slots seems to achieve about 10.5% space overhead vs. information-theoretic minimum based on its observed FP rate with expected pockets of degradation. (FP rate is close to 1/256.) If targeting a higher FP rate with fewer solution columns, Homogeneous Ribbon can be even more space efficient, because the penalty from degradation is relatively smaller. If targeting a lower FP rate, Homogeneous Ribbon is less space efficient, as more allocated overhead is needed to keep the FP rate impact of degradation relatively under control. The new OptimizeHomogAtScale tool in ribbon_test helps to find these optimal allocation overheads for different numbers of solution columns. And Ribbon widths, with 128-bit Ribbon apparently cutting space overheads in half vs. 64-bit. Other misc item specifics: * Ribbon APIs in util/ribbon_config.h now provide configuration data for not just 5% construction failure rate (95% success), but also 50% and 0.1%. * Note that the Ribbon structure does not exhibit "threshold" behavior as standard Xor filter does, so there is a roughly fixed space penalty to cut construction failure rate in half. Thus, there isn't really an "almost sure" setting. * Although we can extrapolate settings for large filters, we don't have a good formula for configuring smaller filters (< 2^17 slots or so), and efforts to summarize with a formula have failed. Thus, small data is hard-coded from updated FindOccupancy tool. * Enhances ApproximateNumEntries for public API Ribbon using more precise data (new API GetNumToAdd), thus a more accurate but not perfect reversal of CalculateSpace. (bloom_test updated to expect the greater precision) * Move EndianSwapValue from coding.h to coding_lean.h to keep Ribbon code easily transferable from RocksDB * Add some missing 'const' to member functions * Small optimization to 128-bit BitParity * Small refactoring of BandingStorage in ribbon_alg.h to support Homogeneous Ribbon * CompactnessAndBacktrackAndFpRate now has an "expand" test: on construction failure, a possible alternative to re-seeding hash functions is simply to increase the number of slots (allocated space overhead) and try again with essentially the same hash values. (Start locations will be different roundings of the same scaled hash values--because fastrange not mod.) This seems to be as effective or more effective than re-seeding, as long as we increase the number of slots (m) by roughly m += m/w where w is the Ribbon width. This way, there is effectively an expansion by one slot for each ribbon-width window in the banding. (This approach assumes that getting "bad data" from your hash function is as unlikely as it naturally should be, e.g. no adversary.) * 32-bit and 16-bit Ribbon configurations are added to ribbon_test for understanding their behavior, e.g. with FindOccupancy. They are not considered useful at this time and not tested with CompactnessAndBacktrackAndFpRate. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7879 Test Plan: unit test updates included Reviewed By: jay-zhuang Differential Revision: D26371245 Pulled By: pdillinger fbshipit-source-id: da6600d90a3785b99ad17a88b2a3027710b4ea3a
-
- 26 2月, 2021 1 次提交
-
-
由 Yanqin Jin 提交于
Summary: Allow applications to implement a custom compaction filter and pass it to BlobDB. The compaction filter's custom logic can operate on blobs. To do so, application needs to subclass `CompactionFilter` abstract class and implement `FilterV2()` method. Optionally, a method called `ShouldFilterBlobByKey()` can be implemented if application's custom logic rely solely on the key to make a decision without reading the blob, thus saving extra IO. Examples can be found in db/blob/db_blob_compaction_test.cc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7974 Test Plan: make check Reviewed By: ltamasi Differential Revision: D26509280 Pulled By: riversand963 fbshipit-source-id: 59f9ae5614c4359de32f4f2b16684193cc537b39
-
- 24 2月, 2021 1 次提交
-
-
由 sherriiiliu 提交于
Summary: WITH_GFLAGS option does not work on MSVC. I checked the usage of [CMAKE_DEPENDENT_OPTION](https://cmake.org/cmake/help/latest/module/CMakeDependentOption.html). It says if the `depends` condition is not true, it will set the `option` to the value given by `force` and hides the option from the user. Therefore, `CMAKE_DEPENDENT_OPTION(WITH_GFLAGS "build with GFlags" ON "NOT MSVC;NOT MINGW" OFF)` will hide WITH_GFLAGS option from user if it is running on MSVC or MINGW and always set WITH_GFLAGS to be OFF. To expose WITH_GFLAGS option to user, I removed CMAKE_DEPENDENT_OPTION and split the logic into if-else statements Pull Request resolved: https://github.com/facebook/rocksdb/pull/7990 Reviewed By: jay-zhuang Differential Revision: D26615755 Pulled By: ajkr fbshipit-source-id: 33ca39a73423d9516510c15aaf9efb5c4072cdf9
-
- 23 2月, 2021 1 次提交
-
-
由 Akanksha Mahajan 提交于
Summary: Extend VerifyFileChecksums API to verify blob files in case of use_file_checksum. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7979 Test Plan: New unit test db_blob_corruption_test Reviewed By: ltamasi Differential Revision: D26534040 Pulled By: akankshamahajan15 fbshipit-source-id: 7dc5951a3df9d265ea1265e0122b43c966856ade
-
- 11 2月, 2021 1 次提交
-
-
由 Xavier Deguillard 提交于
Summary: With M1 macs being available, it is possible that RocksDB will be built on them, without the resulting artifacts to be intended for iOS, where a non-lite RocksDB is needed. It is not clear to me why the ROCKSDB_LITE cmake option isn't used for iOS consumer, so sending this pull request as a way to foster discussion and to find a path forward to get a full RocksDB build on M1. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7943 Test Plan: Applied the following patch: ``` diff --git a/fbcode/opensource/fbcode_builder/manifests/rocksdb b/fbcode/opensource/fbcode_builder/manifests/rocksdb --- a/fbcode/opensource/fbcode_builder/manifests/rocksdb +++ b/fbcode/opensource/fbcode_builder/manifests/rocksdb @@ -2,8 +2,8 @@ name = rocksdb [download] -url = https://github.com/facebook/rocksdb/archive/v6.8.1.tar.gz -sha256 = ca192a06ed3bcb9f09060add7e9d0daee1ae7a8705a3d5ecbe41867c5e2796a2 +url = https://github.com/xavierd/rocksdb/archive/master.zip +sha256 = f93f3f92df66a8401659e35398749d5910b92bd9c14b8354a35ea8852865c422 [dependencies] lz4 @@ -11,7 +11,7 @@ [build] builder = cmake -subdir = rocksdb-6.8.1 +subdir = rocksdb-master [cmake.defines] WITH_SNAPPY=ON ``` And ran `getdeps build eden` on an M1 macbook. The build used to fail at link time due to some RocksDB symbols not being found, it now fails for another reason (x86_64 Rust symbols). Reviewed By: jay-zhuang Differential Revision: D26324049 Pulled By: xavierd fbshipit-source-id: 12d86f3395709c4c323f440844e3ae65672aef2d
-
- 02 2月, 2021 1 次提交
-
-
由 mrambacher 提交于
Summary: (Fixes a regression introduced in the build_version generation PR https://github.com/facebook/rocksdb/issues/7866 ) In the Makefile case, needed to ignore stderr on the tag (everywhere else was fine). In the CMAKE case, no GIT implies "changes" so that we use the system date rather than the empty GIT date. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7916 Test Plan: Built in a tree that did not contain the ".git" directory. Validated that no errors appeared during the build process and that the build version date was not empty. Reviewed By: jay-zhuang Differential Revision: D26169203 Pulled By: mrambacher fbshipit-source-id: 3288a23b48d97efed5e5b38c9aefb3ef1153fa16
-
- 30 1月, 2021 1 次提交
-
-
由 Andrew Kryczka 提交于
Summary: This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.). The foundation classes (`ProtectionInfo.*`) embed the coverage info in their type, and provide `Protect.*()` and `Strip.*()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer. When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748 Test Plan: - an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught - add to stress/crash test to verify it works in variety of configs/operations without intentional corruption - [deferred] unit tests for `ProtectionInfo.*` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc. Reviewed By: pdillinger Differential Revision: D25754492 Pulled By: ajkr fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866
-
- 29 1月, 2021 1 次提交
-
-
由 mrambacher 提交于
Summary: Closes https://github.com/facebook/rocksdb/issues/7035 Changed how build_version.cc was generated: - Included the GIT tag/branch in the build_version file - Changed the "Build Date" to be: - If the GIT branch is "clean" (no changes), the date of the last git commit - If the branch is not clean, the current date - Added APIs to access the "build information", rather than accessing the strings directly. The build_version.cc file is now regenerated whenever the library objects are rebuilt. Verified that the built files remain the same size across builds on a "clean build" and the same information is reported by sst_dump --version Pull Request resolved: https://github.com/facebook/rocksdb/pull/7866 Reviewed By: pdillinger Differential Revision: D26086565 Pulled By: mrambacher fbshipit-source-id: 6fcbe47f6033989d5cf26a0ccb6dfdd9dd239d7f
-
- 26 1月, 2021 1 次提交
-
-
由 mrambacher 提交于
Summary: Introduces and uses a SystemClock class to RocksDB. This class contains the time-related functions of an Env and these functions can be redirected from the Env to the SystemClock. Many of the places that used an Env (Timer, PerfStepTimer, RepeatableThread, RateLimiter, WriteController) for time-related functions have been changed to use SystemClock instead. There are likely more places that can be changed, but this is a start to show what can/should be done. Over time it would be nice to migrate most (if not all) of the uses of the time functions from the Env to the SystemClock. There are several Env classes that implement these functions. Most of these have not been converted yet to SystemClock implementations; that will come in a subsequent PR. It would be good to unify many of the Mock Timer implementations, so that they behave similarly and be tested similarly (some override Sleep, some use a MockSleep, etc). Additionally, this change will allow new methods to be introduced to the SystemClock (like https://github.com/facebook/rocksdb/issues/7101 WaitFor) in a consistent manner across a smaller number of classes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7858 Reviewed By: pdillinger Differential Revision: D26006406 Pulled By: mrambacher fbshipit-source-id: ed10a8abbdab7ff2e23d69d85bd25b3e7e899e90
-