- 25 Aug, 2018 (5 commits)
-
-
Committed by Yanqin Jin
Summary: According to https://github.com/facebook/rocksdb/blob/4848bd0c4e98713bf5ae72a36057e188c53206f8/db/log_reader.cc#L355, the original text is misleading when describing the layout of RecyclableLogHeader. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4315 Differential Revision: D9505284 Pulled By: riversand963 fbshipit-source-id: 79994c37a69e7003f03453e7efc0186feeafa609
-
Committed by Shrikanth Shankar
Summary: This PR fixes issue 3842. We drop deletion markers iff (1) we are at the bottom-most level AND (2) all other occurrences of the key are in the same snapshot range as the delete. I've also enhanced db_stress_test with an option that does a full compare of the keys; this is done by a single thread (thread #0). Tests I've run (so far): `make check -j64`, `db_stress --acquire_snapshot_one_in=1000 --ops_per_thread=100000` (to verify that the new code doesn't break existing tests), and `./db_stress --compare_full_db_state_snapshot=true --acquire_snapshot_one_in=1000 --ops_per_thread=100000` (to verify the new test code). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4289 Differential Revision: D9491165 Pulled By: shrikanthshankar fbshipit-source-id: ce144834f31736c189aaca81bed356ba990331e2
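A minimal sketch of the two-part condition above (a hypothetical helper; the real logic lives in RocksDB's compaction iterator and uses its own types):

```cpp
#include <cstdint>
#include <vector>

using SequenceNumber = uint64_t;

// Hypothetical helper illustrating the rule, not the actual
// compaction-iterator code.
bool CanDropDeletionMarker(bool bottommost_level, SequenceNumber delete_seq,
                           SequenceNumber oldest_key_seq,
                           const std::vector<SequenceNumber>& snapshots) {
  if (!bottommost_level) {
    return false;  // (1) the key could still exist in a lower level
  }
  // (2) no snapshot may separate the delete from any older occurrence of
  // the key; otherwise a reader at that snapshot still needs the marker.
  for (SequenceNumber snap : snapshots) {
    if (snap >= oldest_key_seq && snap < delete_seq) {
      return false;
    }
  }
  return true;
}
```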
-
Committed by Yanqin Jin
Summary: Test plan:
```
$ cd rocksdb/
$ ./tools/check_format_compatible.sh
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4310 Differential Revision: D9498125 Pulled By: riversand963 fbshipit-source-id: 83cf6992949a52199e7812bb41bc9281ac271a24
-
Committed by Yanqin Jin
Summary: RocksDB currently queues individual column families for flushing. This is not sufficient to support the needs of some applications that want to enforce order/dependency between column families, given that multiple foreground and background activities can trigger flushing in RocksDB. This PR aims to address that limitation. Each flush request is described as a `FlushRequest` that can contain multiple column families. A background flushing thread pops one flush request from the queue at a time and processes it. This PR does not enable atomic_flush yet, but is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752). Pull Request resolved: https://github.com/facebook/rocksdb/pull/3952 Differential Revision: D8529933 Pulled By: riversand963 fbshipit-source-id: 78908a21e389a3a3f7de2a79bae0cd13af5f3539
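A rough sketch of the shape this introduces (names simplified; the real code lives in DBImpl, and pairing each column family with a max memtable ID is an assumption based on the atomic-flush work this is a subset of):

```cpp
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

struct ColumnFamilyData;  // opaque stand-in for the real class

// One request can name several column families; the uint64_t (an assumed
// max-memtable-ID per family) is illustrative.
using FlushRequest = std::vector<std::pair<ColumnFamilyData*, uint64_t>>;

class FlushRequestQueue {
 public:
  void Enqueue(FlushRequest req) {
    std::lock_guard<std::mutex> lk(mu_);
    queue_.push_back(std::move(req));
  }

  // A background flush thread pops one request at a time and flushes
  // every column family that request contains.
  bool PopFront(FlushRequest* req) {
    std::lock_guard<std::mutex> lk(mu_);
    if (queue_.empty()) return false;
    *req = std::move(queue_.front());
    queue_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::deque<FlushRequest> queue_;
};
```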
-
Committed by Andrew Kryczka
Summary: I have a PR to start calling `OnTableFileCreated` for empty SSTs: #4307. However, it is a behavior change, so it should not go into a patch release. This PR adds back a check to make sure range deletions at least exist before starting file creation. This PR should be safe to backport to earlier versions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4311 Differential Revision: D9493734 Pulled By: ajkr fbshipit-source-id: f0d43cda4cfd904f133cfe3a6eb622f52a9ccbe8
-
- 24 Aug, 2018 (7 commits)
-
-
Committed by Andrew Kryczka
Summary: Blame: #4307 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4312 Differential Revision: D9494093 Pulled By: ajkr fbshipit-source-id: eb6be2675c08b9ab508378d45110eb0fcf260a42
-
Committed by Andrew Kryczka
Summary: In RocksDB, for a given SST file, all data blocks are compressed with the same dictionary. When we compress a block using the dictionary's raw bytes, the compression library first has to digest the dictionary to get it into a usable form. This digestion work is redundant and ideally should be done once per file. ZSTD offers APIs for the caller to create and reuse a digested dictionary object (`ZSTD_CDict`). In this PR, we call `ZSTD_createCDict` once per file to digest the raw bytes. Then we use `ZSTD_compress_usingCDict` to compress each data block using the pre-digested dictionary. Once the file is created, `ZSTD_freeCDict` releases the resources held by the digested dictionary. There are a couple of other changes included in this PR:
- Changed the parameter object for (un)compression functions from `CompressionContext`/`UncompressionContext` to `CompressionInfo`/`UncompressionInfo`. This avoids the previous pattern, where `CompressionContext`/`UncompressionContext` had to be mutated before calling a (un)compression function depending on whether a dictionary should be used. I felt that mutation was error-prone, so I eliminated it.
- Added support for digested uncompression dictionaries (`ZSTD_DDict`) as well. However, this PR does not support reusing them across uncompression calls for the same file. That work is deferred to a later PR when we will store the `ZSTD_DDict` objects in block cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4251 Differential Revision: D9257078 Pulled By: ajkr fbshipit-source-id: 21b8cb6bbdd48e459f1c62343780ab66c0a64438
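A condensed sketch of that per-file flow using the ZSTD API (error checks via `ZSTD_isError()` elided):

```cpp
#include <zstd.h>

#include <string>
#include <vector>

// The dictionary is digested once per file, reused for every data
// block, then freed when the file is finished.
void CompressFileBlocks(const std::string& dict,
                        const std::vector<std::string>& blocks, int level) {
  ZSTD_CDict* cdict = ZSTD_createCDict(dict.data(), dict.size(), level);
  ZSTD_CCtx* cctx = ZSTD_createCCtx();
  for (const std::string& block : blocks) {
    std::string out(ZSTD_compressBound(block.size()), '\0');
    size_t n = ZSTD_compress_usingCDict(cctx, &out[0], out.size(),
                                        block.data(), block.size(), cdict);
    out.resize(n);  // compressed block, ready to be written to the SST
  }
  ZSTD_freeCCtx(cctx);
  ZSTD_freeCDict(cdict);
}
```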
-
Committed by Andrew Kryczka
Summary: The API comment on `OnTableFileCreationStarted` (https://github.com/facebook/rocksdb/blob/b6280d01f9f9c4305c536dfb804775fce3956280/include/rocksdb/listener.h#L331-L333) led users to believe a call to `OnTableFileCreationStarted` will always be matched with a call to `OnTableFileCreated`. However, we were skipping the `OnTableFileCreated` call in one case: no error happens but also no file is generated since there's no data. This PR adds the call to `OnTableFileCreated` for that case. The filename will be "(nil)" and the size will be zero. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4307 Differential Revision: D9485201 Pulled By: ajkr fbshipit-source-id: 2f077ec7913f128487aae2624c69a50762394df6
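A sketch of a listener that relies on the restored pairing (simplified; the sentinel values come from the summary above):

```cpp
#include <atomic>
#include <cassert>

#include <rocksdb/listener.h>

// Every OnTableFileCreationStarted() now gets a matching
// OnTableFileCreated(), with file_path "(nil)" and file_size 0 when
// nothing was actually written.
class PairedCreationListener : public rocksdb::EventListener {
 public:
  void OnTableFileCreationStarted(
      const rocksdb::TableFileCreationBriefInfo& /*info*/) override {
    outstanding_.fetch_add(1);
  }
  void OnTableFileCreated(
      const rocksdb::TableFileCreationInfo& info) override {
    outstanding_.fetch_sub(1);
    if (info.file_path == "(nil)") {
      assert(info.file_size == 0);  // empty table: no bytes on disk
    }
  }

 private:
  std::atomic<int> outstanding_{0};
};
```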
-
Committed by zhichao-cao
Summary: Add unit tests for Iterator (Seek and SeekForPrev) to trace_analyzer_test. The output files produced by analyzing the trace file are checked to make sure that the analysis results are correct. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4282 Differential Revision: D9436758 Pulled By: zhichao-cao fbshipit-source-id: 88d471c9a69e07382d9c6a45eba72773b171e7c2
-
Committed by Gauresh Rane
Summary: Memtables are selected for flushing by the flush job. Currently we have a listener that is invoked when memtables for a column family are flushed, but that listener does not indicate which memtable was flushed in the notification. If clients want to know whether particular data in the memtable was retired, there is no straightforward way to find out. This method helps users who implement a memtablerep factory and extend the memtablerep interface to know when the data in the memtable has been retired. Another option that was tried was to depend on the memtable destructor being called after flush to mark that the data was persisted. That works all the time, but sometimes there can be huge delays between the actual flush and the memtable getting destroyed, so anyone waiting for the data to persist would have to wait that much longer. Anyone implementing this method is expected to return quickly, as it blocks RocksDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4304 Reviewed By: riversand963 Differential Revision: D9472312 Pulled By: gdrane fbshipit-source-id: 8e693308dee749586af3a4c5d4fcf1fa5276ea4d
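The summary does not name the new hook; assuming it is a virtual on MemTableRep along the lines of `MarkFlushed()` (our guess at the added interface), a custom rep could track retirement like this sketch:

```cpp
#include <atomic>

// Hypothetical custom rep; the base class and the exact hook name
// (MarkFlushed) are assumptions, not confirmed by the summary above.
class MyMemTableRep /* : public rocksdb::MemTableRep */ {
 public:
  // Assumed to be called by RocksDB once the memtable's data has been
  // flushed; must return quickly because it blocks RocksDB.
  void MarkFlushed() /* override */ {
    flushed_.store(true, std::memory_order_release);
  }

  bool IsFlushed() const {
    return flushed_.load(std::memory_order_acquire);
  }

 private:
  std::atomic<bool> flushed_{false};
};
```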
-
Committed by Fenggang Wu
Summary: When HashIndex lookup returns `kNoEntry`, we previously invalidated `biter` by setting `current_=restarts_`, so that the search could continue to the next block in case the search result resides there. There is one problem: when searching for a missing key, if the search finds `kNoEntry` and continues to the next block, there is also a non-trivial possibility that the HashIndex returns `kNoEntry` again, and the expensive index iterator `Next()` happens several times for nothing. The solution: if the hash table returns `kNoEntry`, `SeekForGetImpl()` just searches the last restart interval for the key. It stops at the first key larger than the seek key, or at the end of the block, and each case is handled correctly. Microbenchmark script:
```
TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillseq,readtocache,readmissing \
  --cache_size=20000000000 --use_data_block_hash_index={true|false}
```
`readmissing` performance (lower is better):
```
binary:                      3.6098 micros/op
hash (before applying diff): 4.1048 micros/op
hash (after applying diff):  3.3502 micros/op
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4296 Differential Revision: D9419159 Pulled By: fgwu fbshipit-source-id: 21e3eedcccbc47a249aa8eb4bf405c9def0b8a05
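A toy model of the idea (deliberately simplified; the real code operates on DataBlockIter restart points rather than a key vector):

```cpp
#include <string>
#include <vector>

// A "block" here is a sorted key list plus restart points. On a hash
// miss, instead of falling through to the next block, scan only the
// last restart interval of the current block.
struct ToyBlock {
  std::vector<std::string> keys;       // sorted
  std::vector<size_t> restart_points;  // indices into keys
};

// Returns the index of seek_key in the last restart interval, or -1.
int SeekForGetOnHashMiss(const ToyBlock& b, const std::string& seek_key) {
  size_t start = b.restart_points.back();
  for (size_t i = start; i < b.keys.size(); ++i) {
    if (b.keys[i] == seek_key) return static_cast<int>(i);
    if (b.keys[i] > seek_key) break;  // first larger key: not present
  }
  return -1;  // end of block or a larger key: not present
}
```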
-
Committed by Yanqin Jin
Summary: We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths; otherwise it's hard to tell what has been going on. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039 Differential Revision: D8670178 Pulled By: riversand963 fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
-
- 23 Aug, 2018 (2 commits)
-
-
Committed by Zhongyi Xie
Summary: A user reported (https://github.com/facebook/rocksdb/issues/4168) that when opening RocksDB in read-only mode, some statistics are not correctly reported. After some investigation, we believe the following counters are indeed not reported during a Get() call on a read-only DB: rocksdb.memtable.hit, rocksdb.memtable.miss, rocksdb.number.keys.read, and rocksdb.bytes.read, as well as the histogram rocksdb.bytes.per.read and the perf context get_read_bytes. This PR adds the necessary counter-reporting logic to the Get() call path. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4260 Differential Revision: D9476431 Pulled By: miasantreble fbshipit-source-id: 7ab409d4e59df05d09ae8b69fe75554e5aa240d6
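A minimal sketch of observing the fix (error handling elided; the path and key are placeholders):

```cpp
#include <cstdint>
#include <string>

#include <rocksdb/db.h>
#include <rocksdb/statistics.h>

// After this fix, these tickers advance for Get() even when the DB is
// opened read-only; previously they all stayed at 0.
void CheckReadOnlyStats(const std::string& path) {
  rocksdb::Options options;
  options.statistics = rocksdb::CreateDBStatistics();
  rocksdb::DB* db = nullptr;
  rocksdb::DB::OpenForReadOnly(options, path, &db);
  std::string value;
  db->Get(rocksdb::ReadOptions(), "some_key", &value);
  uint64_t hits = options.statistics->getTickerCount(rocksdb::MEMTABLE_HIT);
  uint64_t misses = options.statistics->getTickerCount(rocksdb::MEMTABLE_MISS);
  uint64_t bytes = options.statistics->getTickerCount(rocksdb::BYTES_READ);
  (void)hits; (void)misses; (void)bytes;
  delete db;
}
```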
-
Committed by Andrew Kryczka
Summary: ZSTD's dynamic library exports the `ZDICT_trainFromBuffer` symbol since v1.1.3, and its static library exports it since v0.6.1. We don't know whether linkage is static or dynamic, so we simply require v1.1.3 to use the dictionary trainer. Fixes the issue reported here: https://jira.mariadb.org/browse/MDEV-16525. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4295 Differential Revision: D9417183 Pulled By: ajkr fbshipit-source-id: 0e89d2f48d9e7f6eee73e7f4572660a9f7122db8
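A minimal sketch of such a version gate, assuming the standard zstd headers (`ZSTD_VERSION_NUMBER` encodes v1.1.3 as 10103; the macro name defined here is illustrative, not RocksDB's):

```cpp
#include <zstd.h>

#if ZSTD_VERSION_NUMBER >= 10103  // v1.1.3+: ZDICT_trainFromBuffer exported
#include <zdict.h>
#define DICT_TRAINER_AVAILABLE 1  // illustrative flag name
#endif
```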
-
- 22 Aug, 2018 (4 commits)
-
-
Committed by Fenggang Wu
Summary: Improve the description of the backward compatibility check in NumRestarts() Pull Request resolved: https://github.com/facebook/rocksdb/pull/4286 Differential Revision: D9412490 Pulled By: fgwu fbshipit-source-id: ea7dd5c61d8ff8eacef623b729d4e4fd53cca066
-
Committed by Yi Wu
Summary: Suppress multiple clang-analyzer errors. All of them are clang false positives. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4299 Differential Revision: D9430740 Pulled By: yiwu-arbug fbshipit-source-id: fbdd575bdc214d124826d61d35a117995c509279
-
Committed by Anand Ananthabhotla
Summary: Update HISTORY.md for 5.16. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4298 Differential Revision: D9433868 Pulled By: anand1976 fbshipit-source-id: e7880a1c952210b1e9d7466eed72a6cb5018096b
-
Committed by Zhichao Cao
Summary: Previously, trace_analyzer_tool was compiled together with other lib objects, which let the GFLAGS of trace_analyzer appear in other tools (e.g., db_bench, rocksdb_dump, etc.). When using '--help', the help information of trace_analyzer would show up in the other tools' help output, causing confusion. Now, trace_analyzer_tool is built and used only by trace_analyzer and trace_analyzer_test to avoid this issue. Tested with make asan_check. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4290 Differential Revision: D9413163 Pulled By: zhichao-cao fbshipit-source-id: ed5d20c4575a53ca15ff62a2ffe601d5cf278cc4
-
- 21 Aug, 2018 (5 commits)
-
-
Committed by Fenggang Wu
Summary: `BlockBasedTableBuilder::Add()` eventually calls `DataBlockHashIndexBuilder::EstimateSize()`. The previous implementation divides `num_keys` by the `util_ratio_` to get an estimated `num_buckets`. Such division is expensive, as it happens in every `BlockBasedTableBuilder::Add()`. This diff estimates `num_buckets` by double addition instead of double division. Specifically, in each `Add()` we add `bucket_per_key_` (the inverse of `util_ratio_`) to the current `estimated_num_buckets_`. The cost is that `estimated_num_buckets_` is stored as one extra field in the DataBlockHashIndexBuilder. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4293 Differential Revision: D9412472 Pulled By: fgwu fbshipit-source-id: 2925c8509a401e7bd3c1ab1d9e9c7244755c277a
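A simplified sketch of the before/after arithmetic (not the exact builder code; the one-byte-per-bucket footprint follows the hash-index design described elsewhere in this log):

```cpp
#include <cstddef>
#include <cstdint>

class HashIndexSizeEstimator {
 public:
  explicit HashIndexSizeEstimator(double util_ratio)
      : bucket_per_key_(1.0 / util_ratio) {}

  void Add() {
    // Before: estimated buckets = num_keys / util_ratio_ on every call.
    // After: one double addition per key instead of one division.
    estimated_num_buckets_ += bucket_per_key_;
  }

  size_t EstimateSize() const {
    auto buckets = static_cast<size_t>(estimated_num_buckets_);
    return buckets * sizeof(uint8_t);  // one byte per bucket; footer elided
  }

 private:
  const double bucket_per_key_;       // inverse of util_ratio_
  double estimated_num_buckets_ = 0;  // the one extra field mentioned above
};
```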
-
Committed by Yi Wu
Summary: Fix expired files not being evicted from the DB. We have a background task (previously called `CheckSeqFiles`; I renamed it to `EvictExpiredFiles`) that scans and removes expired files, but it only closed the files without marking them as expired. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4294 Differential Revision: D9415984 Pulled By: yiwu-arbug fbshipit-source-id: eff7bf0331c52a7ccdb02318602bff7f64f3ef3d
-
Committed by Siying Dong
Summary: Clang analyze is not happy with two pieces of code, reporting "Potential memory leak". No idea what the problem is, but slightly changing the code makes clang happy. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4292 Differential Revision: D9413555 Pulled By: siying fbshipit-source-id: 9428c9d3664530c72129feefd135ee63d8386137
-
Committed by Siying Dong
Summary: Suppress two CLANG analyze warnings. They don't seem to be real bugs Pull Request resolved: https://github.com/facebook/rocksdb/pull/4291 Differential Revision: D9407333 Pulled By: siying fbshipit-source-id: 2ed63d88fa0b217fdccb1572d7508467c2203dc8
-
Committed by Yanqin Jin
Summary: During recovery, RocksDB is able to handle version edits that belong to group commits. This PR is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752) Pull Request resolved: https://github.com/facebook/rocksdb/pull/3945 Differential Revision: D8529122 Pulled By: riversand963 fbshipit-source-id: 57cb0f9cc55ecca684a837742d6626dc9c07f37e
-
- 18 Aug, 2018 (3 commits)
-
-
Committed by jsteemann
Summary: Transaction has had methods to deal with SavePoints already, but was missing the PopSavePoint method provided by WriteBatch and WriteBatchWithIndex. This PR adds PopSavePoint to Transaction as well. Having the method at the Transaction level too is useful for applications that repeatedly execute a sequence of operations that normally succeed but infrequently need to be rolled back. Using SavePoints here is sensible, but as operations normally succeed, the application may pile up a lot of useless SavePoints inside a Transaction, leading to slightly increased memory usage for managing the unneeded SavePoints. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4256 Differential Revision: D9326932 Pulled By: yiwu-arbug fbshipit-source-id: 53a0af18a6c7e87feff8a56f1f3eab9df7f371d6
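A usage sketch of the pattern this enables (a hypothetical step function; statuses from rollback/pop ignored for brevity):

```cpp
#include <rocksdb/utilities/transaction.h>

// Set a savepoint before a step that normally succeeds, roll back on
// failure, and pop the savepoint on success so savepoints don't pile up.
rocksdb::Status RunStep(rocksdb::Transaction* txn) {
  txn->SetSavePoint();
  rocksdb::Status s = txn->Put("key", "value");
  if (!s.ok()) {
    txn->RollbackToSavePoint();  // undo the partial step
  } else {
    txn->PopSavePoint();         // success: discard the savepoint
  }
  return s;
}
```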
-
Committed by Christian Esken
Summary: Closes https://github.com/facebook/rocksdb/issues/4195. CompactRangeOptions are available in the CPP API, but not in the Java API. This PR adds CompactRangeOptions to the Java API and adds an overloaded compactRange() method. See https://github.com/facebook/rocksdb/issues/4195 for the original discussion. This change supports all fields of CompactRangeOptions, including the required enum converters in the JNI portal. Significant changes:
- Make CompactRangeOptions available in compactRange() for Java.
- Deprecate other compactRange() methods that have individual option params, as in the CPP code.
- Migrate rocksdb_compactrange_helper() to CompactRangeOptions.
- Add Java unit tests for CompactRangeOptions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4220 Differential Revision: D9380007 Pulled By: sagar0 fbshipit-source-id: 6af6c334f221427f1997b33fb24c3986b092fed6
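For reference, a small C++ sketch of the options the new Java overload mirrors (the field values chosen here are illustrative):

```cpp
#include <rocksdb/db.h>
#include <rocksdb/options.h>

void FullCompact(rocksdb::DB* db) {
  rocksdb::CompactRangeOptions opts;
  opts.change_level = true;   // move output down after compaction
  opts.target_level = 1;
  opts.bottommost_level_compaction =
      rocksdb::BottommostLevelCompaction::kForce;
  db->CompactRange(opts, nullptr, nullptr);  // whole key range
}
```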
-
Committed by Andrey Zagrebin
Summary: Followup for #4266. There is one more place in **get_context.cc** where **MergeOperator::ShouldMerge** should be called with the reversed list of operands. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4284 Differential Revision: D9380008 Pulled By: sagar0 fbshipit-source-id: 70ec26e607e5b88465e1acbdcd6c6171bd76b9f2
-
- 17 Aug, 2018 (4 commits)
-
-
Committed by Fenggang Wu
Summary: Add `--data_block_index_type` and `--data_block_hash_table_util_ratio` options to `db_bench`. `--data_block_index_type` can be either `binary` (default) or `binary_and_hash`; `--data_block_hash_table_util_ratio` is a double with a default value of `0.75`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4281 Differential Revision: D9361476 Pulled By: fgwu fbshipit-source-id: dc53e01acef9db81b9eec5e8a96f3bc8ed718c10
-
Committed by Mikhail Antonov
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4275 Reviewed By: yiwu-arbug Differential Revision: D9369766 Pulled By: mikhail-antonov fbshipit-source-id: d91b64c34cc1976b324a260767fce343fa32afde
-
Committed by Siying Dong
Summary: Right now, `ldb idump` may use memory out of control if there is a big range of tombstones. Add an option to cap the maximum number of keys in GetAllKeyVersions(), and push down --max_num_ikeys from ldb. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4271 Differential Revision: D9369149 Pulled By: siying fbshipit-source-id: 7cbb797b7d2fa16573495a7e84937456d3ff25bf
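A hedged usage sketch (the exact position of the new limit parameter is assumed from the summary, not confirmed):

```cpp
#include <vector>

#include <rocksdb/db.h>
#include <rocksdb/utilities/debug.h>

// Cap the dump at 10000 internal keys so a huge tombstone range cannot
// blow up memory. The max_num_ikeys parameter placement is an assumption.
rocksdb::Status DumpSomeKeyVersions(rocksdb::DB* db) {
  std::vector<rocksdb::KeyVersion> versions;
  return rocksdb::GetAllKeyVersions(db, rocksdb::Slice(), rocksdb::Slice(),
                                    /*max_num_ikeys=*/10000, &versions);
}
```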
-
Committed by Andrey Zagrebin
Summary: This PR addresses issue #3865 and implements the following approach to fix it:
- adds `MergeContext::GetOperandsDirectionForward` and `MergeContext::GetOperandsDirectionBackward` to query merge operands in a specific order
- `MergeContext::GetOperands` becomes a shortcut for `MergeContext::GetOperandsDirectionForward`
- pass `MergeContext::GetOperandsDirectionBackward` to `MergeOperator::ShouldMerge` and document the order
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4266 Differential Revision: D9360750 Pulled By: sagar0 fbshipit-source-id: 20cb73ff017760b062ecdcf4382560767086e092
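A sketch of an operator exploiting the now-documented newest-first (backward) order; the "SET:" operand encoding is made up for illustration:

```cpp
#include <vector>

#include <rocksdb/merge_operator.h>

// If the newest operand fully overwrites older state, the scan over
// older operands and the base value can stop early.
class OverwritingMerge : public rocksdb::MergeOperator {
 public:
  bool ShouldMerge(
      const std::vector<rocksdb::Slice>& operands) const override {
    // operands[0] is the most recent operand.
    return !operands.empty() && operands[0].starts_with("SET:");
  }
  const char* Name() const override { return "OverwritingMerge"; }
  // FullMergeV2() etc. elided in this sketch.
};
```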
-
- 16 Aug, 2018 (1 commit)
-
-
Committed by Fenggang Wu
Summary: Add hash index support to data blocks, which helps to reduce the CPU utilization of point-lookup operations. This feature is backward compatible with data blocks created without the hash index. It is disabled by default and enabled by setting `BlockBasedTableOptions::data_block_index_type` to `kDataBlockBinaryAndHash`. The DB size will be bigger with the hash index option, as a hash table is added at the end of each data block. If the hash utilization ratio is 1:1, the space overhead is one byte per key. The hash table utilization ratio is adjustable via `BlockBasedTableOptions::data_block_hash_table_util_ratio`. A lower utilization ratio improves point-lookup efficiency further, but also takes more space. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4174 Differential Revision: D8965914 Pulled By: fgwu fbshipit-source-id: 1c6bae5d1fc39c80282d8890a72e9e67bc247198
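A minimal configuration sketch using the options named above (0.75 is the default ratio):

```cpp
#include <rocksdb/options.h>
#include <rocksdb/table.h>

rocksdb::Options MakeOptions() {
  rocksdb::BlockBasedTableOptions table_options;
  table_options.data_block_index_type =
      rocksdb::BlockBasedTableOptions::kDataBlockBinaryAndHash;
  table_options.data_block_hash_table_util_ratio = 0.75;  // lower = faster, bigger
  rocksdb::Options options;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}
```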
-
- 15 Aug, 2018 (4 commits)
-
-
Committed by Zhichao Cao
Summary: Removed the wrong options that were used in trace_analyzer_test. Fixed the potential integer-precision losses. Passes the specified test cases and make asan_check. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4274 Reviewed By: yiwu-arbug Differential Revision: D9327811 Pulled By: zhichao-cao fbshipit-source-id: d62cb18d6586503a490cd323bfc1c672b68b346e
-
Committed by jsteemann
Summary: Fixes compilation warnings (which are turned into compilation errors by default) when compiling with g++ option `-Wsuggest-override`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4272 Differential Revision: D9322556 Pulled By: siying fbshipit-source-id: abd57a29ec8f544bee77c0bb438f31be830b7244
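For context, a tiny illustration of the warning class being fixed:

```cpp
struct Base {
  virtual ~Base() = default;
  virtual void DoWork();
};

struct Derived : public Base {
  // g++ -Wsuggest-override warns on `virtual void DoWork();` here;
  // spelling out `override` silences it and catches signature drift.
  void DoWork() override;
};
```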
-
Committed by Anand Ananthabhotla
Summary: In the OnTableFileCreation() listener, assert on various TableProperties only when file size > 0 bytes. The listener can get called even for 0-byte SSTs, which have been deleted. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4273 Differential Revision: D9322738 Pulled By: anand1976 fbshipit-source-id: 17cdfb3d0da946b9a158d7328e5db1c87973956b
-
Committed by Maysam Yabandeh
Summary: Stress tests currently cover format_version 2 and 3. The patch adds 4 as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4265 Differential Revision: D9323185 Pulled By: maysamyabandeh fbshipit-source-id: 54d11e41ecae09bae14cadd7313f07c9a3db5a57
-
- 14 Aug, 2018 (4 commits)
-
-
Committed by Huachao Huang
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4267 Differential Revision: D9309505 Pulled By: anand1976 fbshipit-source-id: eb9fee8037f4ff24dc1cdd5cc5ef41c231a03e1f
-
Committed by Siying Dong
Summary: Add a unit test to check that iterators release data blocks after they have moved away from them. Verify the same for compaction input iterators. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4170 Differential Revision: D8962513 Pulled By: siying fbshipit-source-id: 05a5b604d7d29887fb488f2cda7286f554a14407
-
Committed by Zhichao Cao
Summary: A framework for trace analysis in RocksDB. After collecting a trace with the tool from [PR #3837](https://github.com/facebook/rocksdb/pull/3837), users can use the Trace Analyzer to interpret, analyze, and characterize the collected workload.
**Input:**
1. Trace file
2. Whole key space file
**Statistics:**
1. Access count of each operation (Get, Put, Delete, SingleDelete, DeleteRange, Merge) in each column family
2. Key hotness (access count) of each key
3. Key space separation based on a given prefix
4. Key size distribution
5. Value size distribution, if applicable
6. Top-K accessed keys
7. QPS statistics, including average QPS and peak QPS
8. Top-K accessed prefixes
9. Query correlation analysis: outputs the number of occurrences of X after Y and the corresponding average time intervals
**Output:**
1. Key access heat map (either in the accessed key space or the whole key space)
2. Trace sequence file (interprets the raw trace file into a line-based text file for future use)
3. Time series (the key space ID and its access time)
4. Key access count distribution
5. Key size distribution
6. Value size distribution (in each interval)
7. Whole key space separation by prefix
8. Accessed key space separation by prefix
9. QPS of each operation and each column family
10. Top-K QPS and their accessed prefix ranges
**Test:**
1. Added unit tests for analyzing Get, Put, Delete, SingleDelete, DeleteRange, Merge
2. Generated a trace and analyzed it
**Implemented but not tested (due to the limitation of trace_replay):**
1. Analyzing Iterator, supporting Seek() and SeekForPrev() analysis
2. Analyzing the number of keys found by Get
**Future work:**
1. Support execution-time analysis of each request
2. Support cache-hit and block-read analysis for Get
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4091 Differential Revision: D9256157 Pulled By: zhichao-cao fbshipit-source-id: f0ceacb7eedbc43a3eee6e85b76087d7832a8fe6
-
Committed by Yanqin Jin
Summary: Due to 4ea56b1b, we should also remove the assertion in the stress test. This removal can be temporary, and we can add it back once we figure out the reason for the 0-byte SSTs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4268 Differential Revision: D9297186 Pulled By: riversand963 fbshipit-source-id: cebba9a68f42e815f8cf24471176d2cfdf962f63
-
- 12 Aug, 2018 (1 commit)
-
-
Committed by Anand Ananthabhotla
Summary: Revert this change. Not generating the OnTableFileCreated() notification for a 0 byte SST on flush breaks the assumption that every OnTableFileCreationStarted() notification is followed by a corresponding OnTableFileCreated(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4263 Differential Revision: D9285623 Pulled By: anand1976 fbshipit-source-id: 808c3dcd498b4b4f4ed4be947a29a24b2296aa8d
-