1. April 27, 2019 (1 commit)
    • Fix ubsan failure in snapshot refresh (#5257) · 8c7eb598
      Committed by Maysam Yabandeh
      Summary:
      The newly added test CompactionJobTest.SnapshotRefresh sets the snapshot refresh period to 0 to stress the feature. This results in a large number of refresh events, which in turn results in a UBSAN failure when a bitwise shift operand goes beyond the width of uint64_t.
      The patch fixes that by simplifying the shift logic to shift by only 2 bits after each refresh. Furthermore, it verifies that the shift operation does not decrease the refresh period.
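      A minimal sketch of that capped-growth idea (the function and names here are illustrative, not the actual RocksDB internals):

      ```cpp
      #include <cstdint>

      // Grow the refresh period by 2 bits (4x) after each refresh, but never
      // let a wrapped shift produce a smaller period.
      uint64_t NextSnapshotRefreshPeriod(uint64_t period) {
        uint64_t next = period << 2;
        return next >= period ? next : period;  // keep the period non-decreasing
      }
      ```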
      
      Testing:
      COMPILE_WITH_UBSAN=1 make -j32 compaction_job_test
      ./compaction_job_test --gtest_filter=CompactionJobTest.SnapshotRefresh
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5257
      
      Differential Revision: D15106463
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f2718898ea7ba4fa9f7e87b70cf98fe647c0de80
2. April 26, 2019 (4 commits)
    • Refresh snapshot list during long compactions (#5099) · 506e8448
      Committed by Maysam Yabandeh
      Summary:
      Part of the compaction CPU time goes to processing the snapshot list; the larger the list, the bigger the overhead. Although the lifetime of most snapshots is much shorter than the lifetime of compactions, a compaction conservatively operates on the list of snapshots that it initially obtained. This patch allows the snapshot list to be updated via a callback if the compaction is taking long. This should let the compaction continue more efficiently with a much smaller snapshot list.
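      An illustrative sketch of the idea (simplified; not the exact internal interface):

      ```cpp
      #include <cstdint>
      #include <functional>
      #include <vector>

      // Callback that returns the DB's current snapshot sequence numbers.
      using SnapshotFetchFn = std::function<std::vector<uint64_t>()>;

      // Called periodically from the compaction loop: re-fetch the snapshot
      // list so released snapshots drop out and the list shrinks.
      void MaybeRefreshSnapshotList(uint64_t keys_processed, uint64_t refresh_period,
                                    const SnapshotFetchFn& fetch,
                                    std::vector<uint64_t>* snapshots) {
        if (refresh_period != 0 && keys_processed % refresh_period == 0) {
          *snapshots = fetch();
        }
      }
      ```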
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5099
      
      Differential Revision: D15086710
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 7649f56c3b6b2fb334962048150142a3bf9c1a12
    • Option string/map/file can set env from object registry (#5237) · 6eb317bb
      Committed by Andrew Kryczka
      Summary:
      - By providing the "env" field in any text-based options (i.e., string, map, or file), we can use `NewCustomObject` to deserialize the text value into an actual `Env` object (see the sketch below).
      - Currently, factory functions for `Env` registered with the object registry should only return pointers to static `Env` objects. That's because `DBOptions::env` is a raw pointer, so we cannot easily delegate cleanup.
      - Note I did not add `env` to `db_option_type_info`. It wasn't needed for (de)serialization, and I believe we don't want to do verification on `env`, even by checking the name. That's because the user should be able to copy their DB from Linux to Windows, change envs, and not see an option verification error.
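      A hedged sketch of the text-based usage, assuming an `Env` factory has already been registered with the object registry under the hypothetical name "custom-env" (the registration itself is omitted):

      ```cpp
      #include <rocksdb/convenience.h>
      #include <rocksdb/options.h>

      rocksdb::Status SetEnvFromString(const rocksdb::DBOptions& base,
                                       rocksdb::DBOptions* out) {
        // "env=custom-env" is resolved through the object registry.
        return rocksdb::GetDBOptionsFromString(
            base, "env=custom-env;create_if_missing=true", out);
      }
      ```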
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5237
      
      Differential Revision: D15056360
      
      Pulled By: siying
      
      fbshipit-source-id: 4b5f0b83297a5058f8949ec955dbf27d98d73d7e
    • add missing rocksdb_flush_cf in c (#5243) · 084a3c69
      Committed by niukuo
      Summary:
      Same as #5229.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5243
      
      Differential Revision: D15082800
      
      Pulled By: siying
      
      fbshipit-source-id: f4a68a480db0e40e1ba7cf37e18b88e43dff7c08
    • Close WAL files before deletion (#5233) · da96f2fe
      Committed by Yanqin Jin
      Summary:
      Currently one thread in RocksDB keeps a WAL file open while another thread
      deletes it. Although the first thread never writes to the WAL again, it still
      tries to close it in the end. This is fine on POSIX, but can be problematic on
      other platforms such as HDFS, where it will either produce a lot of warning messages or
      throw exceptions. The solution is to let the second thread close the WAL before deleting it.
      
      RocksDB keeps the writers of the logs to delete in `logs_to_free_`, which is passed to `job_context` during `FindObsoleteFiles` (holding mutex). Then in `PurgeObsoleteFiles` (without mutex), these writers should close the logs.
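      An abstract sketch of that handoff (the types and method names below are simplified stand-ins, not the actual internals):

      ```cpp
      #include <memory>
      #include <vector>

      struct WalWriter {
        void Close() { /* flush and close the underlying WAL file */ }
      };

      struct JobContext {
        // Filled by FindObsoleteFiles() while the DB mutex is held.
        std::vector<std::unique_ptr<WalWriter>> logs_to_free;
      };

      // Runs without the DB mutex: close each WAL before its file is deleted,
      // so filesystems like HDFS never see a delete racing an open handle.
      void PurgeObsoleteFiles(JobContext* ctx) {
        for (auto& writer : ctx->logs_to_free) {
          writer->Close();
        }
        ctx->logs_to_free.clear();
      }
      ```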
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5233
      
      Differential Revision: D15032670
      
      Pulled By: riversand963
      
      fbshipit-source-id: c55e8a612db8cc2306644001a5e6d53842a8f754
3. April 25, 2019 (3 commits)
    • update history.md (#5245) · 66d8360b
      Committed by Zhongyi Xie
      Summary:
      update history.md for `BottommostLevelCompaction::kForceOptimized` to mention possible user impact.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5245
      
      Differential Revision: D15073712
      
      Pulled By: miasantreble
      
      fbshipit-source-id: d40f698c42e8a6368be4eac0a00d02279615edea
    • Don't call FindObsoleteFiles() in ~ColumnFamilyHandleImpl() if CF is not dropped (#5238) · cd77d3c5
      Committed by Mike Kolupaev
      Summary:
      We have a DB with ~4k column families and ~70k files. On shutdown, destroying the 4k ColumnFamilyHandles takes over 2 minutes. Most of this time is spent in VersionSet::AddLiveFiles() called from FindObsoleteFiles() from ~ColumnFamilyHandleImpl(), just iterating over the list of files in memory. This seems completely unnecessary, as no obsolete files are actually found since the CFs are not even dropped. This PR fixes that.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5238
      
      Differential Revision: D15056342
      
      Pulled By: siying
      
      fbshipit-source-id: 2aa342ef3770b4aa384ce81f8768e485480e4f08
    • secondary instance: add support for WAL tailing on `OpenAsSecondary` · aa56b7e7
      Committed by Zhongyi Xie
      Summary: PR https://github.com/facebook/rocksdb/pull/4899 implemented the general framework for RocksDB secondary instances. This PR adds support for WAL tailing in `OpenAsSecondary`, which means that after the `OpenAsSecondary` call, the secondary is able to see the primary's writes that are yet to be flushed. The secondary can see the primary's writes in the WAL up to the moment the `OpenAsSecondary` call starts.
      
      Differential Revision: D15059905
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 44f71f548a30b38179a7940165e138f622de1f10
4. April 24, 2019 (4 commits)
5. April 23, 2019 (3 commits)
    • Fix compilation errors for 32bits/LITE/ios build. (#5220) · 78a6e07c
      Committed by Yuchi Chen
      Summary:
      When I build RocksDB for the 32bits/LITE/iOS environment, errors like the following occur:

      ```
      table/block_based_table_reader.cc:971:44: error: implicit conversion loses integer precision: 'uint64_t'
            (aka 'unsigned long long') to 'size_t' (aka 'unsigned long') [-Werror,-Wshorten-64-to-32]
          size_t block_size = props_block_handle.size();
                 ~~~~~~~~~~   ~~~~~~~~~~~~~~~~~~~^~~~~~

      ./util/file_reader_writer.h:177:8: error: private field 'env_' is not used [-Werror,-Wunused-private-field]
        Env* env_;
             ^
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5220
      
      Differential Revision: D15023481
      
      Pulled By: siying
      
      fbshipit-source-id: 1b5d121d3016f2b0a8a9a2cc1bd638479357f9f7
    • Log file_creation_time table property (#5232) · 47fd5748
      Committed by Sagar Vemuri
      Summary:
      Log file_creation_time table property when a new table file is created.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5232
      
      Differential Revision: D15033069
      
      Pulled By: sagar0
      
      fbshipit-source-id: aaac56a4c03a8f96c338cad1b0cdb7fbfb887647
    • Optionally wait on bytes_per_sync to smooth I/O (#5183) · 8272a6de
      Committed by Andrew Kryczka
      Summary:
      The existing implementation does not guarantee bytes reach disk every `bytes_per_sync` when writing SST files, or every `wal_bytes_per_sync` when writing WALs. This can cause confusing behavior for users who enable this feature to avoid large syncs during flush and compaction, but then end up hitting them anyway.
      
      My understanding of the existing behavior is that we used `sync_file_range` with `SYNC_FILE_RANGE_WRITE` to submit ranges for async writeback, such that we could continue processing the next range of bytes while that I/O is happening. I believe we can preserve that benefit while also limiting how far the processing can get ahead of the I/O, which prevents huge syncs from happening when the file finishes.
      
      Consider this `sync_file_range` usage: `sync_file_range(fd_, 0, static_cast<off_t>(offset + nbytes), SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE)`. Expanding the range to start at 0 and adding the `SYNC_FILE_RANGE_WAIT_BEFORE` flag causes any pending writeback (like from a previous call to `sync_file_range`) to finish before it proceeds to submit the latest `nbytes` for writeback. The latest `nbytes` are still written back asynchronously, unless processing exceeds I/O speed, in which case the following `sync_file_range` will need to wait on it.
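      A sketch of that call pattern (Linux-specific; `fd_`, `offset` and `nbytes` are as in the call quoted above, with error handling omitted):

      ```cpp
      #include <fcntl.h>

      // Wait for previously submitted writeback in the file to complete, then
      // submit the latest nbytes asynchronously; processing can stay at most
      // one range ahead of the I/O.
      void SubmitRangeForWriteback(int fd_, off_t offset, off_t nbytes) {
        sync_file_range(fd_, 0, static_cast<off_t>(offset + nbytes),
                        SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE);
      }
      ```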
      
      There is a second change in this PR to use `fdatasync` when `sync_file_range` is unavailable (determined statically) or has some known problem with the underlying filesystem (determined dynamically).
      
      The above two changes only apply when the user enables a new option, `strict_bytes_per_sync`.
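      Opting in looks like this (both options exist in DBOptions as of this change):

      ```cpp
      #include <rocksdb/options.h>

      rocksdb::Options MakeOptions() {
        rocksdb::Options options;
        options.bytes_per_sync = 1 << 20;      // request writeback every 1MB of SST writes
        options.wal_bytes_per_sync = 1 << 20;  // likewise for WAL writes
        options.strict_bytes_per_sync = true;  // bound how far writes may outrun the I/O
        return options;
      }
      ```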
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5183
      
      Differential Revision: D14953553
      
      Pulled By: siying
      
      fbshipit-source-id: 445c3862e019fb7b470f9c7f314fc231b62706e9
6. April 22, 2019 (1 commit)
    • Add BlockBasedTableOptions::index_shortening (#5174) · df38c1ce
      Committed by Mike Kolupaev
      Summary:
      Introduce BlockBasedTableOptions::index_shortening to give users control over which key-shortening techniques are used when building index blocks. Before this patch, both separators and successor keys were shortened in indexes. With this patch, the default is set to kShortenSeparators to shorten only the separators. Since each index block has many separators and only one successor (the last key), the change should not have a negative impact on index block size. However, it should prevent many unnecessary block loads where, due to the approximation introduced by the shortened successor key, a seek would land on the previous block and then fix it by moving to the next one.
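      Selecting the mode looks like this (using the enum this PR introduces):

      ```cpp
      #include <rocksdb/options.h>
      #include <rocksdb/table.h>

      rocksdb::Options MakeOptions() {
        rocksdb::BlockBasedTableOptions table_options;
        // The new default: shorten separators but leave the successor key intact.
        table_options.index_shortening =
            rocksdb::BlockBasedTableOptions::IndexShorteningMode::kShortenSeparators;
        rocksdb::Options options;
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_options));
        return options;
      }
      ```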
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5174
      
      Differential Revision: D14884185
      
      Pulled By: al13n321
      
      fbshipit-source-id: 1b08bc8c03edcf09b6b8c16e9a7eea08ad4dd534
7. April 20, 2019 (5 commits)
    • refactor SavePoints (#5192) · de769094
      Committed by jsteemann
      Summary:
      Savepoints are assumed to be used in a stack-wise fashion (only
      the top element should be used), so they were stored by `WriteBatch`
      in a member variable `save_points_` using a std::stack.

      Conceptually this is fine, but the implementation had a few issues:
      - the `save_points_` instance variable was a plain pointer to a heap-
        allocated `SavePoints` struct. The destructor of `WriteBatch` simply
        deleted this pointer. However, the copy constructor of WriteBatch
        just copied that pointer, meaning that copying a WriteBatch with
        active savepoints would very likely have crashed before this fix. Now
        the copy constructor makes a proper copy of the savepoints, not just
        a copy of the pointer.
      - `save_points_` was a std::stack, which defaults to `std::deque` for
        the underlying container. A deque is a bit over the top here, as we
        only need access to the most recent savepoint (i.e. stack.top()) but
        never any elements at the front. std::deque is rather expensive to
        initialize in common environments. For example, the STL implementation
        shipped with GNU g++ will perform a heap allocation of more than 500
        bytes to create an empty deque object. Although the `save_points_`
        container is created lazily by RocksDB, moving from a deque to a plain
        `std::vector` is much more memory-efficient. So `save_points_` is now
        a vector.
      - `save_points_` was changed from a plain pointer to a `std::unique_ptr`,
        making ownership more explicit. (See the sketch below.)
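      A minimal sketch of the resulting ownership model (struct and member names follow the description above, not necessarily the exact source):

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <memory>
      #include <vector>

      struct SavePoint {
        size_t size = 0;     // batch size in bytes at the savepoint
        uint32_t count = 0;  // number of entries at the savepoint
      };

      struct SavePoints {
        std::vector<SavePoint> stack;  // vector instead of a deque-backed std::stack
      };

      class WriteBatch {
       public:
        WriteBatch() = default;
        // Deep-copy the savepoints instead of copying a raw pointer.
        WriteBatch(const WriteBatch& src)
            : save_points_(src.save_points_ ? new SavePoints(*src.save_points_)
                                            : nullptr) {}

       private:
        std::unique_ptr<SavePoints> save_points_;  // ownership is explicit now
      };
      ```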
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5192
      
      Differential Revision: D15024074
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 5b128786d3789cde94e46465c9e91badd07a25d7
    • Fix history to not include some features in 6.1 (#5224) · dc64c2f5
      Committed by Sagar Vemuri
      Summary:
      Fix HISTORY.md by removing a few items from the 6.1.1 history as they did not make it into the 6.1.fb branch.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5224
      
      Differential Revision: D15017030
      
      Pulled By: sagar0
      
      fbshipit-source-id: 090724d326d29168952e06dc1a5090c03fdd739e
    • Force read existing data during db repair (#5209) · c77aab58
      Committed by Yanqin Jin
      Summary:
      Setting read_opts.total_order_seek achieves this, even with a different prefix
      extractor.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5209
      
      Differential Revision: D14980388
      
      Pulled By: riversand963
      
      fbshipit-source-id: 16527989a3d6b3e3ae8241c894d011326429d66e
    • Remove a couple of non-public includes from public header file (#5219) · 5265c570
      Committed by anand76
      Summary:
      Cleanup a couple of stray includes left by #5011.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5219
      
      Differential Revision: D15007244
      
      Pulled By: anand1976
      
      fbshipit-source-id: 15ca1d4f977b5b60e99df3bfb8fc3db217d19bdd
    • Add some "inline" annotation to DBIter functions (#5217) · 7a73adda
      Committed by Siying Dong
      Summary:
      My compiler doesn't inline DBIter::Next() into the arena-wrapped iterator, even though it is a direct forwarding call. Adding this annotation makes it inlined. It might not always work, but inlining this function into the arena-wrapped iterator always feels like the right decision.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5217
      
      Differential Revision: D15004086
      
      Pulled By: siying
      
      fbshipit-source-id: a4cffd79c6fb092669a3a90633c9aa5e494f8a66
8. April 19, 2019 (7 commits)
    • Use creation_time or mtime when file_creation_time=0 (#5184) · efa94874
      Committed by Sagar Vemuri
      Summary:
      We found an issue in Periodic Compactions (introduced in #5166) where files were not being picked up for compactions as all the SST files created with older versions of RocksDB have `file_creation_time` as 0. (Note that `file_creation_time` is a new table property introduced in #5166).
      
      To address this, Periodic compactions now fall back to looking at the `creation_time` table property or the file's modification time (as given by the Env) when `file_creation_time` table property is found to be 0.
      
      Here is how the file's modification time (and, in turn, the file age) is computed now:
      1. Use the `file_creation_time` table property if it is > 0.
      2. Otherwise, use the `creation_time` table property if it is > 0.
      3. Otherwise, use the file's mtime stat metadata given by the underlying Env.
      If the modification time cannot be correctly determined based on the above conditions, the file is not considered for periodic compaction at all.
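      A sketch of that fallback chain (the helper and its parameters are illustrative, with 0 standing in for "unknown"):

      ```cpp
      #include <cstdint>

      // Returns 0 when no usable timestamp exists, i.e. "skip this file".
      uint64_t EffectiveFileModificationTime(uint64_t file_creation_time,
                                             uint64_t creation_time,
                                             uint64_t env_mtime) {
        if (file_creation_time > 0) return file_creation_time;  // step 1
        if (creation_time > 0) return creation_time;            // step 2
        if (env_mtime > 0) return env_mtime;                    // step 3 (Env stat)
        return 0;
      }
      ```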
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5184
      
      Differential Revision: D14907795
      
      Pulled By: sagar0
      
      fbshipit-source-id: 4bb2f3631f9a3e04470c674a1d13544584e1e56c
    • reorganize history.md to list unreleased changes separately · 3bdce20e
      Committed by Zhongyi Xie
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5216
      
      Differential Revision: D15003749
      
      Pulled By: miasantreble
      
      fbshipit-source-id: a52c264e694cd7c55813be33ee22b4f3046b545a
    • Make ReadRangeDelAggregator::ShouldDelete() more inline friendly (#5202) · d6862b3f
      Committed by Siying Dong
      Summary:
      Reorganize the code so that no function call into ReadRangeDelAggregator is needed if there is no range tombstone.
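      A sketch of the inline-friendly shape this describes (names simplified): the cheap emptiness check is inlined at every call site, so the out-of-line path is only reached when range tombstones actually exist.

      ```cpp
      #include <string>

      class ReadRangeDelAggregatorSketch {
       public:
        bool ShouldDelete(const std::string& key) {
          if (num_tombstones_ == 0) return false;  // common case, fully inlined
          return ShouldDeleteImpl(key);            // slow path, defined out of line
        }

       private:
        bool ShouldDeleteImpl(const std::string& key);
        int num_tombstones_ = 0;
      };
      ```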
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5202
      
      Differential Revision: D14968155
      
      Pulled By: siying
      
      fbshipit-source-id: 0bd61911293c7a27b4e1b8d57c66d0c4ad6a6a5f
    • Some small code changes to improve Next() (#5200) · 01cfea66
      Committed by Siying Dong
      Summary:
      Several small changes for Next():
      1. Reduce branching by always updating local_stats_.next_count_ even if statistics is null; this should be faster than a branch.
      2. Replace ResetInternalKeysSkippedCounter() in Next(), because the valid_ check is not needed in this case.
      3. iter_->Valid() should always be true in the non-merge case, so remove this check.
      4. Add an inline annotation. It ends up not being picked up by my compiler, but it shouldn't hurt.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5200
      
      Differential Revision: D15000391
      
      Pulled By: siying
      
      fbshipit-source-id: be97f61c708968234fb8e5cf272b5c2ac07dc4dd
    • Introduce InternalIteratorBase::NextAndGetResult() (#5197) · 992dfc78
      Committed by Siying Dong
      Summary:
      In long scans, the virtual function calls of Next(), Valid(), key() and value() are not trivial. By introducing NextAndGetResult(), some of the Next(), Valid() and key() calls are consolidated into one virtual function call to reduce CPU.
      Also did some inlining tricks and added some "final" annotations to some functions. Even without the "final" annotation, most Next() calls are inlined with -O3, but sometimes with a "final" it is inlined by -O2 too. It doesn't hurt to add those final annotations.
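      A simplified sketch of the consolidation (the real signature differs in detail):

      ```cpp
      #include <string>

      class InternalIteratorBase {
       public:
        virtual ~InternalIteratorBase() = default;
        virtual void Next() = 0;
        virtual bool Valid() const = 0;
        virtual std::string key() const = 0;

        // One virtual dispatch instead of a Next() + Valid() + key() triple.
        virtual bool NextAndGetResult(std::string* ret_key) {
          Next();
          const bool is_valid = Valid();
          if (is_valid) *ret_key = key();
          return is_valid;
        }
      };
      ```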
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5197
      
      Differential Revision: D14945977
      
      Pulled By: siying
      
      fbshipit-source-id: 7003969f9a5f1d5717f0bda503b91d19ba75ed88
    • Add copyright headers per FB open-source checkup tool. (#5199) · 6c2bf9e9
      Committed by Fosco Marotto
      Summary:
      internal task: T35568575
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5199
      
      Differential Revision: D14962794
      
      Pulled By: gfosco
      
      fbshipit-source-id: 93838ede6d0235eaecff90d200faed9a8515bbbe
    • Fix a bug in GetOverlappingInputsRangeBinarySearch (#5211) · 392f6d49
      Committed by Yanqin Jin
      Summary:
      As title.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5211
      
      Differential Revision: D14992018
      
      Pulled By: riversand963
      
      fbshipit-source-id: b5720ea4742029e2fb47ff6d9f8d9de006db4ed4
9. April 18, 2019 (2 commits)
    • VersionSet: optimize GetOverlappingInputsRangeBinarySearch (#4987) · 5b7e09bd
      Committed by JiYou
      Summary:
      `GetOverlappingInputsRangeBinarySearch` first uses binary search
      to find an index in the given range `[begin, end]`, but it then
      uses linear search to find the `start_index` and `end_index`,
      so the search process degrades to linear time.

      This patch optimizes the search process with the changes below (see the sketch after the list):

      - use `std::lower_bound` and `std::upper_bound` to get
        `lg(n)` search complexity.
      - use a unified lambda for the search process.
      - simplify the process for `within_interval` true or false.
      - remove the functions `ExtendFileRangeWithinInterval`
        and `ExtendFileRangeOverlappingInterval`.
      Signed-off-by: JiYou <jiyou09@gmail.com>
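      An illustrative sketch of the lg(n) range lookup (simplified; files are assumed sorted and non-overlapping, with integer keys standing in for internal keys):

      ```cpp
      #include <algorithm>
      #include <cstddef>
      #include <utility>
      #include <vector>

      struct FileMeta {
        int smallest;
        int largest;
      };

      // Returns [start_index, end_index) of files overlapping [begin, end].
      std::pair<size_t, size_t> OverlappingRange(const std::vector<FileMeta>& files,
                                                 int begin, int end) {
        // First file whose largest key is >= begin.
        auto lo = std::lower_bound(
            files.begin(), files.end(), begin,
            [](const FileMeta& f, int key) { return f.largest < key; });
        // One past the last file whose smallest key is <= end.
        auto hi = std::upper_bound(
            lo, files.end(), end,
            [](int key, const FileMeta& f) { return key < f.smallest; });
        return {static_cast<size_t>(lo - files.begin()),
                static_cast<size_t>(hi - files.begin())};
      }
      ```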
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4987
      
      Differential Revision: D14984192
      
      Pulled By: riversand963
      
      fbshipit-source-id: fae4b8e59a21b7e350718d60cdc94dd55ac81e89
    • rename variable to avoid shadowing (#5204) · 248b6b55
      Committed by Zhongyi Xie
      Summary:
      This PR fixes the following compile warning:
      ```
      db/memtable.cc: In member function ‘virtual void rocksdb::MemTableIterator::Seek(const rocksdb::Slice&)’:
      db/memtable.cc:321:22: error: declaration of ‘user_key’ shadows a member of 'this' [-Werror=shadow]
             Slice user_key(ExtractUserKey(k));
                            ^
      db/memtable.cc: In member function ‘virtual void rocksdb::MemTableIterator::SeekForPrev(const rocksdb::Slice&)’:
      db/memtable.cc:338:22: error: declaration of ‘user_key’ shadows a member of 'this' [-Werror=shadow]
             Slice user_key(ExtractUserKey(k));
                            ^
      ```
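      The fix is simply to rename the local variable so it no longer shadows the iterator's member; a minimal illustration (names here are hypothetical):

      ```cpp
      struct MemTableIteratorSketch {
        const char* user_key = nullptr;  // the member that was being shadowed
        void Seek(const char* k) {
          const char* seek_user_key = k;  // renamed local satisfies -Wshadow
          (void)seek_user_key;
        }
      };
      ```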
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5204
      
      Differential Revision: D14970160
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 388eb089f90c4528cc6d615dd4607fb53ceac705
10. April 17, 2019 (4 commits)
    • Avoid double-compacting data in bottom level in manual compactions (#5138) · baa53024
      Committed by Zhongyi Xie
      Summary:
      Depending on the config, manual compaction (leveled compaction style) does the following compactions:
      L0->L1
      L1->L2
      ...
      Ln-1 -> Ln
      Ln -> Ln
      The final Ln -> Ln compaction is partly unnecessary, as it recompacts all the files that were just generated by the Ln-1 -> Ln compaction. We should avoid recompacting such files. This rule should be applied to Lmax only.
      Resolves issue https://github.com/facebook/rocksdb/issues/4995
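      Requesting the optimized behavior for a manual compaction looks like this (`kForceOptimized` is the value this work introduces):

      ```cpp
      #include <rocksdb/db.h>
      #include <rocksdb/options.h>

      rocksdb::Status CompactAll(rocksdb::DB* db) {
        rocksdb::CompactRangeOptions cro;
        cro.bottommost_level_compaction =
            rocksdb::BottommostLevelCompaction::kForceOptimized;
        return db->CompactRange(cro, nullptr, nullptr);  // full-range manual compaction
      }
      ```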
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5138
      
      Differential Revision: D14940106
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 8d3cf5507a17e76f3333cfd4bac5256d005636e5
    • Add back NewEmptyIterator (#5203) · d9280ff2
      Committed by Yanqin Jin
      Summary:
      #4905 removed the implementation of `NewEmptyIterator` but kept its
      declaration in the public header. This breaks some systems that depend on
      RocksDB if the systems use `NewEmptyIterator`. Therefore, add it back as a fix. cc maysamyabandeh please remind me if I missed anything here. Thanks
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5203
      
      Differential Revision: D14968382
      
      Pulled By: riversand963
      
      fbshipit-source-id: 5fb86e99c8cfaf9f7a9473cdb1355d7558ff6e01
    • WriteBufferManager's dummy entry size to block cache 1MB -> 256KB (#5175) · beb44ec3
      Committed by Siying Dong
      Summary:
      The dummy cache size of 1MB is too large for small block sizes. Our GetDefaultCacheShardBits() uses min_shard_size = 512L * 1024L to determine the number of shards, so 1MB exceeds the size of a whole shard and makes the cache exceed its budget.
      Change it to 256KB accordingly.
      There shouldn't be an obvious performance impact, since inserting a cache entry for every 256KB of memtable inserts is still infrequent enough.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5175
      
      Differential Revision: D14954289
      
      Pulled By: siying
      
      fbshipit-source-id: 2c275255c1ac3992174e06529e44c55538325c94
    • Avoid per-key upper bound check in BlockBasedTableIterator (#5142) · f1239d5f
      Committed by yiwu-arbug
      Summary:
      This is the second attempt at #5101. Original commit message:
      `BlockBasedTableIterator` avoids reading the next block on `Next()` if it detects that the iterator will be out of bound, by checking against the index key. The optimization was added in #2239, and at the time it only checked the bound once per block. A later change seems to have made it a per-key check, which introduced unnecessary key comparisons.

      This patch comes with two fixes:

      Fix 1: To optimize checking for bounds, we need to compare the bounds with the index key as well. However, BlockBasedTableIterator doesn't know whether its index iterator is internally using user keys or internal keys. The patch fixes that by extending InternalIterator with a user_key() function that is overridden by IndexBlockIter.

      Fix 2: In #5101 we returned `IsOutOfBound()=true` when the block index key was out of bound. But the index key can be larger than the smallest key of the next file on the level. That file can be within the upper bound and should not be filtered out.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5142
      
      Differential Revision: D14907113
      
      Pulled By: siying
      
      fbshipit-source-id: ac95775c5b4e7b700f76ab43e39f45402c98fbfb
11. April 16, 2019 (5 commits)
    • Consolidating WAL creation which currently has duplicate logic in db_impl_write.cc and db_impl_open.cc (#5188) · 71a82a0a
      Committed by Vijay Nadimpalli
      
      Summary:
      Right now, two separate pieces of code are used to create WAL files: the DBImpl::Open function of db_impl_open.cc and the DBImpl::SwitchMemtable function of db_impl_write.cc. This change creates a single function, DBImpl::CreateWAL, in db_impl_open.cc, which replaces the existing WAL creation logic in DBImpl::Open and DBImpl::SwitchMemtable.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5188
      
      Differential Revision: D14942832
      
      Pulled By: vjnadimpalli
      
      fbshipit-source-id: d49230e04c36176015c8c1b422575872f92157fb
    • Fix MultiGet ASSERT bug when passing unsorted result (#5195) · 3e63e553
      Committed by Yi Zhang
      Summary:
      Found this when test-driving the new MultiGet. If you pass an unsorted result with sorted_result = false, you'll trigger the ASSERT incorrectly even though we sort further down.

      I've also added a simple test covering the sorted_result=true/false scenarios, copied from MultiGetSimple.

      anand1976
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5195
      
      Differential Revision: D14935475
      
      Pulled By: yizhang82
      
      fbshipit-source-id: 1d2af5e3a003847d965066a16e3b19da68acf170
    • db_bench: support seek to non-exist prefix (#5163) · b70967aa
      Committed by Yi Wu
      Summary:
      Add a `--seek_missing_prefix` flag to db_bench to allow benchmarking seeks to a non-existing prefix. Usage example:
      ```
      ./db_bench --db=/dev/shm/db_bench --use_existing_db=false --benchmarks=fillrandom --num=100000000 --prefix_size=9 --keys_per_prefix=10
      ./db_bench --db=/dev/shm/db_bench --use_existing_db=true --benchmarks=seekrandom --disable_auto_compactions=true --num=100000000 --prefix_size=9 --keys_per_prefix=10 --reads=1000 --prefix_same_as_start=true --seek_missing_prefix=true
      ```
      Also adding `--total_order_seek` and `--prefix_same_as_start` flags.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5163
      
      Differential Revision: D14935724
      
      Pulled By: riversand963
      
      fbshipit-source-id: 7c41023f007febe373eb1589861f215432a9e18a
    • Update history and version to 6.1.1 (#5171) · b5cad5c9
      Committed by Fosco Marotto
      Summary:
      Including latest fixes.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5171
      
      Differential Revision: D14875157
      
      Pulled By: gfosco
      
      fbshipit-source-id: 86ec7ee3553a9b25ab71ed98966ce08a16322e2c
    • Improve transaction lock details (#5193) · 8295d364
      Committed by jsteemann
      Summary:
      This branch contains two small improvements:
      * Create `LockMap` entries using `std::make_shared`. This saves one heap allocation per LockMap entry and also locates the control block and the LockMap object close together in memory, which can help with caching.
      * Reorder the members of `TrackedTrxInfo` so that the resulting struct uses less memory (at least on 64-bit systems).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5193
      
      Differential Revision: D14934536
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f7b49812bb4b6029eef9d131e7cd56260df5b28e
12. April 13, 2019 (1 commit)