提交 · 727eb881a56d75299cb180199a74673041c8d3c8 · kvdb / rocksdb

31 5月, 2018 2 次提交

Compile error in db bench tool · 727eb881

由 Jacquin Mininger 提交于 5月 30, 2018

Summary:
Small format error below causes build to fail. I believe that this :
```
fprintf(stderr, "num reads to do %lu\n", reads_);
```
Can be changed to this:
```
fprintf(stderr, "num reads to do %" PRIu64 "\n", reads_);
```
Successful build
```
  CC       utilities/blob_db/blob_dump_tool.o
  AR       librocksdb_debug.a
ar: creating archive librocksdb_debug.a
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: librocksdb_debug.a(rocks_lua_compaction_filter.o) has no symbols
  CC       tools/db_bench.o
  CC       tools/db_bench_tool.o
tools/db_bench_tool.cc:4532:46: error: format specifies type 'unsigned long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat]
    fprintf(stderr, "num reads to do %lu\n", reads_);
                                     ~~~     ^~~~~~
                                     %lld
1 error generated.
make: *** [tools/db_bench_tool.o] Error 1
```

```
$ cd rocksdb
$ make all

$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.1)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
```
Closes https://github.com/facebook/rocksdb/pull/3909

Differential Revision: D8215710

Pulled By: siying

fbshipit-source-id: 15e49fb02a818fec846e9f9b2a50e372b6b67751

727eb881

Remove tests from ROCKSDB_VALGRIND_RUN · 4dd80deb

由 Siying Dong 提交于 5月 30, 2018

Summary:
In order to make valgrind check test to pass in a day, remove some tests that run prohibitively slow under valgrind.
Closes https://github.com/facebook/rocksdb/pull/3924

Differential Revision: D8210184

Pulled By: siying

fbshipit-source-id: 5b06fb08f3cf57571d422d05a0dbddc9f9376f7a

4dd80deb

30 5月, 2018 4 次提交

Delete triggered compaction for universal style · a736255d

由 Anand Ananthabhotla 提交于 5月 29, 2018

Summary:
This is still WIP, but I'm hoping for early feedback on the overall approach.

This patch implements deletion triggered compaction, which till now only
worked for leveled, for universal style. SST files are marked for
compaction by the CompactOnDeletionCollertor table property. This is
expected to be used when free disk space is low and the user wants to
reclaim space by deleting a bunch of keys. The deletions are expected to
be dense. In such a situation, we want to avoid a full compaction due to
its space overhead.

The strategy used in this case is similar to leveled. We pick one file
from the set of files marked for compaction. We then expand the inputs
to a clean cut on the same level, and then pick overlapping files from
the next non-mepty level. Picking files from the next level can cause
the key range to expand, and we opportunistically expand inputs in the
source level to include files wholly in this key range.

The main side effect of this is that it breaks the property of no time
range overlap between levels. This shouldn't break any functionality.
Closes https://github.com/facebook/rocksdb/pull/3860

Differential Revision: D8124397

Pulled By: anand1976

fbshipit-source-id: bfa2a9dd6817930e991b35d3a8e7e61304ed3dcf

a736255d

Fix LRUCache missing null check on destruct · 724855c7

由 Yi Wu 提交于 5月 29, 2018

Summary:
Fix LRUCache missing null check on destruct. The check is needed if LRUCache::DisownData is called.
Closes https://github.com/facebook/rocksdb/pull/3920

Differential Revision: D8191631

Pulled By: yiwu-arbug

fbshipit-source-id: d5014f6e49b51692c18a25fb55ece935f5a023c4

724855c7

Fix compilation error when OPT="-DROCKSDB_LITE". · cf826de3

由 Yanqin Jin 提交于 5月 29, 2018

Summary: Closes https://github.com/facebook/rocksdb/pull/3917

Differential Revision: D8187733

Pulled By: riversand963

fbshipit-source-id: e4aa179cd0791ca77167e357f99de9afd4aef910

cf826de3

Check for rep_->table_properties being nullptr · 03cda531

由 Maysam Yabandeh 提交于 5月 29, 2018

Summary:
The very old sst formats do not have table_properties and rep_->table_properties is thus nullptr. The recent patch in https://github.com/facebook/rocksdb/pull/3894 does not check for nullptr and hence makes it backward incompatible. This patch adds the check.
Closes https://github.com/facebook/rocksdb/pull/3918

Differential Revision: D8188638

Pulled By: maysamyabandeh

fbshipit-source-id: b1d986665ecf0b4d1c442adfa8a193b97707d47b

03cda531

29 5月, 2018 1 次提交

奏

Fix VersionStorageInfo::EstimateLiveDataSize seg fault · 1c1bafa6

由奏之章提交于 5月 28, 2018

Summary:
`HandleEstimateLiveDataSize`'s `need_out_of_mutex` is true
https://github.com/facebook/rocksdb/blob/402b7aa07f0e6da4c1f0216ff2b2e50fd2e5eaac/db/internal_stats.cc#L412-L413
so , is will ref a `SuperVersion`
https://github.com/facebook/rocksdb/blob/402b7aa07f0e6da4c1f0216ff2b2e50fd2e5eaac/db/db_impl.cc#L1896-L1908
so , the param `version` of `InternalStats::HandleEstimateLiveDataSize` is safe , but `cfd_->current()` is not safe !
https://github.com/facebook/rocksdb/blob/402b7aa07f0e6da4c1f0216ff2b2e50fd2e5eaac/db/internal_stats.cc#L790-L795

the `cfd_->current()` maybe invalid ...

here's mongo-rocks crash backtrace
```
 mongod(mongo::printStackTrace(std::basic_ostream<char, std::char_traits<char> >&)+0x41) [0x7fe3a3137c51]
 mongod(+0x2152E89) [0x7fe3a3136e89]
 mongod(+0x21534F6) [0x7fe3a31374f6]
 libpthread.so.0(+0xF5E0) [0x7fe39f5e45e0]
 mongod(rocksdb::InternalKeyComparator::Compare(rocksdb::Slice const&, rocksdb::Slice const&) const+0x17) [0x7fe3a22375a7]
 mongod(rocksdb::VersionStorageInfo::EstimateLiveDataSize() const+0x3AA) [0x7fe3a228daba]
 mongod(rocksdb::InternalStats::HandleEstimateLiveDataSize(unsigned long*, rocksdb::DBImpl*, rocksdb::Version*)+0x20) [0x7fe3a2250d70]
 mongod(rocksdb::DBImpl::GetIntPropertyInternal(rocksdb::ColumnFamilyData*, rocksdb::DBPropertyInfo const&, bool, unsigned long*)+0xEF) [0x7fe3a21e3dbf]
```
Closes https://github.com/facebook/rocksdb/pull/3912

Differential Revision: D8179944

Pulled By: yiwu-arbug

fbshipit-source-id: 26f314a8f98f4c2dc4348745d759f26f0e8d95e1

1c1bafa6

26 5月, 2018 5 次提交

Exclude seq from index keys · 402b7aa0

由 Maysam Yabandeh 提交于 5月 25, 2018

Summary:
Index blocks have the same format as data blocks. The keys therefore similarly to the keys in the data blocks are internal keys, which means that in addition to the user key it also has 8 bytes that encodes sequence number and value type. This extra 8 bytes however is not necessary in index blocks since the index keys act as an separator between two data blocks. The only exception is when the last key of a block and the first key of the next block share the same user key, in which the sequence number is required to act as a separator.
The patch excludes the sequence from index keys only if the above special case does not happen for any of the index keys. It then records that in the property block. The reader looks at the property block to see if it should expect sequence numbers in the keys of the index block.s
Closes https://github.com/facebook/rocksdb/pull/3894

Differential Revision: D8118775

Pulled By: maysamyabandeh

fbshipit-source-id: 915479f028b5799ca91671d67455ecdefbd873bd

402b7aa0

Check status when reading HashIndexPrefixesMetadataBlock · 8c3bf080

由 Nathan VanBenschoten 提交于 5月 25, 2018

Summary:
This was missed in a refactor of `ReadBlockContents` (2f1a3a4d).
Closes https://github.com/facebook/rocksdb/pull/3906

Differential Revision: D8172648

Pulled By: ajkr

fbshipit-source-id: 27e453b19795fea974bfed4721105be6f3a12090

8c3bf080

Fix an issue with unnecessary capture in lambda expressions · 45434178

由 Adam Retter 提交于 5月 25, 2018

Summary:
Closes https://github.com/facebook/rocksdb/issues/3900
Replaces https://github.com/facebook/rocksdb/pull/3901

I needed this to build v5.12.4 on Mac OS X (10.13.3).
Closes https://github.com/facebook/rocksdb/pull/3904

Differential Revision: D8169357

Pulled By: sagar0

fbshipit-source-id: 85faac42168796e7def9250d0c221a9a03b84476

45434178

Fix segfault caused by object premature destruction · aa53579d

由 Yanqin Jin 提交于 5月 25, 2018

Summary:
Please refer to earlier discussion in [issue 3609](https://github.com/facebook/rocksdb/issues/3609).
There was also an alternative fix in [PR 3888](https://github.com/facebook/rocksdb/pull/3888), but the proposed solution requires complex change.

To summarize the cause of the problem. Upon creation of a column family, a `BlockBasedTableFactory` object is `new`ed and encapsulated by a `std::shared_ptr`. Since there is no other `std::shared_ptr` pointing to this `BlockBasedTableFactory`, when the column family is dropped, the `ColumnFamilyData` is `delete`d, causing the destructor of `std::shared_ptr`. Since there is no other `std::shared_ptr`, the underlying memory is also freed.
Later when the db exits, it releases all the table readers, including the table readers that have been operating on the dropped column family. This needs to access the `table_options` owned by `BlockBasedTableFactory` that has already been deleted. Therefore, a segfault is raised.
Previous workaround is to purge all obsolete files upon `ColumnFamilyData` destruction, which leads to a force release of table readers of the dropped column family. However this does not work when the user disables file deletion.

Our solution in this PR is making a copy of `table_options` in `BlockBasedTable::Rep`. This solution increases memory copy and usage, but is much simpler.

Test plan
```
$ make -j16
$ ./column_family_test --gtest_filter=ColumnFamilyTest.CreateDropAndDestroy:ColumnFamilyTest.CreateDropAndDestroyWithoutFileDeletion
```

Expected behavior:
All tests should pass.
Closes https://github.com/facebook/rocksdb/pull/3898

Differential Revision: D8149421

Pulled By: riversand963

fbshipit-source-id: eaecc2e064057ef607fbdd4cc275874f866c3438

aa53579d

奏

Fix Fadvise on closed file when reads use mmap · 6e08916e

由奏之章提交于 5月 25, 2018

Summary:
```PosixMmapReadableFile::fd_``` is closed after created, but needs to remain open for the lifetime of `PosixMmapReadableFile` since it is used whenever `InvalidateCache` is called.
Closes https://github.com/facebook/rocksdb/pull/2764

Differential Revision: D8152515

Pulled By: ajkr

fbshipit-source-id: b738a6a55ba4e392f9b0f374ff396a1e61c64f65

6e08916e

25 5月, 2018 4 次提交

add flush_before_backup parameter to c api rocksdb_backup_engine_create_new_backup · 070319f7

由 QingpingWang 提交于 5月 24, 2018

Summary:
Add flush_before_backup to rocksdb_backup_engine_create_new_backup. make c api able to control the flush before backup behavior.
Closes https://github.com/facebook/rocksdb/pull/3897

Differential Revision: D8157676

Pulled By: ajkr

fbshipit-source-id: 88998c62f89f087bf8672398fd7ddafabbada505

070319f7

LRUCache midpoint insertion · bc7e8d47

由 Yi Wu 提交于 5月 24, 2018

Summary:
Implement midpoint insertion strategy where new blocks will be insert to the middle of LRU list, then move the head on the first hit in cache.
Closes https://github.com/facebook/rocksdb/pull/3877

Differential Revision: D8100895

Pulled By: yiwu-arbug

fbshipit-source-id: f4bd83cb8be469e5d02072cfc8bd66011391f3da

bc7e8d47

Catchup with posix features · 3db8504c

由 Dmitri Smirnov 提交于 5月 24, 2018

Summary:
Catch up with Posix features
  NewWritableRWFile must fail when file does not exists
  Implement Env::Truncate()
  Adjust Env options optimization functions
  Implement MemoryMappedBuffer on Windows.
Closes https://github.com/facebook/rocksdb/pull/3857

Differential Revision: D8053610

Pulled By: ajkr

fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e

3db8504c

port_posix: use posix_memalign() for aligned_alloc · c4655093

由 Kefu Chai 提交于 5月 24, 2018

Summary:
to workaround issue of http://tracker.ceph.com/issues/21422 .
and in tcmalloc aligned_alloc and posix_memalign() are basically the
same thing. the same applies to GNU glibc.

fixes #3175
Signed-off-by: NKefu Chai <tchaikov@gmail.com>
Closes https://github.com/facebook/rocksdb/pull/3862

Differential Revision: D8147930

Pulled By: yiwu-arbug

fbshipit-source-id: 355afe93c4dd0a96a0d711ef190e8b86fbe8d11d

c4655093

24 5月, 2018 3 次提交

refactor constructor of LRUCacheShard · 7a99c043

由 Yi Wu 提交于 5月 23, 2018

Summary:
Update LRUCacheShard constructor so that adding new params to it don't need to add extra SetXXX() methods.
Closes https://github.com/facebook/rocksdb/pull/3896

Differential Revision: D8128618

Pulled By: yiwu-arbug

fbshipit-source-id: 6afa715de1493a50de413678761a765e3af9b83b

7a99c043

Introduce library-independent default compression level · 01bcc348

由 Andrew Kryczka 提交于 5月 23, 2018

Summary:
Previously we were using -1 as the default for every library, which was legacy from our zlib options. That worked for a while, but after zstd introduced https://github.com/facebook/zstd/commit/a146ee04ae5866b948be0c1911418e0436d80cb4, it started giving poor compression ratios by default in zstd.

This PR adds a constant to RocksDB public API, `CompressionOptions::kDefaultCompressionLevel`, which will get translated to the default value specific to the compression library being used in "util/compression.h". The constant uses a number that appears to be larger than any library's maximum compression level.
Closes https://github.com/facebook/rocksdb/pull/3895

Differential Revision: D8125780

Pulled By: ajkr

fbshipit-source-id: 2db157a89118cd4f94577c2f4a0a5ff31c8391c6

01bcc348

Specify the underlying type of enums. · 4011012d

由 Yanqin Jin 提交于 5月 23, 2018

Summary:
Explicitly specify the underlying type of enums help developers understand the physical storage.
Closes https://github.com/facebook/rocksdb/pull/3892

Differential Revision: D8107027

Pulled By: riversand963

fbshipit-source-id: a00efecbba46df4a3c8eed0994a2d4972ad1a1d3

4011012d

23 5月, 2018 4 次提交

Fix a backward compatibility problem with table_properties being nullptr · 6c73a466

由 Zhongyi Xie 提交于 5月 22, 2018

Summary:
Currently when ldb built from master tries to open a DB from version 2.2, there will be a segfault because table_properties didn't exist back then.
Closes https://github.com/facebook/rocksdb/pull/3890

Differential Revision: D8100914

Pulled By: miasantreble

fbshipit-source-id: b255e8aedc54695432be2e704839c857dabdd65a

6c73a466

Fix Issue #3771: Slice ctor checks for nullptr and creates empty string · 4420cb49

由 Jacquin Mininger 提交于 5月 22, 2018

Summary:
Fix Issue #3771   : Check for nullptr in Slice constructor
Slice ctor checks for nullptr and creates empty string if the string does not exist
Closes https://github.com/facebook/rocksdb/pull/3887

Differential Revision: D8098852

Pulled By: ajkr

fbshipit-source-id: 04471077defa9776ce7b8c389a61312ce31002fb

4420cb49

Avoid sleep in DBTest.GroupCommitTest to fix flakiness · 7db721b9

由 Andrew Kryczka 提交于 5月 22, 2018

Summary:
DBTest.GroupCommitTest would often fail when run under valgrind because its sleeps were insufficient to guarantee a group commit had multiple entries. Instead we can use sync point to force a leader to wait until a non-leader thread has enqueued its work, thus guaranteeing a leader can do group commit work for multiple threads.
Closes https://github.com/facebook/rocksdb/pull/3883

Differential Revision: D8079429

Pulled By: ajkr

fbshipit-source-id: 61dc50fad29d2c85547842f681288de60fa29049

7db721b9

Avoid single-deleting merge operands in db_stress · fcb31016

由 Andrew Kryczka 提交于 5月 22, 2018

Summary:
I repro'd some of the "unexpected value" failures showing up in our CI lately and they always happened on keys that have a mix of single deletes and merge operands. The `SingleDelete()` API comment mentions it's incompatible with `Merge()`, so this PR prevents `db_stress` from mixing them.
Closes https://github.com/facebook/rocksdb/pull/3878

Differential Revision: D8097346

Pulled By: ajkr

fbshipit-source-id: 357a48c6a31156f4f8db3ce565638ad924c437a1

fcb31016

22 5月, 2018 6 次提交

PersistRocksDBOptions() to use WritableFileWriter · 3db1ada3

由 Siying Dong 提交于 5月 21, 2018

Summary:
By using WritableFileWriter rather than WritableFile directly, we can buffer multiple Append() calls to one write() file system call, which will be expensive to underlying Env without its own write buffering.
Closes https://github.com/facebook/rocksdb/pull/3882

Differential Revision: D8080673

Pulled By: siying

fbshipit-source-id: e0db900cb3c178166aa738f3985db65e3ae2cf1b

3db1ada3

Move prefix_extractor to MutableCFOptions · c3ebc758

由 Zhongyi Xie 提交于 5月 21, 2018

Summary:
Currently it is not possible to change bloom filter config without restart the db, which is causing a lot of operational complexity for users.
This PR aims to make it possible to dynamically change bloom filter config.
Closes https://github.com/facebook/rocksdb/pull/3601

Differential Revision: D7253114

Pulled By: miasantreble

fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c

c3ebc758

Update ColumnFamilyTest for multi-CF verification · 263ef52b

由 Yanqin Jin 提交于 5月 21, 2018

Summary:
Change `keys_` from `set<string>` to `vector<set<string>>` so that each column
family's keys are stored in one set.

ajkr When you have a chance, can you PTAL? Thanks!
Closes https://github.com/facebook/rocksdb/pull/3871

Differential Revision: D8056447

Pulled By: riversand963

fbshipit-source-id: 650d0f9cad02b1bc005fc329ad76edbf053e6386

263ef52b

Print histogram count and sum in statistics string · 508a09fd

由 Andrew Kryczka 提交于 5月 21, 2018

Summary:
Previously it only printed percentiles, even though our histogram keeps track of count and sum (and more). There have been many times we want to know more than the percentiles. For example, we currently want sum of "rocksdb.compression.times.nanos" and sum of "rocksdb.decompression.times.nanos", which would allow us to know the relative cost of compression vs decompression.

This PR adds count and sum to the string printed by `StatisticsImpl::ToString`. This is a bit risky as there are definitely parsers assuming the old format. I will mention it in HISTORY.md and hope for the best...
Closes https://github.com/facebook/rocksdb/pull/3863

Differential Revision: D8038831

Pulled By: ajkr

fbshipit-source-id: 0465b72e4b0cbf18ef965f4efe402601d16d5b5c

508a09fd

Assert keys/values pinned by range deletion meta-block iterators · 7b655214

由 Andrew Kryczka 提交于 5月 21, 2018

Summary:
`RangeDelAggregator` holds the pointers returned by `BlockIter::key()` and `BlockIter::value()` so requires the data to which they point is pinned. `BlockIter::key()` points into block memory and is guaranteed to be pinned if and only if prefix encoding is disabled (or, equivalently, restart interval is set to one). I think `BlockIter::value()` is always pinned. Added an assert for these and removed the wrong TODO about increasing restart interval, which would enable key prefix encoding and break the assertion.
Closes https://github.com/facebook/rocksdb/pull/3875

Differential Revision: D8063667

Pulled By: ajkr

fbshipit-source-id: 60b5ebcc0cdd610dd6aad9e74a23378793672c41

7b655214

Add missing test files to src.mk · e410501e

由 Andrew Kryczka 提交于 5月 21, 2018

Summary:
We only generate the header dependency (".cc.d") files for files mentioned in "src.mk". When we don't generate them, changes to header dependencies do not cause `make` to recompile the dependent ".o". Then it takes a while for developers (or maybe just me) to realize `make clean` is necessary.
Closes https://github.com/facebook/rocksdb/pull/3876

Differential Revision: D8065389

Pulled By: ajkr

fbshipit-source-id: 0f62eee7bcab15b0215791564e6ab3775d46996b

e410501e

19 5月, 2018 2 次提交

fix a division by zero bug · ed4d3393

由 Zhongyi Xie 提交于 5月 18, 2018

Summary:
fixes the failing clang_analyze contrun test
Closes https://github.com/facebook/rocksdb/pull/3872

Differential Revision: D8059241

Pulled By: miasantreble

fbshipit-source-id: e8fc1838004fe16a823456188386b8b39429803b

ed4d3393

class Block to store num_restarts_ · 26da3676

由 Siying Dong 提交于 5月 18, 2018

Summary:
Right now, every Block::NewIterator() reads num_restarts_ from the block, which is already read in Block::Block(). This sometimes cause a CPU cache miss. Although fetching this cacheline can usually benefit follow-up block restart offset reading, as they are close to each other, it's almost free to get ride of this read by storing it in the Block class.
Closes https://github.com/facebook/rocksdb/pull/3869

Differential Revision: D8052493

Pulled By: siying

fbshipit-source-id: 9c72360f0c2d7329f3c198ce4eaedd2bc14b87c1

26da3676

18 5月, 2018 6 次提交

Set the default value of max_manifest_file_size. · a0c7b4d5

由 Yanqin Jin 提交于 5月 18, 2018

Summary:
In the past, the default value of max_manifest_file_size is uint64_t::MAX,
allowing a long running RocksDB process to grow its MANIFEST file to take up
the entire disk, as reported in [issue 3851](https://github.com/facebook/rocksdb/issues/3851). It is reasonable and common to provide a default non-max value for this option. Therefore, I set the value to 1GB.

siying miasantreble Please let me know whether this looks good to you. Thanks!
Closes https://github.com/facebook/rocksdb/pull/3867

Differential Revision: D8051524

Pulled By: riversand963

fbshipit-source-id: 50251f0804b1fa933a19a30d19d261ea8b9d2b72

a0c7b4d5

Implement key shortening functions in ReverseBytewiseComparator · 17af09fc

由 Siying Dong 提交于 5月 17, 2018

Summary:
Right now ReverseBytewiseComparator::FindShortestSeparator() doesn't really shorten key, and ReverseBytewiseComparator::FindShortestSuccessor() seems to return wrong results. The code is confusing too as it uses BytewiseComparatorImpl::FindShortestSeparator() but the function actually won't do anything if the the first key is larger than the second.

Implement ReverseBytewiseComparator::FindShortestSeparator() and override ReverseBytewiseComparator::FindShortestSuccessor() to be empty.
Closes https://github.com/facebook/rocksdb/pull/3836

Differential Revision: D7959762

Pulled By: siying

fbshipit-source-id: 93acb621c16ce6f23e087ae4e19f7d84d1254683

17af09fc

add override to virtual functions · 1d7ca20f

由 Zhongyi Xie 提交于 5月 17, 2018

Summary:
this will fix the failing clang_check test
Closes https://github.com/facebook/rocksdb/pull/3868

Differential Revision: D8050880

Pulled By: miasantreble

fbshipit-source-id: 749932e2e4025f835c961c068d601e522a126da6

1d7ca20f

Reorder field based on esan data · aed7abbc

由 Xin Tong 提交于 5月 17, 2018

Summary:
Running. TEST_TMPDIR=/dev/shm ./buck-out/gen/rocks/tools/rocks_db_bench --benchmarks=readwhilewriting --num=5000000 -benchmark_write_rate_limit=2000000 --threads=32

Collected esan data and reorder field. Accesses to 4th and 6th fields take majority of the access.  Group them. Overall, this struct takes 10%+ of the total accesses in the program. (637773011/6107964986)

==2433831==  class rocksdb::InlineSkipList
==2433831==   size = 48, count = 637773011, ratio = 112412, array access = 0
==2433831==   # 0: offset = 0,   size = 2,       count = 455137, type = i16
==2433831==   # 1: offset = 2,   size = 2,       count = 6,      type = i16
==2433831==   # 2: offset = 4,   size = 4,       count = 182303, type = i32
==2433831==   # 3: offset = 8,   size = 8,       count = 263953900, type = %"class.rocksdb::MemTableRep::KeyComparator"*
==2433831==   # 4: offset = 16,  size = 8,       count = 136409, type = %"class.rocksdb::Allocator"*
==2433831==   # 5: offset = 24,  size = 8,       count = 366628820, type = %"struct.rocksdb::InlineSkipList<const rocksdb::MemTableRep::KeyComparator &>::Node"*
==2433831==   # 6: offset = 32,  size = 4,       count = 6280031, type = %"struct.std::atomic" = type { %"struct.std::__atomic_base" }
==2433831==   # 7: offset = 40,  size = 8,       count = 136405, type = %"struct.rocksdb::InlineSkipList<const rocksdb::MemTableRep::KeyComparator &>::Splice"*
==2433831==EfficiencySanitizer: total struct field access count = 6107964986

Before re-ordering
[trentxintong@devbig460.frc2 ~/fbsource/fbcode]$ fgrep readwhilewriting
without-ro.log
readwhilewriting :       0.036 micros/op 27545605 ops/sec;   26.8 MB/s
(45954 of 5000000 found)
readwhilewriting :       0.036 micros/op 28024240 ops/sec;   27.2 MB/s
(43158 of 5000000 found)
readwhilewriting :       0.037 micros/op 27345145 ops/sec;   27.1 MB/s
(46725 of 5000000 found)
readwhilewriting :       0.037 micros/op 27072588 ops/sec;   27.3 MB/s
(42605 of 5000000 found)
readwhilewriting :       0.034 micros/op 29578781 ops/sec;   28.3 MB/s
(44294 of 5000000 found)
readwhilewriting :       0.035 micros/op 28528304 ops/sec;   27.7 MB/s
(44176 of 5000000 found)
readwhilewriting :       0.037 micros/op 27075497 ops/sec;   26.5 MB/s
(43763 of 5000000 found)
readwhilewriting :       0.036 micros/op 28024117 ops/sec;   27.1 MB/s
(40622 of 5000000 found)
readwhilewriting :       0.037 micros/op 27078709 ops/sec;   27.6 MB/s
(47774 of 5000000 found)
readwhilewriting :       0.034 micros/op 29020689 ops/sec;   28.1 MB/s
(45066 of 5000000 found)
AVERAGE()=27.37 MB/s

After re-ordering
[trentxintong@devbig460.frc2 ~/fbsource/fbcode]$ fgrep readwhilewriting
ro.log
readwhilewriting :       0.036 micros/op 27542409 ops/sec;   27.7 MB/s
(46163 of 5000000 found)
readwhilewriting :       0.036 micros/op 28021148 ops/sec;   28.2 MB/s
(46155 of 5000000 found)
readwhilewriting :       0.036 micros/op 28021035 ops/sec;   27.3 MB/s
(44039 of 5000000 found)
readwhilewriting :       0.036 micros/op 27538659 ops/sec;   27.5 MB/s
(46781 of 5000000 found)
readwhilewriting :       0.036 micros/op 28028604 ops/sec;   27.6 MB/s
(44689 of 5000000 found)
readwhilewriting :       0.036 micros/op 27541452 ops/sec;   27.3 MB/s
(43156 of 5000000 found)
readwhilewriting :       0.034 micros/op 29041338 ops/sec;   28.8 MB/s
(44895 of 5000000 found)
readwhilewriting :       0.036 micros/op 27784974 ops/sec;   26.3 MB/s
(39963 of 5000000 found)
readwhilewriting :       0.036 micros/op 27538892 ops/sec;   28.1 MB/s
(46570 of 5000000 found)
readwhilewriting :       0.038 micros/op 26622473 ops/sec;   27.0 MB/s
(43236 of 5000000 found)
AVERAGE()=27.58 MB/s
Closes https://github.com/facebook/rocksdb/pull/3855

Reviewed By: siying

Differential Revision: D8048781

Pulled By: trentxintong

fbshipit-source-id: bc9807a9845e2a92cb171ce1ecb5a2c8a51f1481

aed7abbc

Update HISTORY and version for upcoming 5.14 · fa43948c

由 Fosco Marotto 提交于 5月 17, 2018

Summary: Closes https://github.com/facebook/rocksdb/pull/3866

Differential Revision: D8043563

Pulled By: gfosco

fbshipit-source-id: da4af20e604534602ac0e07943135513fd9a9f53

fa43948c

In instrumented mutex, take timing once for both of perf_context and statistics · 7ccb35f6

由 Siying Dong 提交于 5月 17, 2018

Summary: Closes https://github.com/facebook/rocksdb/pull/3427

Differential Revision: D6827236

Pulled By: siying

fbshipit-source-id: d8a2cc525c90df625510565669f2659014259a8a

7ccb35f6

17 5月, 2018 2 次提交

Change and clarify the relationship between Valid(), status() and Seek*() for... · 8bf555f4

由 Mike Kolupaev 提交于 5月 17, 2018

Change and clarify the relationship between Valid(), status() and Seek*() for all iterators. Also fix some bugs

Summary:
Before this PR, Iterator/InternalIterator may simultaneously have non-ok status() and Valid() = true. That state means that the last operation failed, but the iterator is nevertheless positioned on some unspecified record. Likely intended uses of that are:
* If some sst files are corrupted, a normal iterator can be used to read the data from files that are not corrupted.
* When using read_tier = kBlockCacheTier, read the data that's in block cache, skipping over the data that is not.

However, this behavior wasn't documented well (and until recently the wiki on github had misleading incorrect information). In the code there's a lot of confusion about the relationship between status() and Valid(), and about whether Seek()/SeekToLast()/etc reset the status or not. There were a number of bugs caused by this confusion, both inside rocksdb and in the code that uses rocksdb (including ours).

This PR changes the convention to:
* If status() is not ok, Valid() always returns false.
* Any seek operation resets status. (Before the PR, it depended on iterator type and on particular error.)

This does sacrifice the two use cases listed above, but siying said it's ok.

Overview of the changes:
* A commit that adds missing status checks in MergingIterator. This fixes a bug that actually affects us, and we need it fixed. `DBIteratorTest.NonBlockingIterationBugRepro` explains the scenario.
* Changes to lots of iterator types to make all of them conform to the new convention. Some bug fixes along the way. By far the biggest changes are in DBIter, which is a big messy piece of code; I tried to make it less big and messy but mostly failed.
* A stress-test for DBIter, to gain some confidence that I didn't break it. It does a few million random operations on the iterator, while occasionally modifying the underlying data (like ForwardIterator does) and occasionally returning non-ok status from internal iterator.

To find the iterator types that needed changes I searched for "public .*Iterator" in the code. Here's an overview of all 27 iterator types:

Iterators that didn't need changes:
* status() is always ok(), or Valid() is always false: MemTableIterator, ModelIter, TestIterator, KVIter (2 classes with this name anonymous namespaces), LoggingForwardVectorIterator, VectorIterator, MockTableIterator, EmptyIterator, EmptyInternalIterator.
* Thin wrappers that always pass through Valid() and status(): ArenaWrappedDBIter, TtlIterator, InternalIteratorFromIterator.

Iterators with changes (see inline comments for details):
* DBIter - an overhaul:
- It used to silently skip corrupted keys (`FindParseableKey()`), which seems dangerous. This PR makes it just stop immediately after encountering a corrupted key, just like it would for other kinds of corruption. Let me know if there was actually some deeper meaning in this behavior and I should put it back.
- It had a few code paths silently discarding subiterator's status. The stress test caught a few.
- The backwards iteration code path was expecting the internal iterator's set of keys to be immutable. It's probably always true in practice at the moment, since ForwardIterator doesn't support backwards iteration, but this PR fixes it anyway. See added DBIteratorTest.ReverseToForwardBug for an example.
- Some parts of backwards iteration code path even did things like `assert(iter_->Valid())` after a seek, which is never a safe assumption.
- It used to not reset status on seek for some types of errors.
- Some simplifications and better comments.
- Some things got more complicated from the added error handling. I'm open to ideas for how to make it nicer.
* MergingIterator - check status after every operation on every subiterator, and in some places assert that valid subiterators have ok status.
* ForwardIterator - changed to the new convention, also slightly simplified.
* ForwardLevelIterator - fixed some bugs and simplified.
* LevelIterator - simplified.
* TwoLevelIterator - changed to the new convention. Also fixed a bug that would make SeekForPrev() sometimes silently ignore errors from first_level_iter_.
* BlockBasedTableIterator - minor changes.
* BlockIter - replaced `SetStatus()` with `Invalidate()` to make sure non-ok BlockIter is always invalid.
* PlainTableIterator - some seeks used to not reset status.
* CuckooTableIterator - tiny code cleanup.
* ManagedIterator - fixed some bugs.
* BaseDeltaIterator - changed to the new convention and fixed a bug.
* BlobDBIterator - seeks used to not reset status.
* KeyConvertingIterator - some small change.
Closes https://github.com/facebook/rocksdb/pull/3810

Differential Revision: D7888019

Pulled By: al13n321

fbshipit-source-id: 4aaf6d3421c545d16722a815b2fa2e7912bc851d

8bf555f4

Fix race condition between log_.erase and log_.back · 46fde6b6

由 Maysam Yabandeh 提交于 5月 16, 2018

Summary:
log_ contract specifies that it should not be modified unless both mutex_ and log_write_mutex_ are held. log_.erase however does that with only holding mutex_. This causes a race condition with two_write_queues since logs_.back is read with holding only log_write_mutex_ (which is correct according to logs_ contract) but logs_.erase is called concurrently. This is probably the cause of logs_.back returning nullptr in https://github.com/facebook/rocksdb/issues/3852 although I could not reproduce it.
Fixes https://github.com/facebook/rocksdb/issues/3852
Closes https://github.com/facebook/rocksdb/pull/3859

Differential Revision: D8026103

Pulled By: maysamyabandeh

fbshipit-source-id: ee394e00fe4aa520d884c5ef87981e9d6b5ccb28

46fde6b6

15 5月, 2018 1 次提交

Fix geo_db may seek an error key when they have the same quadkey · 42cb4775

由 acelyc111 提交于 5月 14, 2018

Summary: Closes https://github.com/facebook/rocksdb/pull/3832

Differential Revision: D7994326

Pulled By: miasantreble

fbshipit-source-id: 84a81b35b97750360423a9d4eca5b5a14d002134

42cb4775

kvdb / rocksdb 12 个月 前同步成功

kvdb / rocksdb
12 个月前同步成功