1. 31 5月, 2018 2 次提交
    • J
      Compile error in db bench tool · 727eb881
      Jacquin Mininger 提交于
      Summary:
      Small format error below causes build to fail. I believe that this :
      ```
      fprintf(stderr, "num reads to do %lu\n", reads_);
      ```
      Can be changed to this:
      ```
      fprintf(stderr, "num reads to do %" PRIu64 "\n", reads_);
      ```
      Successful build
      ```
        CC       utilities/blob_db/blob_dump_tool.o
        AR       librocksdb_debug.a
      ar: creating archive librocksdb_debug.a
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: librocksdb_debug.a(rocks_lua_compaction_filter.o) has no symbols
        CC       tools/db_bench.o
        CC       tools/db_bench_tool.o
      tools/db_bench_tool.cc:4532:46: error: format specifies type 'unsigned long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat]
          fprintf(stderr, "num reads to do %lu\n", reads_);
                                           ~~~     ^~~~~~
                                           %lld
      1 error generated.
      make: *** [tools/db_bench_tool.o] Error 1
      ```
      
      ```
      $ cd rocksdb
      $ make all
      
      $ g++ --version
      Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
      Apple LLVM version 9.1.0 (clang-902.0.39.1)
      Target: x86_64-apple-darwin17.5.0
      Thread model: posix
      InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
      ```
      Closes https://github.com/facebook/rocksdb/pull/3909
      
      Differential Revision: D8215710
      
      Pulled By: siying
      
      fbshipit-source-id: 15e49fb02a818fec846e9f9b2a50e372b6b67751
      727eb881
    • S
      Remove tests from ROCKSDB_VALGRIND_RUN · 4dd80deb
      Siying Dong 提交于
      Summary:
      In order to make valgrind check test to pass in a day, remove some tests that run prohibitively slow under valgrind.
      Closes https://github.com/facebook/rocksdb/pull/3924
      
      Differential Revision: D8210184
      
      Pulled By: siying
      
      fbshipit-source-id: 5b06fb08f3cf57571d422d05a0dbddc9f9376f7a
      4dd80deb
  2. 30 5月, 2018 4 次提交
    • A
      Delete triggered compaction for universal style · a736255d
      Anand Ananthabhotla 提交于
      Summary:
      This is still WIP, but I'm hoping for early feedback on the overall approach.
      
      This patch implements deletion triggered compaction, which till now only
      worked for leveled, for universal style. SST files are marked for
      compaction by the CompactOnDeletionCollertor table property. This is
      expected to be used when free disk space is low and the user wants to
      reclaim space by deleting a bunch of keys. The deletions are expected to
      be dense. In such a situation, we want to avoid a full compaction due to
      its space overhead.
      
      The strategy used in this case is similar to leveled. We pick one file
      from the set of files marked for compaction. We then expand the inputs
      to a clean cut on the same level, and then pick overlapping files from
      the next non-mepty level. Picking files from the next level can cause
      the key range to expand, and we opportunistically expand inputs in the
      source level to include files wholly in this key range.
      
      The main side effect of this is that it breaks the property of no time
      range overlap between levels. This shouldn't break any functionality.
      Closes https://github.com/facebook/rocksdb/pull/3860
      
      Differential Revision: D8124397
      
      Pulled By: anand1976
      
      fbshipit-source-id: bfa2a9dd6817930e991b35d3a8e7e61304ed3dcf
      a736255d
    • Y
      Fix LRUCache missing null check on destruct · 724855c7
      Yi Wu 提交于
      Summary:
      Fix LRUCache missing null check on destruct. The check is needed if LRUCache::DisownData is called.
      Closes https://github.com/facebook/rocksdb/pull/3920
      
      Differential Revision: D8191631
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: d5014f6e49b51692c18a25fb55ece935f5a023c4
      724855c7
    • Y
      Fix compilation error when OPT="-DROCKSDB_LITE". · cf826de3
      Yanqin Jin 提交于
      Summary: Closes https://github.com/facebook/rocksdb/pull/3917
      
      Differential Revision: D8187733
      
      Pulled By: riversand963
      
      fbshipit-source-id: e4aa179cd0791ca77167e357f99de9afd4aef910
      cf826de3
    • M
      Check for rep_->table_properties being nullptr · 03cda531
      Maysam Yabandeh 提交于
      Summary:
      The very old sst formats do not have table_properties and rep_->table_properties is thus nullptr. The recent patch in https://github.com/facebook/rocksdb/pull/3894 does not check for nullptr and hence makes it backward incompatible. This patch adds the check.
      Closes https://github.com/facebook/rocksdb/pull/3918
      
      Differential Revision: D8188638
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: b1d986665ecf0b4d1c442adfa8a193b97707d47b
      03cda531
  3. 29 5月, 2018 1 次提交
  4. 26 5月, 2018 5 次提交
    • M
      Exclude seq from index keys · 402b7aa0
      Maysam Yabandeh 提交于
      Summary:
      Index blocks have the same format as data blocks. The keys therefore similarly to the keys in the data blocks are internal keys, which means that in addition to the user key it also has 8 bytes that encodes sequence number and value type. This extra 8 bytes however is not necessary in index blocks since the index keys act as an separator between two data blocks. The only exception is when the last key of a block and the first key of the next block share the same user key, in which the sequence number is required to act as a separator.
      The patch excludes the sequence from index keys only if the above special case does not happen for any of the index keys. It then records that in the property block. The reader looks at the property block to see if it should expect sequence numbers in the keys of the index block.s
      Closes https://github.com/facebook/rocksdb/pull/3894
      
      Differential Revision: D8118775
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 915479f028b5799ca91671d67455ecdefbd873bd
      402b7aa0
    • N
      Check status when reading HashIndexPrefixesMetadataBlock · 8c3bf080
      Nathan VanBenschoten 提交于
      Summary:
      This was missed in a refactor of `ReadBlockContents` (2f1a3a4d).
      Closes https://github.com/facebook/rocksdb/pull/3906
      
      Differential Revision: D8172648
      
      Pulled By: ajkr
      
      fbshipit-source-id: 27e453b19795fea974bfed4721105be6f3a12090
      8c3bf080
    • A
      Fix an issue with unnecessary capture in lambda expressions · 45434178
      Adam Retter 提交于
      Summary:
      Closes https://github.com/facebook/rocksdb/issues/3900
      Replaces https://github.com/facebook/rocksdb/pull/3901
      
      I needed this to build v5.12.4 on Mac OS X (10.13.3).
      Closes https://github.com/facebook/rocksdb/pull/3904
      
      Differential Revision: D8169357
      
      Pulled By: sagar0
      
      fbshipit-source-id: 85faac42168796e7def9250d0c221a9a03b84476
      45434178
    • Y
      Fix segfault caused by object premature destruction · aa53579d
      Yanqin Jin 提交于
      Summary:
      Please refer to earlier discussion in [issue 3609](https://github.com/facebook/rocksdb/issues/3609).
      There was also an alternative fix in [PR 3888](https://github.com/facebook/rocksdb/pull/3888), but the proposed solution requires complex change.
      
      To summarize the cause of the problem. Upon creation of a column family, a `BlockBasedTableFactory` object is `new`ed and encapsulated by a `std::shared_ptr`. Since there is no other `std::shared_ptr` pointing to this `BlockBasedTableFactory`, when the column family is dropped, the `ColumnFamilyData` is `delete`d, causing the destructor of `std::shared_ptr`. Since there is no other `std::shared_ptr`, the underlying memory is also freed.
      Later when the db exits, it releases all the table readers, including the table readers that have been operating on the dropped column family. This needs to access the `table_options` owned by `BlockBasedTableFactory` that has already been deleted. Therefore, a segfault is raised.
      Previous workaround is to purge all obsolete files upon `ColumnFamilyData` destruction, which leads to a force release of table readers of the dropped column family. However this does not work when the user disables file deletion.
      
      Our solution in this PR is making a copy of `table_options` in `BlockBasedTable::Rep`. This solution increases memory copy and usage, but is much simpler.
      
      Test plan
      ```
      $ make -j16
      $ ./column_family_test --gtest_filter=ColumnFamilyTest.CreateDropAndDestroy:ColumnFamilyTest.CreateDropAndDestroyWithoutFileDeletion
      ```
      
      Expected behavior:
      All tests should pass.
      Closes https://github.com/facebook/rocksdb/pull/3898
      
      Differential Revision: D8149421
      
      Pulled By: riversand963
      
      fbshipit-source-id: eaecc2e064057ef607fbdd4cc275874f866c3438
      aa53579d
    • Fix Fadvise on closed file when reads use mmap · 6e08916e
      奏之章 提交于
      Summary:
      ```PosixMmapReadableFile::fd_``` is closed after created, but needs to remain open for the lifetime of `PosixMmapReadableFile` since it is used whenever `InvalidateCache` is called.
      Closes https://github.com/facebook/rocksdb/pull/2764
      
      Differential Revision: D8152515
      
      Pulled By: ajkr
      
      fbshipit-source-id: b738a6a55ba4e392f9b0f374ff396a1e61c64f65
      6e08916e
  5. 25 5月, 2018 4 次提交
  6. 24 5月, 2018 3 次提交
  7. 23 5月, 2018 4 次提交
  8. 22 5月, 2018 6 次提交
    • S
      PersistRocksDBOptions() to use WritableFileWriter · 3db1ada3
      Siying Dong 提交于
      Summary:
      By using WritableFileWriter rather than WritableFile directly, we can buffer multiple Append() calls to one write() file system call, which will be expensive to underlying Env without its own write buffering.
      Closes https://github.com/facebook/rocksdb/pull/3882
      
      Differential Revision: D8080673
      
      Pulled By: siying
      
      fbshipit-source-id: e0db900cb3c178166aa738f3985db65e3ae2cf1b
      3db1ada3
    • Z
      Move prefix_extractor to MutableCFOptions · c3ebc758
      Zhongyi Xie 提交于
      Summary:
      Currently it is not possible to change bloom filter config without restart the db, which is causing a lot of operational complexity for users.
      This PR aims to make it possible to dynamically change bloom filter config.
      Closes https://github.com/facebook/rocksdb/pull/3601
      
      Differential Revision: D7253114
      
      Pulled By: miasantreble
      
      fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c
      c3ebc758
    • Y
      Update ColumnFamilyTest for multi-CF verification · 263ef52b
      Yanqin Jin 提交于
      Summary:
      Change `keys_` from `set<string>` to `vector<set<string>>` so that each column
      family's keys are stored in one set.
      
      ajkr When you have a chance, can you PTAL? Thanks!
      Closes https://github.com/facebook/rocksdb/pull/3871
      
      Differential Revision: D8056447
      
      Pulled By: riversand963
      
      fbshipit-source-id: 650d0f9cad02b1bc005fc329ad76edbf053e6386
      263ef52b
    • A
      Print histogram count and sum in statistics string · 508a09fd
      Andrew Kryczka 提交于
      Summary:
      Previously it only printed percentiles, even though our histogram keeps track of count and sum (and more). There have been many times we want to know more than the percentiles. For example, we currently want sum of "rocksdb.compression.times.nanos" and sum of "rocksdb.decompression.times.nanos", which would allow us to know the relative cost of compression vs decompression.
      
      This PR adds count and sum to the string printed by `StatisticsImpl::ToString`. This is a bit risky as there are definitely parsers assuming the old format. I will mention it in HISTORY.md and hope for the best...
      Closes https://github.com/facebook/rocksdb/pull/3863
      
      Differential Revision: D8038831
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0465b72e4b0cbf18ef965f4efe402601d16d5b5c
      508a09fd
    • A
      Assert keys/values pinned by range deletion meta-block iterators · 7b655214
      Andrew Kryczka 提交于
      Summary:
      `RangeDelAggregator` holds the pointers returned by `BlockIter::key()` and `BlockIter::value()` so requires the data to which they point is pinned. `BlockIter::key()` points into block memory and is guaranteed to be pinned if and only if prefix encoding is disabled (or, equivalently, restart interval is set to one). I think `BlockIter::value()` is always pinned. Added an assert for these and removed the wrong TODO about increasing restart interval, which would enable key prefix encoding and break the assertion.
      Closes https://github.com/facebook/rocksdb/pull/3875
      
      Differential Revision: D8063667
      
      Pulled By: ajkr
      
      fbshipit-source-id: 60b5ebcc0cdd610dd6aad9e74a23378793672c41
      7b655214
    • A
      Add missing test files to src.mk · e410501e
      Andrew Kryczka 提交于
      Summary:
      We only generate the header dependency (".cc.d") files for files mentioned in "src.mk". When we don't generate them, changes to header dependencies do not cause `make` to recompile the dependent ".o". Then it takes a while for developers (or maybe just me) to realize `make clean` is necessary.
      Closes https://github.com/facebook/rocksdb/pull/3876
      
      Differential Revision: D8065389
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0f62eee7bcab15b0215791564e6ab3775d46996b
      e410501e
  9. 19 5月, 2018 2 次提交
    • Z
      fix a division by zero bug · ed4d3393
      Zhongyi Xie 提交于
      Summary:
      fixes the failing clang_analyze contrun test
      Closes https://github.com/facebook/rocksdb/pull/3872
      
      Differential Revision: D8059241
      
      Pulled By: miasantreble
      
      fbshipit-source-id: e8fc1838004fe16a823456188386b8b39429803b
      ed4d3393
    • S
      class Block to store num_restarts_ · 26da3676
      Siying Dong 提交于
      Summary:
      Right now, every Block::NewIterator() reads num_restarts_ from the block, which is already read in Block::Block(). This sometimes cause a CPU cache miss. Although fetching this cacheline can usually benefit follow-up block restart offset reading, as they are close to each other, it's almost free to get ride of this read by storing it in the Block class.
      Closes https://github.com/facebook/rocksdb/pull/3869
      
      Differential Revision: D8052493
      
      Pulled By: siying
      
      fbshipit-source-id: 9c72360f0c2d7329f3c198ce4eaedd2bc14b87c1
      26da3676
  10. 18 5月, 2018 6 次提交
    • Y
      Set the default value of max_manifest_file_size. · a0c7b4d5
      Yanqin Jin 提交于
      Summary:
      In the past, the default value of max_manifest_file_size is uint64_t::MAX,
      allowing a long running RocksDB process to grow its MANIFEST file to take up
      the entire disk, as reported in [issue 3851](https://github.com/facebook/rocksdb/issues/3851). It is reasonable and common to provide a default non-max value for this option. Therefore, I set the value to 1GB.
      
      siying miasantreble Please let me know whether this looks good to you. Thanks!
      Closes https://github.com/facebook/rocksdb/pull/3867
      
      Differential Revision: D8051524
      
      Pulled By: riversand963
      
      fbshipit-source-id: 50251f0804b1fa933a19a30d19d261ea8b9d2b72
      a0c7b4d5
    • S
      Implement key shortening functions in ReverseBytewiseComparator · 17af09fc
      Siying Dong 提交于
      Summary:
      Right now ReverseBytewiseComparator::FindShortestSeparator() doesn't really shorten key, and ReverseBytewiseComparator::FindShortestSuccessor() seems to return wrong results. The code is confusing too as it uses BytewiseComparatorImpl::FindShortestSeparator() but the function actually won't do anything if the the first key is larger than the second.
      
      Implement ReverseBytewiseComparator::FindShortestSeparator() and override ReverseBytewiseComparator::FindShortestSuccessor() to be empty.
      Closes https://github.com/facebook/rocksdb/pull/3836
      
      Differential Revision: D7959762
      
      Pulled By: siying
      
      fbshipit-source-id: 93acb621c16ce6f23e087ae4e19f7d84d1254683
      17af09fc
    • Z
      add override to virtual functions · 1d7ca20f
      Zhongyi Xie 提交于
      Summary:
      this will fix the failing clang_check test
      Closes https://github.com/facebook/rocksdb/pull/3868
      
      Differential Revision: D8050880
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 749932e2e4025f835c961c068d601e522a126da6
      1d7ca20f
    • X
      Reorder field based on esan data · aed7abbc
      Xin Tong 提交于
      Summary:
      Running. TEST_TMPDIR=/dev/shm ./buck-out/gen/rocks/tools/rocks_db_bench --benchmarks=readwhilewriting --num=5000000 -benchmark_write_rate_limit=2000000 --threads=32
      
      Collected esan data and reorder field. Accesses to 4th and 6th fields take majority of the access.  Group them. Overall, this struct takes 10%+ of the total accesses in the program. (637773011/6107964986)
      
      ==2433831==  class rocksdb::InlineSkipList
      ==2433831==   size = 48, count = 637773011, ratio = 112412, array access = 0
      ==2433831==   # 0: offset = 0,   size = 2,       count = 455137, type = i16
      ==2433831==   # 1: offset = 2,   size = 2,       count = 6,      type = i16
      ==2433831==   # 2: offset = 4,   size = 4,       count = 182303, type = i32
      ==2433831==   # 3: offset = 8,   size = 8,       count = 263953900, type = %"class.rocksdb::MemTableRep::KeyComparator"*
      ==2433831==   # 4: offset = 16,  size = 8,       count = 136409, type = %"class.rocksdb::Allocator"*
      ==2433831==   # 5: offset = 24,  size = 8,       count = 366628820, type = %"struct.rocksdb::InlineSkipList<const rocksdb::MemTableRep::KeyComparator &>::Node"*
      ==2433831==   # 6: offset = 32,  size = 4,       count = 6280031, type = %"struct.std::atomic" = type { %"struct.std::__atomic_base" }
      ==2433831==   # 7: offset = 40,  size = 8,       count = 136405, type = %"struct.rocksdb::InlineSkipList<const rocksdb::MemTableRep::KeyComparator &>::Splice"*
      ==2433831==EfficiencySanitizer: total struct field access count = 6107964986
      
      Before re-ordering
      [trentxintong@devbig460.frc2 ~/fbsource/fbcode]$ fgrep readwhilewriting
      without-ro.log
      readwhilewriting :       0.036 micros/op 27545605 ops/sec;   26.8 MB/s
      (45954 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28024240 ops/sec;   27.2 MB/s
      (43158 of 5000000 found)
      readwhilewriting :       0.037 micros/op 27345145 ops/sec;   27.1 MB/s
      (46725 of 5000000 found)
      readwhilewriting :       0.037 micros/op 27072588 ops/sec;   27.3 MB/s
      (42605 of 5000000 found)
      readwhilewriting :       0.034 micros/op 29578781 ops/sec;   28.3 MB/s
      (44294 of 5000000 found)
      readwhilewriting :       0.035 micros/op 28528304 ops/sec;   27.7 MB/s
      (44176 of 5000000 found)
      readwhilewriting :       0.037 micros/op 27075497 ops/sec;   26.5 MB/s
      (43763 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28024117 ops/sec;   27.1 MB/s
      (40622 of 5000000 found)
      readwhilewriting :       0.037 micros/op 27078709 ops/sec;   27.6 MB/s
      (47774 of 5000000 found)
      readwhilewriting :       0.034 micros/op 29020689 ops/sec;   28.1 MB/s
      (45066 of 5000000 found)
      AVERAGE()=27.37 MB/s
      
      After re-ordering
      [trentxintong@devbig460.frc2 ~/fbsource/fbcode]$ fgrep readwhilewriting
      ro.log
      readwhilewriting :       0.036 micros/op 27542409 ops/sec;   27.7 MB/s
      (46163 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28021148 ops/sec;   28.2 MB/s
      (46155 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28021035 ops/sec;   27.3 MB/s
      (44039 of 5000000 found)
      readwhilewriting :       0.036 micros/op 27538659 ops/sec;   27.5 MB/s
      (46781 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28028604 ops/sec;   27.6 MB/s
      (44689 of 5000000 found)
      readwhilewriting :       0.036 micros/op 27541452 ops/sec;   27.3 MB/s
      (43156 of 5000000 found)
      readwhilewriting :       0.034 micros/op 29041338 ops/sec;   28.8 MB/s
      (44895 of 5000000 found)
      readwhilewriting :       0.036 micros/op 27784974 ops/sec;   26.3 MB/s
      (39963 of 5000000 found)
      readwhilewriting :       0.036 micros/op 27538892 ops/sec;   28.1 MB/s
      (46570 of 5000000 found)
      readwhilewriting :       0.038 micros/op 26622473 ops/sec;   27.0 MB/s
      (43236 of 5000000 found)
      AVERAGE()=27.58 MB/s
      Closes https://github.com/facebook/rocksdb/pull/3855
      
      Reviewed By: siying
      
      Differential Revision: D8048781
      
      Pulled By: trentxintong
      
      fbshipit-source-id: bc9807a9845e2a92cb171ce1ecb5a2c8a51f1481
      aed7abbc
    • F
      Update HISTORY and version for upcoming 5.14 · fa43948c
      Fosco Marotto 提交于
      Summary: Closes https://github.com/facebook/rocksdb/pull/3866
      
      Differential Revision: D8043563
      
      Pulled By: gfosco
      
      fbshipit-source-id: da4af20e604534602ac0e07943135513fd9a9f53
      fa43948c
    • S
      In instrumented mutex, take timing once for both of perf_context and statistics · 7ccb35f6
      Siying Dong 提交于
      Summary: Closes https://github.com/facebook/rocksdb/pull/3427
      
      Differential Revision: D6827236
      
      Pulled By: siying
      
      fbshipit-source-id: d8a2cc525c90df625510565669f2659014259a8a
      7ccb35f6
  11. 17 5月, 2018 2 次提交
    • M
      Change and clarify the relationship between Valid(), status() and Seek*() for... · 8bf555f4
      Mike Kolupaev 提交于
      Change and clarify the relationship between Valid(), status() and Seek*() for all iterators. Also fix some bugs
      
      Summary:
      Before this PR, Iterator/InternalIterator may simultaneously have non-ok status() and Valid() = true. That state means that the last operation failed, but the iterator is nevertheless positioned on some unspecified record. Likely intended uses of that are:
       * If some sst files are corrupted, a normal iterator can be used to read the data from files that are not corrupted.
       * When using read_tier = kBlockCacheTier, read the data that's in block cache, skipping over the data that is not.
      
      However, this behavior wasn't documented well (and until recently the wiki on github had misleading incorrect information). In the code there's a lot of confusion about the relationship between status() and Valid(), and about whether Seek()/SeekToLast()/etc reset the status or not. There were a number of bugs caused by this confusion, both inside rocksdb and in the code that uses rocksdb (including ours).
      
      This PR changes the convention to:
       * If status() is not ok, Valid() always returns false.
       * Any seek operation resets status. (Before the PR, it depended on iterator type and on particular error.)
      
      This does sacrifice the two use cases listed above, but siying said it's ok.
      
      Overview of the changes:
       * A commit that adds missing status checks in MergingIterator. This fixes a bug that actually affects us, and we need it fixed. `DBIteratorTest.NonBlockingIterationBugRepro` explains the scenario.
       * Changes to lots of iterator types to make all of them conform to the new convention. Some bug fixes along the way. By far the biggest changes are in DBIter, which is a big messy piece of code; I tried to make it less big and messy but mostly failed.
       * A stress-test for DBIter, to gain some confidence that I didn't break it. It does a few million random operations on the iterator, while occasionally modifying the underlying data (like ForwardIterator does) and occasionally returning non-ok status from internal iterator.
      
      To find the iterator types that needed changes I searched for "public .*Iterator" in the code. Here's an overview of all 27 iterator types:
      
      Iterators that didn't need changes:
       * status() is always ok(), or Valid() is always false: MemTableIterator, ModelIter, TestIterator, KVIter (2 classes with this name anonymous namespaces), LoggingForwardVectorIterator, VectorIterator, MockTableIterator, EmptyIterator, EmptyInternalIterator.
       * Thin wrappers that always pass through Valid() and status(): ArenaWrappedDBIter, TtlIterator, InternalIteratorFromIterator.
      
      Iterators with changes (see inline comments for details):
       * DBIter - an overhaul:
          - It used to silently skip corrupted keys (`FindParseableKey()`), which seems dangerous. This PR makes it just stop immediately after encountering a corrupted key, just like it would for other kinds of corruption. Let me know if there was actually some deeper meaning in this behavior and I should put it back.
          - It had a few code paths silently discarding subiterator's status. The stress test caught a few.
          - The backwards iteration code path was expecting the internal iterator's set of keys to be immutable. It's probably always true in practice at the moment, since ForwardIterator doesn't support backwards iteration, but this PR fixes it anyway. See added DBIteratorTest.ReverseToForwardBug for an example.
          - Some parts of backwards iteration code path even did things like `assert(iter_->Valid())` after a seek, which is never a safe assumption.
          - It used to not reset status on seek for some types of errors.
          - Some simplifications and better comments.
          - Some things got more complicated from the added error handling. I'm open to ideas for how to make it nicer.
       * MergingIterator - check status after every operation on every subiterator, and in some places assert that valid subiterators have ok status.
       * ForwardIterator - changed to the new convention, also slightly simplified.
       * ForwardLevelIterator - fixed some bugs and simplified.
       * LevelIterator - simplified.
       * TwoLevelIterator - changed to the new convention. Also fixed a bug that would make SeekForPrev() sometimes silently ignore errors from first_level_iter_.
       * BlockBasedTableIterator - minor changes.
       * BlockIter - replaced `SetStatus()` with `Invalidate()` to make sure non-ok BlockIter is always invalid.
       * PlainTableIterator - some seeks used to not reset status.
       * CuckooTableIterator - tiny code cleanup.
       * ManagedIterator - fixed some bugs.
       * BaseDeltaIterator - changed to the new convention and fixed a bug.
       * BlobDBIterator - seeks used to not reset status.
       * KeyConvertingIterator - some small change.
      Closes https://github.com/facebook/rocksdb/pull/3810
      
      Differential Revision: D7888019
      
      Pulled By: al13n321
      
      fbshipit-source-id: 4aaf6d3421c545d16722a815b2fa2e7912bc851d
      8bf555f4
    • M
      Fix race condition between log_.erase and log_.back · 46fde6b6
      Maysam Yabandeh 提交于
      Summary:
      log_ contract specifies that it should not be modified unless both mutex_ and log_write_mutex_ are held. log_.erase however does that with only holding mutex_. This causes a race condition with two_write_queues since logs_.back is read with holding only log_write_mutex_ (which is correct according to logs_ contract) but logs_.erase is called concurrently. This is probably the cause of logs_.back returning nullptr in https://github.com/facebook/rocksdb/issues/3852 although I could not reproduce it.
      Fixes https://github.com/facebook/rocksdb/issues/3852
      Closes https://github.com/facebook/rocksdb/pull/3859
      
      Differential Revision: D8026103
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: ee394e00fe4aa520d884c5ef87981e9d6b5ccb28
      46fde6b6
  12. 15 5月, 2018 1 次提交