1. 23 May, 2018 1 commit
    • A
      Avoid single-deleting merge operands in db_stress · fcb31016
      Andrew Kryczka committed
      Summary:
      I repro'd some of the "unexpected value" failures showing up in our CI lately and they always happened on keys that have a mix of single deletes and merge operands. The `SingleDelete()` API comment mentions it's incompatible with `Merge()`, so this PR prevents `db_stress` from mixing them.
      Closes https://github.com/facebook/rocksdb/pull/3878
      
      Differential Revision: D8097346
      
      Pulled By: ajkr
      
      fbshipit-source-id: 357a48c6a31156f4f8db3ce565638ad924c437a1
      fcb31016
  2. 22 May, 2018 6 commits
    • S
      PersistRocksDBOptions() to use WritableFileWriter · 3db1ada3
      Siying Dong committed
      Summary:
      By using WritableFileWriter rather than WritableFile directly, we can buffer multiple Append() calls into one write() file system call. Unbuffered Append() calls can be expensive for an underlying Env that lacks its own write buffering.
      Closes https://github.com/facebook/rocksdb/pull/3882
      
      Differential Revision: D8080673
      
      Pulled By: siying
      
      fbshipit-source-id: e0db900cb3c178166aa738f3985db65e3ae2cf1b
      3db1ada3
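The buffering idea in this commit can be sketched in a few lines. This is a minimal Python model, not RocksDB code: `CountingFile`, `BufferedWriter` and `persist_options` are hypothetical names, and the 64 KiB buffer size is an arbitrary assumption.

```python
class CountingFile:
    """Stand-in for an unbuffered file: every append() counts as one write call."""

    def __init__(self):
        self.data = bytearray()
        self.write_calls = 0

    def append(self, chunk: bytes) -> None:
        self.write_calls += 1
        self.data += chunk


class BufferedWriter:
    """Collects small appends and forwards them to the file in larger batches."""

    def __init__(self, file: CountingFile, buffer_size: int = 64 * 1024):
        self._file = file
        self._buffer = bytearray()
        self._buffer_size = buffer_size  # assumed value, for illustration only

    def append(self, chunk: bytes) -> None:
        self._buffer += chunk
        if len(self._buffer) >= self._buffer_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._file.append(bytes(self._buffer))
            self._buffer.clear()


def persist_options(lines, writer) -> None:
    """Write one line per option, then flush whatever is still buffered."""
    for line in lines:
        writer.append(line.encode() + b"\n")
    writer.flush()
```

With a writer like this, many small option lines reach the underlying file in a single call, provided they fit in the buffer.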
    • Z
      Move prefix_extractor to MutableCFOptions · c3ebc758
      Zhongyi Xie committed
      Summary:
      Currently it is not possible to change the bloom filter config without restarting the DB, which causes a lot of operational complexity for users.
      This PR aims to make it possible to dynamically change bloom filter config.
      Closes https://github.com/facebook/rocksdb/pull/3601
      
      Differential Revision: D7253114
      
      Pulled By: miasantreble
      
      fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c
      c3ebc758
    • Y
      Update ColumnFamilyTest for multi-CF verification · 263ef52b
      Yanqin Jin committed
      Summary:
      Change `keys_` from `set<string>` to `vector<set<string>>` so that each column
      family's keys are stored in one set.
      
      ajkr When you have a chance, can you PTAL? Thanks!
      Closes https://github.com/facebook/rocksdb/pull/3871
      
      Differential Revision: D8056447
      
      Pulled By: riversand963
      
      fbshipit-source-id: 650d0f9cad02b1bc005fc329ad76edbf053e6386
      263ef52b
    • A
      Print histogram count and sum in statistics string · 508a09fd
      Andrew Kryczka committed
      Summary:
      Previously it only printed percentiles, even though our histogram keeps track of count and sum (and more). There have been many times we want to know more than the percentiles. For example, we currently want sum of "rocksdb.compression.times.nanos" and sum of "rocksdb.decompression.times.nanos", which would allow us to know the relative cost of compression vs decompression.
      
      This PR adds count and sum to the string printed by `StatisticsImpl::ToString`. This is a bit risky as there are definitely parsers assuming the old format. I will mention it in HISTORY.md and hope for the best...
      Closes https://github.com/facebook/rocksdb/pull/3863
      
      Differential Revision: D8038831
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0465b72e4b0cbf18ef965f4efe402601d16d5b5c
      508a09fd
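As a rough illustration of why count and sum are useful next to percentiles, the sketch below renders a histogram line and compares two sums. The line layout, the function names and the sample numbers are assumptions for illustration; they are not taken from `StatisticsImpl::ToString`.

```python
def histogram_to_string(name, percentiles, count, total):
    """Render one histogram line; the exact layout here is illustrative only."""
    parts = [f"{label} : {value:.6f}" for label, value in percentiles]
    parts.append(f"COUNT : {count}")
    parts.append(f"SUM : {total}")
    return f"{name} " + " ".join(parts)


def relative_cost(sum_a, sum_b):
    """Ratio of two histogram sums, e.g. compression vs decompression time."""
    return sum_a / sum_b if sum_b else float("inf")


# Made-up sample values, not measurements.
print(histogram_to_string("rocksdb.compression.times.nanos",
                          [("P50", 1.5), ("P99", 9.0)], count=4, total=20))
print(relative_cost(20, 5))
```

Percentiles alone cannot give this ratio, since they say nothing about how many samples were recorded or their total.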
    • A
      Assert keys/values pinned by range deletion meta-block iterators · 7b655214
      Andrew Kryczka committed
      Summary:
      `RangeDelAggregator` holds the pointers returned by `BlockIter::key()` and `BlockIter::value()`, so it requires that the data to which they point be pinned. `BlockIter::key()` points into block memory and is guaranteed to be pinned if and only if prefix encoding is disabled (or, equivalently, restart interval is set to one). I think `BlockIter::value()` is always pinned. Added an assert for these and removed the wrong TODO about increasing restart interval, which would enable key prefix encoding and break the assertion.
      Closes https://github.com/facebook/rocksdb/pull/3875
      
      Differential Revision: D8063667
      
      Pulled By: ajkr
      
      fbshipit-source-id: 60b5ebcc0cdd610dd6aad9e74a23378793672c41
      7b655214
    • A
      Add missing test files to src.mk · e410501e
      Andrew Kryczka committed
      Summary:
      We only generate the header dependency (".cc.d") files for files mentioned in "src.mk". When we don't generate them, changes to header dependencies do not cause `make` to recompile the dependent ".o". Then it takes a while for developers (or maybe just me) to realize `make clean` is necessary.
      Closes https://github.com/facebook/rocksdb/pull/3876
      
      Differential Revision: D8065389
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0f62eee7bcab15b0215791564e6ab3775d46996b
      e410501e
  3. 19 May, 2018 2 commits
    • Z
      fix a division by zero bug · ed4d3393
      Zhongyi Xie committed
      Summary:
      fixes the failing clang_analyze contrun test
      Closes https://github.com/facebook/rocksdb/pull/3872
      
      Differential Revision: D8059241
      
      Pulled By: miasantreble
      
      fbshipit-source-id: e8fc1838004fe16a823456188386b8b39429803b
      ed4d3393
    • S
      class Block to store num_restarts_ · 26da3676
      Siying Dong committed
      Summary:
      Right now, every Block::NewIterator() reads num_restarts_ from the block, which is already read in Block::Block(). This sometimes causes a CPU cache miss. Although fetching this cacheline can usually benefit follow-up block restart offset reading, as they are close to each other, it's almost free to get rid of this read by storing it in the Block class.
      Closes https://github.com/facebook/rocksdb/pull/3869
      
      Differential Revision: D8052493
      
      Pulled By: siying
      
      fbshipit-source-id: 9c72360f0c2d7329f3c198ce4eaedd2bc14b87c1
      26da3676
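The change amounts to decoding a value once in the constructor and reusing it for every iterator. A minimal Python sketch follows; it assumes a toy block layout in which the restart count is a 4-byte little-endian integer at the end of the block, and the class names only mirror the commit for readability.

```python
import struct


class BlockIterator:
    def __init__(self, contents: bytes, num_restarts: int):
        self.contents = contents
        self.num_restarts = num_restarts


class Block:
    """Toy block: payload followed by a 4-byte little-endian restart count (assumed layout)."""

    def __init__(self, contents: bytes):
        if len(contents) < 4:
            raise ValueError("block too small to hold a restart count")
        self._contents = contents
        # Decode the trailer once, at construction time.
        (self.num_restarts,) = struct.unpack_from("<I", contents, len(contents) - 4)

    def new_iterator(self) -> BlockIterator:
        # Reuse the cached count instead of re-reading the block trailer.
        return BlockIterator(self._contents, self.num_restarts)
```

Each call to `new_iterator()` then touches only the `Block` object itself, not the tail of the block contents.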
  4. 18 May, 2018 6 commits
    • Y
      Set the default value of max_manifest_file_size. · a0c7b4d5
      Yanqin Jin committed
      Summary:
      In the past, the default value of max_manifest_file_size was uint64_t::MAX,
      allowing a long running RocksDB process to grow its MANIFEST file to take up
      the entire disk, as reported in [issue 3851](https://github.com/facebook/rocksdb/issues/3851). It is reasonable and common to provide a default non-max value for this option. Therefore, I set the value to 1GB.
      
      siying miasantreble Please let me know whether this looks good to you. Thanks!
      Closes https://github.com/facebook/rocksdb/pull/3867
      
      Differential Revision: D8051524
      
      Pulled By: riversand963
      
      fbshipit-source-id: 50251f0804b1fa933a19a30d19d261ea8b9d2b72
      a0c7b4d5
    • S
      Implement key shortening functions in ReverseBytewiseComparator · 17af09fc
      Siying Dong committed
      Summary:
      Right now ReverseBytewiseComparator::FindShortestSeparator() doesn't really shorten the key, and ReverseBytewiseComparator::FindShortestSuccessor() seems to return wrong results. The code is confusing too, as it uses BytewiseComparatorImpl::FindShortestSeparator(), but that function actually won't do anything if the first key is larger than the second.
      
      Implement ReverseBytewiseComparator::FindShortestSeparator() and override ReverseBytewiseComparator::FindShortestSuccessor() to be empty.
      Closes https://github.com/facebook/rocksdb/pull/3836
      
      Differential Revision: D7959762
      
      Pulled By: siying
      
      fbshipit-source-id: 93acb621c16ce6f23e087ae4e19f7d84d1254683
      17af09fc
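One way to shorten a key under a reverse bytewise order is sketched below. This is an illustration of the idea under stated assumptions, not the code from the PR: `reverse_shortest_separator` is a hypothetical name, and the function simply truncates `start` right after the first byte that makes it bytewise larger than `limit`.

```python
def reverse_shortest_separator(start: bytes, limit: bytes) -> bytes:
    """Return a short key s with start >= s > limit in plain bytewise order.

    Under a reverse bytewise comparator, `start` sorts before `limit`
    exactly when start > limit bytewise. If that does not hold, `start`
    is returned unchanged.
    """
    if start <= limit:
        return start
    # Length of the common prefix.
    i = 0
    while i < min(len(start), len(limit)) and start[i] == limit[i]:
        i += 1
    # Because start > limit, start has a byte at index i that is either
    # larger than limit[i] or lies past the end of limit. Keeping that byte
    # is enough to stay above limit, and a prefix of start never exceeds start.
    return start[: i + 1]
```

For example, `reverse_shortest_separator(b"abzzz", b"abc")` yields `b"abz"`, which still sorts between the two keys in reverse bytewise order.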
    • Z
      add override to virtual functions · 1d7ca20f
      Zhongyi Xie committed
      Summary:
      this will fix the failing clang_check test
      Closes https://github.com/facebook/rocksdb/pull/3868
      
      Differential Revision: D8050880
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 749932e2e4025f835c961c068d601e522a126da6
      1d7ca20f
    • X
      Reorder field based on esan data · aed7abbc
      Xin Tong committed
      Summary:
      Ran: TEST_TMPDIR=/dev/shm ./buck-out/gen/rocks/tools/rocks_db_bench --benchmarks=readwhilewriting --num=5000000 -benchmark_write_rate_limit=2000000 --threads=32

      Collected esan data and reordered the fields. Accesses to the 4th and 6th fields (#3 and #5 below) make up the majority of accesses to this struct, so they are now grouped together. Overall, this struct accounts for 10%+ of the total struct field accesses in the program (637773011/6107964986).
      
      ==2433831==  class rocksdb::InlineSkipList
      ==2433831==   size = 48, count = 637773011, ratio = 112412, array access = 0
      ==2433831==   # 0: offset = 0,   size = 2,       count = 455137, type = i16
      ==2433831==   # 1: offset = 2,   size = 2,       count = 6,      type = i16
      ==2433831==   # 2: offset = 4,   size = 4,       count = 182303, type = i32
      ==2433831==   # 3: offset = 8,   size = 8,       count = 263953900, type = %"class.rocksdb::MemTableRep::KeyComparator"*
      ==2433831==   # 4: offset = 16,  size = 8,       count = 136409, type = %"class.rocksdb::Allocator"*
      ==2433831==   # 5: offset = 24,  size = 8,       count = 366628820, type = %"struct.rocksdb::InlineSkipList<const rocksdb::MemTableRep::KeyComparator &>::Node"*
      ==2433831==   # 6: offset = 32,  size = 4,       count = 6280031, type = %"struct.std::atomic" = type { %"struct.std::__atomic_base" }
      ==2433831==   # 7: offset = 40,  size = 8,       count = 136405, type = %"struct.rocksdb::InlineSkipList<const rocksdb::MemTableRep::KeyComparator &>::Splice"*
      ==2433831==EfficiencySanitizer: total struct field access count = 6107964986
      
      Before re-ordering
      [trentxintong@devbig460.frc2 ~/fbsource/fbcode]$ fgrep readwhilewriting without-ro.log
      readwhilewriting :       0.036 micros/op 27545605 ops/sec;   26.8 MB/s
      (45954 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28024240 ops/sec;   27.2 MB/s
      (43158 of 5000000 found)
      readwhilewriting :       0.037 micros/op 27345145 ops/sec;   27.1 MB/s
      (46725 of 5000000 found)
      readwhilewriting :       0.037 micros/op 27072588 ops/sec;   27.3 MB/s
      (42605 of 5000000 found)
      readwhilewriting :       0.034 micros/op 29578781 ops/sec;   28.3 MB/s
      (44294 of 5000000 found)
      readwhilewriting :       0.035 micros/op 28528304 ops/sec;   27.7 MB/s
      (44176 of 5000000 found)
      readwhilewriting :       0.037 micros/op 27075497 ops/sec;   26.5 MB/s
      (43763 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28024117 ops/sec;   27.1 MB/s
      (40622 of 5000000 found)
      readwhilewriting :       0.037 micros/op 27078709 ops/sec;   27.6 MB/s
      (47774 of 5000000 found)
      readwhilewriting :       0.034 micros/op 29020689 ops/sec;   28.1 MB/s
      (45066 of 5000000 found)
      AVERAGE()=27.37 MB/s
      
      After re-ordering
      [trentxintong@devbig460.frc2 ~/fbsource/fbcode]$ fgrep readwhilewriting ro.log
      readwhilewriting :       0.036 micros/op 27542409 ops/sec;   27.7 MB/s
      (46163 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28021148 ops/sec;   28.2 MB/s
      (46155 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28021035 ops/sec;   27.3 MB/s
      (44039 of 5000000 found)
      readwhilewriting :       0.036 micros/op 27538659 ops/sec;   27.5 MB/s
      (46781 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28028604 ops/sec;   27.6 MB/s
      (44689 of 5000000 found)
      readwhilewriting :       0.036 micros/op 27541452 ops/sec;   27.3 MB/s
      (43156 of 5000000 found)
      readwhilewriting :       0.034 micros/op 29041338 ops/sec;   28.8 MB/s
      (44895 of 5000000 found)
      readwhilewriting :       0.036 micros/op 27784974 ops/sec;   26.3 MB/s
      (39963 of 5000000 found)
      readwhilewriting :       0.036 micros/op 27538892 ops/sec;   28.1 MB/s
      (46570 of 5000000 found)
      readwhilewriting :       0.038 micros/op 26622473 ops/sec;   27.0 MB/s
      (43236 of 5000000 found)
      AVERAGE()=27.58 MB/s
      Closes https://github.com/facebook/rocksdb/pull/3855
      
      Reviewed By: siying
      
      Differential Revision: D8048781
      
      Pulled By: trentxintong
      
      fbshipit-source-id: bc9807a9845e2a92cb171ce1ecb5a2c8a51f1481
      aed7abbc
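The proportions quoted in this commit can be re-derived from the numbers in the report. The short Python calculation below uses only figures copied from the esan output and the two reported averages; the variable names are illustrative.

```python
# Per-field access counts for rocksdb::InlineSkipList, copied from the esan report.
field_counts = [455137, 6, 182303, 263953900, 136409, 366628820, 6280031, 136405]
total_struct_accesses = 6107964986

struct_count = sum(field_counts)
# Share of this struct's accesses that go to fields #3 and #5 (the 4th and 6th).
hot_share = (field_counts[3] + field_counts[5]) / struct_count
# Share of all struct field accesses in the program that hit this struct.
program_share = struct_count / total_struct_accesses

# Reported benchmark averages before and after the reordering, in MB/s.
before_mb_s, after_mb_s = 27.37, 27.58
speedup = after_mb_s / before_mb_s - 1

print(f"fields #3 and #5: {hot_share:.1%} of InlineSkipList accesses")
print(f"InlineSkipList: {program_share:.1%} of all struct field accesses")
print(f"throughput change: {speedup:.2%}")
```

The per-field counts add up to the reported struct total of 637773011, which is a useful consistency check on the report.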
    • F
      Update HISTORY and version for upcoming 5.14 · fa43948c
      Fosco Marotto committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3866
      
      Differential Revision: D8043563
      
      Pulled By: gfosco
      
      fbshipit-source-id: da4af20e604534602ac0e07943135513fd9a9f53
      fa43948c
    • S
      In instrumented mutex, take timing once for both of perf_context and statistics · 7ccb35f6
      Siying Dong committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3427
      
      Differential Revision: D6827236
      
      Pulled By: siying
      
      fbshipit-source-id: d8a2cc525c90df625510565669f2659014259a8a
      7ccb35f6
  5. 17 May, 2018 2 commits
    • M
      Change and clarify the relationship between Valid(), status() and Seek*() for... · 8bf555f4
      Mike Kolupaev committed
      Change and clarify the relationship between Valid(), status() and Seek*() for all iterators. Also fix some bugs
      
      Summary:
      Before this PR, Iterator/InternalIterator may simultaneously have non-ok status() and Valid() = true. That state means that the last operation failed, but the iterator is nevertheless positioned on some unspecified record. Likely intended uses of that are:
       * If some sst files are corrupted, a normal iterator can be used to read the data from files that are not corrupted.
       * When using read_tier = kBlockCacheTier, read the data that's in block cache, skipping over the data that is not.
      
      However, this behavior wasn't documented well (and until recently the wiki on github had misleading incorrect information). In the code there's a lot of confusion about the relationship between status() and Valid(), and about whether Seek()/SeekToLast()/etc reset the status or not. There were a number of bugs caused by this confusion, both inside rocksdb and in the code that uses rocksdb (including ours).
      
      This PR changes the convention to:
       * If status() is not ok, Valid() always returns false.
       * Any seek operation resets status. (Before the PR, it depended on iterator type and on particular error.)
      
      This does sacrifice the two use cases listed above, but siying said it's ok.
      
      Overview of the changes:
       * A commit that adds missing status checks in MergingIterator. This fixes a bug that actually affects us, and we need it fixed. `DBIteratorTest.NonBlockingIterationBugRepro` explains the scenario.
       * Changes to lots of iterator types to make all of them conform to the new convention. Some bug fixes along the way. By far the biggest changes are in DBIter, which is a big messy piece of code; I tried to make it less big and messy but mostly failed.
       * A stress-test for DBIter, to gain some confidence that I didn't break it. It does a few million random operations on the iterator, while occasionally modifying the underlying data (like ForwardIterator does) and occasionally returning non-ok status from internal iterator.
      
      To find the iterator types that needed changes I searched for "public .*Iterator" in the code. Here's an overview of all 27 iterator types:
      
      Iterators that didn't need changes:
       * status() is always ok(), or Valid() is always false: MemTableIterator, ModelIter, TestIterator, KVIter (2 classes with this name in anonymous namespaces), LoggingForwardVectorIterator, VectorIterator, MockTableIterator, EmptyIterator, EmptyInternalIterator.
       * Thin wrappers that always pass through Valid() and status(): ArenaWrappedDBIter, TtlIterator, InternalIteratorFromIterator.
      
      Iterators with changes (see inline comments for details):
       * DBIter - an overhaul:
          - It used to silently skip corrupted keys (`FindParseableKey()`), which seems dangerous. This PR makes it just stop immediately after encountering a corrupted key, just like it would for other kinds of corruption. Let me know if there was actually some deeper meaning in this behavior and I should put it back.
          - It had a few code paths silently discarding subiterator's status. The stress test caught a few.
          - The backwards iteration code path was expecting the internal iterator's set of keys to be immutable. It's probably always true in practice at the moment, since ForwardIterator doesn't support backwards iteration, but this PR fixes it anyway. See added DBIteratorTest.ReverseToForwardBug for an example.
          - Some parts of backwards iteration code path even did things like `assert(iter_->Valid())` after a seek, which is never a safe assumption.
          - It used to not reset status on seek for some types of errors.
          - Some simplifications and better comments.
          - Some things got more complicated from the added error handling. I'm open to ideas for how to make it nicer.
       * MergingIterator - check status after every operation on every subiterator, and in some places assert that valid subiterators have ok status.
       * ForwardIterator - changed to the new convention, also slightly simplified.
       * ForwardLevelIterator - fixed some bugs and simplified.
       * LevelIterator - simplified.
       * TwoLevelIterator - changed to the new convention. Also fixed a bug that would make SeekForPrev() sometimes silently ignore errors from first_level_iter_.
       * BlockBasedTableIterator - minor changes.
       * BlockIter - replaced `SetStatus()` with `Invalidate()` to make sure non-ok BlockIter is always invalid.
       * PlainTableIterator - some seeks used to not reset status.
       * CuckooTableIterator - tiny code cleanup.
       * ManagedIterator - fixed some bugs.
       * BaseDeltaIterator - changed to the new convention and fixed a bug.
       * BlobDBIterator - seeks used to not reset status.
       * KeyConvertingIterator - some small change.
      Closes https://github.com/facebook/rocksdb/pull/3810
      
      Differential Revision: D7888019
      
      Pulled By: al13n321
      
      fbshipit-source-id: 4aaf6d3421c545d16722a815b2fa2e7912bc851d
      8bf555f4
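The two rules of the new convention can be modelled with a small toy iterator. This is a Python sketch for illustration only: `ToyIterator`, `Status` and the `corrupt` parameter are hypothetical and do not correspond to RocksDB classes.

```python
class Status:
    def __init__(self, ok=True, message=""):
        self._ok, self.message = ok, message

    def ok(self):
        return self._ok


class ToyIterator:
    """Iterates over sorted keys; keys listed in `corrupt` produce an error when reached."""

    def __init__(self, keys, corrupt=()):
        self._keys = sorted(keys)
        self._corrupt = set(corrupt)
        self._pos = len(self._keys)
        self._status = Status()

    def valid(self):
        # Rule 1: a non-ok status always implies an invalid iterator.
        return self._status.ok() and self._pos < len(self._keys)

    def status(self):
        return self._status

    def key(self):
        assert self.valid()
        return self._keys[self._pos]

    def seek(self, target):
        # Rule 2: every seek starts by resetting the status.
        self._status = Status()
        self._pos = 0
        while self._pos < len(self._keys) and self._keys[self._pos] < target:
            self._pos += 1
        self._check()

    def next(self):
        assert self.valid()
        self._pos += 1
        self._check()

    def _check(self):
        if self._pos < len(self._keys) and self._keys[self._pos] in self._corrupt:
            self._status = Status(False, "corruption")
```

Under this convention a caller loops while `valid()` is true and then checks `status()` once after the loop to tell a clean end of data from an error.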
    • M
      Fix race condition between log_.erase and log_.back · 46fde6b6
      Maysam Yabandeh committed
      Summary:
      The logs_ contract specifies that it should not be modified unless both mutex_ and log_write_mutex_ are held. logs_.erase, however, modifies it while holding only mutex_. This causes a race condition with two_write_queues, since logs_.back is read while holding only log_write_mutex_ (which is correct according to the logs_ contract) but logs_.erase is called concurrently. This is probably the cause of logs_.back returning nullptr in https://github.com/facebook/rocksdb/issues/3852 although I could not reproduce it.
      Fixes https://github.com/facebook/rocksdb/issues/3852
      Closes https://github.com/facebook/rocksdb/pull/3859
      
      Differential Revision: D8026103
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: ee394e00fe4aa520d884c5ef87981e9d6b5ccb28
      46fde6b6
  6. 15 May, 2018 5 commits
    • A
      Fix geo_db may seek an error key when they have the same quadkey · 42cb4775
      acelyc111 committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3832
      
      Differential Revision: D7994326
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 84a81b35b97750360423a9d4eca5b5a14d002134
      42cb4775
    • M
      Suppress tsan lock-order-inversion on FlushWAL · 12ad7112
      Maysam Yabandeh committed
      Summary:
      TSAN reports a false alarm for lock-order-inversion in DBWriteTest.IOErrorOnWALWritePropagateToWriteThreadFollower but Open and FlushWAL are not run concurrently. Suppressing the error by skipping FlushWAL in the test until TSAN is fixed.
      
      The alternative would be to use
      ```
      TSAN_OPTIONS="suppressions=tsan-suppressions.txt" ./db_write_test
      ```
      but it does not seem straightforward to integrate it to our test infra.
      Closes https://github.com/facebook/rocksdb/pull/3854
      
      Differential Revision: D8000202
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: fde33483d963a7ad84d3145123821f64960a4802
      12ad7112
    • A
      Bottommost level-based compactions in bottom-pri pool · 3d7dc75b
      Andrew Kryczka committed
      Summary:
      This feature was introduced for universal compaction in cc01985d. At that point we thought it'd be used only to prevent long-running universal full compactions from blocking short-lived upper-level compactions. Now we have a level compaction user who could benefit from it since they use a more expensive compression algorithm in the bottom level. So enable it for level compaction.
      Closes https://github.com/facebook/rocksdb/pull/3835
      
      Differential Revision: D7957179
      
      Pulled By: ajkr
      
      fbshipit-source-id: 177285d2cef3b650b6a4d81dc5db84bc441c9fe4
      3d7dc75b
    • S
      Fix db_stress build on mac · ebb823f7
      Sagar Vemuri committed
      Summary:
      I noticed, while debugging an unrelated issue, that db_stress is failing to build on mac, leading to a failed `make all`.
      ```
      $ make db_stress -j4
      ...
      tools/db_stress.cc:862:69: error: cannot initialize a parameter of type 'uint64_t *' (aka 'unsigned long long *') with an rvalue of type 'size_t *' (aka 'unsigned long *')
              status = FLAGS_env->GetFileSize(FLAGS_expected_values_path, &size);
                                                                          ^~~~~
      ./include/rocksdb/env.h:277:66: note: passing argument to parameter 'file_size' here
        virtual Status GetFileSize(const std::string& fname, uint64_t* file_size) = 0;
                                                                       ^
      1 error generated.
      make: *** [tools/db_stress.o] Error 1
      make: *** Waiting for unfinished jobs....
      ```
      Closes https://github.com/facebook/rocksdb/pull/3839
      
      Differential Revision: D7979236
      
      Pulled By: sagar0
      
      fbshipit-source-id: 0615e7bb5405bade71e4203803bf723720422d62
      ebb823f7
    • M
      Pass manual_wal_flush also to the first wal file · 718c1c9c
      Maysam Yabandeh committed
      Summary:
      Currently manual_wal_flush, if set in the options, is used only for the WAL files created during a WAL switch. The configuration thus does not affect the first WAL file. The patch fixes that and also updates the related unit tests.
      This PR is built on top of https://github.com/facebook/rocksdb/pull/3756
      Closes https://github.com/facebook/rocksdb/pull/3824
      
      Differential Revision: D7909153
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 024ed99d2555db06bf096c902b998e432bb7b9ce
      718c1c9c
  7. 12 May, 2018 2 commits
  8. 10 May, 2018 5 commits
    • A
      Apply use_direct_io_for_flush_and_compaction to writes only · 072ae671
      Andrew Kryczka committed
      Summary:
      Previously `DBOptions::use_direct_io_for_flush_and_compaction=true` combined with `DBOptions::use_direct_reads=false` could cause RocksDB to simultaneously read from two file descriptors for the same file, where background reads used direct I/O and foreground reads used buffered I/O. Our measurements found this mixed-mode I/O negatively impacted foreground read perf, compared to when only buffered I/O was used.
      
      This PR makes the mixed-mode I/O situation impossible by repurposing `DBOptions::use_direct_io_for_flush_and_compaction` to only apply to background writes, and `DBOptions::use_direct_reads` to apply to all reads. There is no risk of direct background writes happening simultaneously with buffered reads since we never read from and write to the same file simultaneously.
      Closes https://github.com/facebook/rocksdb/pull/3829
      
      Differential Revision: D7915443
      
      Pulled By: ajkr
      
      fbshipit-source-id: 78bcbf276449b7e7766ab6b0db246f789fb1b279
      072ae671
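The option semantics after this change can be summarised as a small mapping. The Python function below is a sketch of the behaviour as described in the commit message; `io_modes` and the dictionary keys are hypothetical names, while the two parameters are named after the `DBOptions` fields.

```python
def io_modes(use_direct_reads: bool, use_direct_io_for_flush_and_compaction: bool):
    """Map the two options to the I/O mode of each kind of file access."""
    read_mode = "direct" if use_direct_reads else "buffered"
    write_mode = "direct" if use_direct_io_for_flush_and_compaction else "buffered"
    return {
        "foreground_read": read_mode,
        # Background reads now follow use_direct_reads, like all other reads.
        "background_read": read_mode,
        "background_write": write_mode,
    }
```

With `use_direct_reads=False` and `use_direct_io_for_flush_and_compaction=True`, both read paths stay buffered and only background writes use direct I/O, so the mixed-mode read case cannot occur.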
    • A
      Refactor argument handling in db_crashtest.py · d19f568a
      Andrew Kryczka committed
      Summary:
      - Any options unknown to `db_crashtest.py` are now passed directly to `db_stress`. This way, we won't need to update `db_crashtest.py` every time `db_stress` gets a new option.
      - Remove `db_crashtest.py` redundant arguments where the value is the same as `db_stress`'s default
      - Remove `db_crashtest.py` redundant arguments where the value is the same in a previously applied options map. For example, default_params are always applied before whitebox_default_params, so if they require the same value for an argument, that value only needs to be provided in default_params.
      - The simple option maps are now applied in addition to the regular option maps. Previously they were exclusive, which led to lots of duplication.
      Closes https://github.com/facebook/rocksdb/pull/3809
      
      Differential Revision: D7885779
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3a3243b55724d6d5bff36e939b582b9b62c538a8
      d19f568a
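The argument handling described above can be sketched with `argparse.parse_known_args`, which returns unrecognized arguments separately so they can be forwarded. This is an illustrative sketch, not the actual `db_crashtest.py`: the map contents, the `--some_new_flag` option and the helper names are made up.

```python
import argparse

# Layered option maps; later maps override earlier ones. Values are made up.
default_params = {"ops_per_thread": 100000, "threads": 32}
whitebox_default_params = {"ops_per_thread": 200000}
simple_default_params = {"threads": 1}


def build_params(test_type, simple):
    params = dict(default_params)
    if test_type == "whitebox":
        params.update(whitebox_default_params)
    if simple:
        # Applied on top of the regular maps, not instead of them.
        params.update(simple_default_params)
    return params


def db_stress_command(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("test_type", choices=["blackbox", "whitebox"])
    parser.add_argument("--simple", action="store_true")
    # Anything the script does not know is returned in `passthrough`.
    known, passthrough = parser.parse_known_args(argv)
    params = build_params(known.test_type, known.simple)
    flags = [f"--{name}={value}" for name, value in sorted(params.items())]
    return ["./db_stress"] + flags + passthrough
```

For instance, `db_stress_command(["whitebox", "--simple", "--some_new_flag=1"])` forwards `--some_new_flag=1` untouched, so the script needs no update when the underlying tool gains an option.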
    • S
      Disallow to open RandomRW file if the file doesn't exist · 3690276e
      Siying Dong committed
      Summary:
      The only use of RandomRW is to change seqno when bulkloading, and in this use case, the file should exist. We should fail the file opening in this case.
      Closes https://github.com/facebook/rocksdb/pull/3827
      
      Differential Revision: D7913719
      
      Pulled By: siying
      
      fbshipit-source-id: 62cf6734f1a6acb9e14f715b927da388131c3492
      3690276e
    • S
      Make BlockIter final · ddfd2525
      Siying Dong committed
      Summary:
      Now BlockBasedTableIterator directly uses BlockIter. By making BlockIter final, we can prevent unintended virtual function overriding.
      Closes https://github.com/facebook/rocksdb/pull/3828
      
      Differential Revision: D7933816
      
      Pulled By: siying
      
      fbshipit-source-id: 026a08cb5c5b6d3d6f44743152b4251da4756f2c
      ddfd2525
    • D
      Introduce and use the option to disable stall notifications structures · f92cd2fe
      Dmitri Smirnov committed
      Summary:
      and code. Removing this helps with insert performance.
      Closes https://github.com/facebook/rocksdb/pull/3830
      
      Differential Revision: D7921030
      
      Pulled By: siying
      
      fbshipit-source-id: 84e80d50a7ef96f5441c51c9a0d089c50217cce2
      f92cd2fe
  9. 09 May, 2018 2 commits
    • H
      Add missing options in BuildColumnfamilyOptions · cee138c7
      Huachao Huang committed
      Summary:
      soft_pending_compaction_bytes_limit and hard_pending_compaction_bytes_limit are added to BuildColumnfamilyOptions.
      Closes https://github.com/facebook/rocksdb/pull/3823
      
      Differential Revision: D7909246
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 89032efbf6b5bd302ea50cbd7a234977984a1fca
      cee138c7
    • A
      Disable readahead when using mmap for reads · 4bf169f0
      Andrew Kryczka committed
      Summary:
      `ReadaheadRandomAccessFile` had an unwritten assumption, which was that its wrapped file's `Read()` function always copies into the provided scratch buffer. Actually this was not true when the wrapped file was `PosixMmapReadableFile`, whose `Read()` implementation does no copying and instead returns a `Slice` pointing directly into the  `mmap`'d memory region. This PR:
      
      - prevents `ReadaheadRandomAccessFile` from ever wrapping mmap readable files
      - adds an assert for the assumption `ReadaheadRandomAccessFile` makes about the wrapped file's use of scratch buffer
      Closes https://github.com/facebook/rocksdb/pull/3813
      
      Differential Revision: D7891513
      
      Pulled By: ajkr
      
      fbshipit-source-id: dc64a55222d6af280c39a1852ee39e9e9d7cde7d
      4bf169f0
  10. 08 May, 2018 4 commits
    • T
      Link jemalloc · 1d9f24dc
      Tongliang Liao committed
      Summary:
      Fix undefined reference to `malloc_*` linking errors on Linux.
      Closes https://github.com/facebook/rocksdb/pull/3817
      
      Differential Revision: D7899066
      
      Pulled By: ajkr
      
      fbshipit-source-id: 18c46569a59608388d6240f1b8ec20c2d2557dec
      1d9f24dc
    • T
      Allows other cmake-specific "true" for USE_RTTI. · 9470ee45
      Tongliang Liao committed
      Summary:
      People also use ON/OFF, TRUE/FALSE and other switch options that are allowed by cmake.
      Closes https://github.com/facebook/rocksdb/pull/3814
      
      Differential Revision: D7899032
      
      Pulled By: ajkr
      
      fbshipit-source-id: b71511af59e0a78eedafb639b5002c47050bf3c2
      9470ee45
    • T
      Search paths provided by intel's "tbbvars.sh". · 6d6e01cd
      Tongliang Liao committed
      Summary:
      TBBROOT and LIBRARY_PATH are set in env by the script.
      
      With TBB 2018 the library path is $TBBROOT/lib/intel64/gcc4.7 for anything above gcc 4.7, which is both compiler and architecture related. We cannot simply do ${TBB_ROOT_DIR}/lib.
      Closes https://github.com/facebook/rocksdb/pull/3815
      
      Differential Revision: D7899006
      
      Pulled By: ajkr
      
      fbshipit-source-id: 159ab1f6a5c40452ed6aa8d79300206953d916c2
      6d6e01cd
    • M
      Split FaultInjectionTest.FaultTest to avoid timeout · d72a51e9
      Maysam Yabandeh committed
      Summary:
      The tsan flavor of this test occasionally times out in our test infra. The patch splits the test into two, each working on half of the option range.
      Before:
      [       OK ] FaultTest/FaultInjectionTest.FaultTest/0 (5918 ms)
      [       OK ] FaultTest/FaultInjectionTest.FaultTest/1 (5336 ms)
      After:
      [       OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/0 (2930 ms)
      [       OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/1 (2676 ms)
      [       OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/2 (2759 ms)
      [       OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/3 (2546 ms)
      Closes https://github.com/facebook/rocksdb/pull/3819
      
      Differential Revision: D7894975
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 809f1411cbcc27f8aa71a6b29a16b039f51b67c9
      d72a51e9
  11. 05 May, 2018 5 commits
    • D
      Recommit "Avoid adding tombstones of the same file to RangeDelAggregator multiple times" · 72942ad7
      daheiantian committed
      Summary:
      The original commit #3635 hurt performance for users who aren't using range deletions because of unneeded std::set operations, so it was reverted by commit 44653c7b (see #3672).
      
      To fix this, move the set to  and add a check in , i.e., file will be added only if  is non-nullptr.
      
      The db_bench command that found the performance regression:
      > ./db_bench --benchmarks=fillrandom,seekrandomwhilewriting --threads=1 --num=1000000 --reads=150000 --key_size=66 --value_size=1262 --statistics=0 --compression_ratio=0.5 --histogram=1 --seek_nexts=1 --stats_per_interval=1 --stats_interval_seconds=600 --max_background_flushes=4 --num_multi_db=1 --max_background_compactions=16 --seed=1522388277 -write_buffer_size=1048576 --level0_file_num_compaction_trigger=10000 --compression_type=none

      I re-ran this command on the machine before and after the modification; the results are as follows:
      
      **fillrandom**

      | Table | P50 | P75 | P99 | P99.9 | P99.99 |
      | ---- | --- | --- | --- | ----- | ------ |
      | before commit | 5.92 | 8.57 | 19.63 | 980.97 | 12196.00 |
      | after commit | 5.91 | 8.55 | 19.34 | 965.56 | 13513.56 |

      **seekrandomwhilewriting**

      | Table | P50 | P75 | P99 | P99.9 | P99.99 |
      | ---- | --- | --- | --- | ----- | ------ |
      | before commit | 1418.62 | 1867.01 | 3823.28 | 4980.99 | 9240.00 |
      | after commit | 1450.54 | 1880.61 | 3962.87 | 5429.60 | 7542.86 |
      Closes https://github.com/facebook/rocksdb/pull/3800
      
      Differential Revision: D7874245
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2e8bec781b3f7399246babd66395c88619534a17
      72942ad7
    • A
      Fix db_stress memory leak ASAN error · 4c5a3232
      Andrew Kryczka committed
      Summary:
      In case `--expected_values_path` is unset, we allocate a buffer internally to hold the expected DB state. This PR makes sure it is freed.
      Closes https://github.com/facebook/rocksdb/pull/3804
      
      Differential Revision: D7874694
      
      Pulled By: ajkr
      
      fbshipit-source-id: a8f7655e009507c4e639ceebfc3525d69c856e3b
      4c5a3232
    • M
      Evenly split HarnessTest.Randomized · fc522bdb
      Maysam Yabandeh committed
      Summary:
      Currently HarnessTest.Randomized is already split, but some of the splits are faster than others. The reason is that each split takes a contiguous range of the generated args, and the tests with later args take longer to finish. The patch splits the args evenly among the splits in a round-robin fashion.
      Before:
      ```
      [       OK ] HarnessTest.Randomized1n2 (2278 ms)
      [       OK ] HarnessTest.Randomized3n4 (1095 ms)
      [       OK ] HarnessTest.Randomized5 (658 ms)
      [       OK ] HarnessTest.Randomized6 (1258 ms)
      [       OK ] HarnessTest.Randomized7 (6476 ms)
      [       OK ] HarnessTest.Randomized8 (8182 ms)
      ```
      After:
      ```
      [       OK ] HarnessTest.Randomized1 (2649 ms)
      [       OK ] HarnessTest.Randomized2 (2645 ms)
      [       OK ] HarnessTest.Randomized3 (2577 ms)
      [       OK ] HarnessTest.Randomized4 (2490 ms)
      [       OK ] HarnessTest.Randomized5 (2553 ms)
      [       OK ] HarnessTest.Randomized6 (2560 ms)
      [       OK ] HarnessTest.Randomized7 (2501 ms)
      [       OK ] HarnessTest.Randomized8 (2574 ms)
      ```
      Closes https://github.com/facebook/rocksdb/pull/3808
      
      Differential Revision: D7882663
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 09b749a9684b6d7d65466aa4b00c5334a49e833e
      fc522bdb
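The difference between the two splitting schemes is easy to see on a toy cost list. The Python sketch below assumes that later args cost more, as the commit message states; the function names and the cost values are made up for illustration.

```python
def split_contiguous(args, num_splits):
    """Old scheme: each split gets a contiguous slice of the argument list."""
    size = -(-len(args) // num_splits)  # ceiling division
    return [args[i * size:(i + 1) * size] for i in range(num_splits)]


def split_round_robin(args, num_splits):
    """New scheme: argument i goes to split i % num_splits."""
    return [args[i::num_splits] for i in range(num_splits)]


# Made-up per-arg costs that grow with the arg index.
costs = list(range(1, 9))
print([sum(part) for part in split_contiguous(costs, 4)])
print([sum(part) for part in split_round_robin(costs, 4)])
```

With round-robin assignment every split receives a mix of cheap and expensive args, so the per-split totals end up much closer to each other.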
    • M
      Rename vars to satisfy unity built · 171f415b
      Maysam Yabandeh committed
      Summary:
      Tested by "make unity_test"
      Closes https://github.com/facebook/rocksdb/pull/3807
      
      Differential Revision: D7882657
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 84862c18d7f2fc762bd96ad070eaeb6936e45159
      171f415b
    • F
      Add USE_RTTI and default behavior to CMakeLists · 4d40b10e
      Fosco Marotto committed
      Summary:
      Proposed fix for #3701
      Closes https://github.com/facebook/rocksdb/pull/3801
      
      Differential Revision: D7868264
      
      Pulled By: gfosco
      
      fbshipit-source-id: 013963ed3d172c8dc2abd1dd5982580082ca5d2d
      4d40b10e