1. 22 May, 2018 3 commits
    • Print histogram count and sum in statistics string · 508a09fd
      Andrew Kryczka committed
      Summary:
      Previously it only printed percentiles, even though our histogram keeps track of count and sum (and more). There have been many times we want to know more than the percentiles. For example, we currently want sum of "rocksdb.compression.times.nanos" and sum of "rocksdb.decompression.times.nanos", which would allow us to know the relative cost of compression vs decompression.
      
      This PR adds count and sum to the string printed by `StatisticsImpl::ToString`. This is a bit risky as there are definitely parsers assuming the old format. I will mention it in HISTORY.md and hope for the best...
      Closes https://github.com/facebook/rocksdb/pull/3863
      
      Differential Revision: D8038831
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0465b72e4b0cbf18ef965f4efe402601d16d5b5c
    • Assert keys/values pinned by range deletion meta-block iterators · 7b655214
      Andrew Kryczka committed
      Summary:
      `RangeDelAggregator` holds the pointers returned by `BlockIter::key()` and `BlockIter::value()`, so it requires that the data to which they point be pinned. `BlockIter::key()` points into block memory and is guaranteed to be pinned if and only if prefix encoding is disabled (or, equivalently, restart interval is set to one). I think `BlockIter::value()` is always pinned. Added an assert for these and removed the incorrect TODO about increasing the restart interval, which would enable key prefix encoding and break the assertion.
      Closes https://github.com/facebook/rocksdb/pull/3875
      
      Differential Revision: D8063667
      
      Pulled By: ajkr
      
      fbshipit-source-id: 60b5ebcc0cdd610dd6aad9e74a23378793672c41
    • Add missing test files to src.mk · e410501e
      Andrew Kryczka committed
      Summary:
      We only generate the header dependency (".cc.d") files for files mentioned in "src.mk". When we don't generate them, changes to header dependencies do not cause `make` to recompile the dependent ".o". Then it takes a while for developers (or maybe just me) to realize `make clean` is necessary.
      Closes https://github.com/facebook/rocksdb/pull/3876
      
      Differential Revision: D8065389
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0f62eee7bcab15b0215791564e6ab3775d46996b
  2. 19 May, 2018 2 commits
    • fix a division by zero bug · ed4d3393
      Zhongyi Xie committed
      Summary:
      fixes the failing clang_analyze contrun test
      Closes https://github.com/facebook/rocksdb/pull/3872
      
      Differential Revision: D8059241
      
      Pulled By: miasantreble
      
      fbshipit-source-id: e8fc1838004fe16a823456188386b8b39429803b
    • class Block to store num_restarts_ · 26da3676
      Siying Dong committed
      Summary:
      Right now, every Block::NewIterator() reads num_restarts_ from the block, which is already read in Block::Block(). This sometimes causes a CPU cache miss. Although fetching this cache line can usually benefit follow-up block restart offset reading, as they are close to each other, it's almost free to get rid of this read by storing it in the Block class.
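
      The idea can be illustrated with a small sketch. It assumes a simplified block layout in which the last four bytes hold the restart count as a little-endian 32-bit integer; `SimpleBlock` and its members are hypothetical names, not the code from the PR.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <string>
      #include <utility>

      // Hypothetical, simplified stand-in for a block. Assumption: the last
      // 4 bytes of the contents store the restart count, little-endian.
      class SimpleBlock {
       public:
        explicit SimpleBlock(std::string contents)
            : data_(std::move(contents)),
              num_restarts_(DecodeNumRestarts(data_)) {}

        // Returns the value decoded once in the constructor, so creating an
        // iterator does not need to touch the block trailer again.
        std::uint32_t NumRestarts() const { return num_restarts_; }

       private:
        static std::uint32_t DecodeNumRestarts(const std::string& data) {
          assert(data.size() >= sizeof(std::uint32_t));
          const unsigned char* p = reinterpret_cast<const unsigned char*>(
              data.data() + data.size() - sizeof(std::uint32_t));
          return static_cast<std::uint32_t>(p[0]) |
                 (static_cast<std::uint32_t>(p[1]) << 8) |
                 (static_cast<std::uint32_t>(p[2]) << 16) |
                 (static_cast<std::uint32_t>(p[3]) << 24);
        }

        std::string data_;            // declared first, so initialized first
        std::uint32_t num_restarts_;  // cached copy of the trailer value
      };
      ```

      The trade-off is four extra bytes per block object in exchange for one fewer memory read per iterator creation.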
      Closes https://github.com/facebook/rocksdb/pull/3869
      
      Differential Revision: D8052493
      
      Pulled By: siying
      
      fbshipit-source-id: 9c72360f0c2d7329f3c198ce4eaedd2bc14b87c1
  3. 18 May, 2018 6 commits
    • Set the default value of max_manifest_file_size. · a0c7b4d5
      Yanqin Jin committed
      Summary:
      In the past, the default value of max_manifest_file_size was uint64_t::MAX,
      allowing a long-running RocksDB process to grow its MANIFEST file to take up
      the entire disk, as reported in [issue 3851](https://github.com/facebook/rocksdb/issues/3851). It is reasonable and common to provide a default non-max value for this option. Therefore, I set the value to 1GB.
      
      siying miasantreble Please let me know whether this looks good to you. Thanks!
      Closes https://github.com/facebook/rocksdb/pull/3867
      
      Differential Revision: D8051524
      
      Pulled By: riversand963
      
      fbshipit-source-id: 50251f0804b1fa933a19a30d19d261ea8b9d2b72
    • Implement key shortening functions in ReverseBytewiseComparator · 17af09fc
      Siying Dong committed
      Summary:
      Right now ReverseBytewiseComparator::FindShortestSeparator() doesn't really shorten the key, and ReverseBytewiseComparator::FindShortestSuccessor() seems to return wrong results. The code is also confusing, as it uses BytewiseComparatorImpl::FindShortestSeparator(), but that function won't actually do anything if the first key is larger than the second.
      
      Implement ReverseBytewiseComparator::FindShortestSeparator() and override ReverseBytewiseComparator::FindShortestSuccessor() to be empty.
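
      A minimal sketch of how a separator can be shortened under reverse bytewise order, assuming plain `std::string` keys instead of RocksDB's `Slice`. The function name `ShortenForReverseOrder` is hypothetical and this is not the code from the PR. In reverse order `start` sorts before `limit` when it is bytewise larger, so the result must satisfy `start >= result > limit` in plain bytewise terms.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <string>

      // Returns a possibly shorter key that still separates `start` from
      // `limit` under reverse bytewise order, or `start` unchanged when no
      // shortening is attempted.
      std::string ShortenForReverseOrder(const std::string& start,
                                         const std::string& limit) {
        const std::size_t min_len = std::min(start.size(), limit.size());
        std::size_t diff = 0;
        while (diff < min_len && start[diff] == limit[diff]) {
          ++diff;
        }
        if (diff == min_len) {
          // One key is a prefix of the other: keep it simple, do nothing.
          return start;
        }
        const unsigned char s = static_cast<unsigned char>(start[diff]);
        const unsigned char l = static_cast<unsigned char>(limit[diff]);
        if (s > l && diff + 1 < start.size()) {
          // Truncating right after the first differing byte yields a prefix
          // of `start` (so it is bytewise <= start) that is still > limit.
          std::string result = start.substr(0, diff + 1);
          assert(result <= start && result > limit);
          return result;
        }
        return start;
      }
      ```

      For example, with `start = "AA3AA"` and `limit = "AA1BB"` the sketch returns `"AA3"`. When `start` is bytewise smaller than `limit`, which is not a valid input pair for a reverse ordering, it leaves `start` untouched.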
      Closes https://github.com/facebook/rocksdb/pull/3836
      
      Differential Revision: D7959762
      
      Pulled By: siying
      
      fbshipit-source-id: 93acb621c16ce6f23e087ae4e19f7d84d1254683
    • add override to virtual functions · 1d7ca20f
      Zhongyi Xie committed
      Summary:
      this will fix the failing clang_check test
      Closes https://github.com/facebook/rocksdb/pull/3868
      
      Differential Revision: D8050880
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 749932e2e4025f835c961c068d601e522a126da6
    • Reorder field based on esan data · aed7abbc
      Xin Tong committed
      Summary:
      Ran: TEST_TMPDIR=/dev/shm ./buck-out/gen/rocks/tools/rocks_db_bench --benchmarks=readwhilewriting --num=5000000 -benchmark_write_rate_limit=2000000 --threads=32

      Collected esan data and reordered the fields. Accesses to the 4th and 6th fields make up the majority of accesses, so they are now grouped together. Overall, this struct accounts for 10%+ of the total accesses in the program (637773011/6107964986).
      
      ==2433831==  class rocksdb::InlineSkipList
      ==2433831==   size = 48, count = 637773011, ratio = 112412, array access = 0
      ==2433831==   # 0: offset = 0,   size = 2,       count = 455137, type = i16
      ==2433831==   # 1: offset = 2,   size = 2,       count = 6,      type = i16
      ==2433831==   # 2: offset = 4,   size = 4,       count = 182303, type = i32
      ==2433831==   # 3: offset = 8,   size = 8,       count = 263953900, type = %"class.rocksdb::MemTableRep::KeyComparator"*
      ==2433831==   # 4: offset = 16,  size = 8,       count = 136409, type = %"class.rocksdb::Allocator"*
      ==2433831==   # 5: offset = 24,  size = 8,       count = 366628820, type = %"struct.rocksdb::InlineSkipList<const rocksdb::MemTableRep::KeyComparator &>::Node"*
      ==2433831==   # 6: offset = 32,  size = 4,       count = 6280031, type = %"struct.std::atomic" = type { %"struct.std::__atomic_base" }
      ==2433831==   # 7: offset = 40,  size = 8,       count = 136405, type = %"struct.rocksdb::InlineSkipList<const rocksdb::MemTableRep::KeyComparator &>::Splice"*
      ==2433831==EfficiencySanitizer: total struct field access count = 6107964986
      
      Before re-ordering
      [trentxintong@devbig460.frc2 ~/fbsource/fbcode]$ fgrep readwhilewriting
      without-ro.log
      readwhilewriting :       0.036 micros/op 27545605 ops/sec;   26.8 MB/s
      (45954 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28024240 ops/sec;   27.2 MB/s
      (43158 of 5000000 found)
      readwhilewriting :       0.037 micros/op 27345145 ops/sec;   27.1 MB/s
      (46725 of 5000000 found)
      readwhilewriting :       0.037 micros/op 27072588 ops/sec;   27.3 MB/s
      (42605 of 5000000 found)
      readwhilewriting :       0.034 micros/op 29578781 ops/sec;   28.3 MB/s
      (44294 of 5000000 found)
      readwhilewriting :       0.035 micros/op 28528304 ops/sec;   27.7 MB/s
      (44176 of 5000000 found)
      readwhilewriting :       0.037 micros/op 27075497 ops/sec;   26.5 MB/s
      (43763 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28024117 ops/sec;   27.1 MB/s
      (40622 of 5000000 found)
      readwhilewriting :       0.037 micros/op 27078709 ops/sec;   27.6 MB/s
      (47774 of 5000000 found)
      readwhilewriting :       0.034 micros/op 29020689 ops/sec;   28.1 MB/s
      (45066 of 5000000 found)
      AVERAGE()=27.37 MB/s
      
      After re-ordering
      [trentxintong@devbig460.frc2 ~/fbsource/fbcode]$ fgrep readwhilewriting
      ro.log
      readwhilewriting :       0.036 micros/op 27542409 ops/sec;   27.7 MB/s
      (46163 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28021148 ops/sec;   28.2 MB/s
      (46155 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28021035 ops/sec;   27.3 MB/s
      (44039 of 5000000 found)
      readwhilewriting :       0.036 micros/op 27538659 ops/sec;   27.5 MB/s
      (46781 of 5000000 found)
      readwhilewriting :       0.036 micros/op 28028604 ops/sec;   27.6 MB/s
      (44689 of 5000000 found)
      readwhilewriting :       0.036 micros/op 27541452 ops/sec;   27.3 MB/s
      (43156 of 5000000 found)
      readwhilewriting :       0.034 micros/op 29041338 ops/sec;   28.8 MB/s
      (44895 of 5000000 found)
      readwhilewriting :       0.036 micros/op 27784974 ops/sec;   26.3 MB/s
      (39963 of 5000000 found)
      readwhilewriting :       0.036 micros/op 27538892 ops/sec;   28.1 MB/s
      (46570 of 5000000 found)
      readwhilewriting :       0.038 micros/op 26622473 ops/sec;   27.0 MB/s
      (43236 of 5000000 found)
      AVERAGE()=27.58 MB/s
      Closes https://github.com/facebook/rocksdb/pull/3855
      
      Reviewed By: siying
      
      Differential Revision: D8048781
      
      Pulled By: trentxintong
      
      fbshipit-source-id: bc9807a9845e2a92cb171ce1ecb5a2c8a51f1481
    • Update HISTORY and version for upcoming 5.14 · fa43948c
      Fosco Marotto committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3866
      
      Differential Revision: D8043563
      
      Pulled By: gfosco
      
      fbshipit-source-id: da4af20e604534602ac0e07943135513fd9a9f53
    • In instrumented mutex, take timing once for both of perf_context and statistics · 7ccb35f6
      Siying Dong committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3427
      
      Differential Revision: D6827236
      
      Pulled By: siying
      
      fbshipit-source-id: d8a2cc525c90df625510565669f2659014259a8a
  4. 17 May, 2018 2 commits
    • Change and clarify the relationship between Valid(), status() and Seek*() for... · 8bf555f4
      Mike Kolupaev committed
      Change and clarify the relationship between Valid(), status() and Seek*() for all iterators. Also fix some bugs
      
      Summary:
      Before this PR, Iterator/InternalIterator may simultaneously have non-ok status() and Valid() = true. That state means that the last operation failed, but the iterator is nevertheless positioned on some unspecified record. Likely intended uses of that are:
       * If some sst files are corrupted, a normal iterator can be used to read the data from files that are not corrupted.
       * When using read_tier = kBlockCacheTier, read the data that's in block cache, skipping over the data that is not.
      
      However, this behavior wasn't documented well (and until recently the wiki on github had misleading incorrect information). In the code there's a lot of confusion about the relationship between status() and Valid(), and about whether Seek()/SeekToLast()/etc reset the status or not. There were a number of bugs caused by this confusion, both inside rocksdb and in the code that uses rocksdb (including ours).
      
      This PR changes the convention to:
       * If status() is not ok, Valid() always returns false.
       * Any seek operation resets status. (Before the PR, it depended on iterator type and on particular error.)
      
      This does sacrifice the two use cases listed above, but siying said it's ok.
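
      The convention above can be illustrated with a toy iterator. This is a minimal sketch, not a RocksDB class: `ToyIterator`, `SeekTo`, `ok()` and the injected `bad_index` are hypothetical names used only for illustration.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <utility>
      #include <vector>

      // Toy iterator following the two rules:
      //  * a non-ok status forces Valid() to return false;
      //  * any seek clears the previous status.
      class ToyIterator {
       public:
        // `bad_index` simulates a position whose read fails.
        ToyIterator(std::vector<std::string> keys, std::size_t bad_index)
            : keys_(std::move(keys)), bad_index_(bad_index) {}

        void SeekTo(std::size_t index) {
          ok_ = true;  // seeking resets the status
          pos_ = index;
          CheckPosition();
        }
        void SeekToFirst() { SeekTo(0); }
        void Next() {
          assert(Valid());
          ++pos_;
          CheckPosition();
        }
        bool Valid() const { return ok_ && pos_ < keys_.size(); }
        bool ok() const { return ok_; }
        const std::string& key() const {
          assert(Valid());
          return keys_[pos_];
        }

       private:
        void CheckPosition() {
          if (pos_ == bad_index_) ok_ = false;
        }
        std::vector<std::string> keys_;
        std::size_t bad_index_;
        std::size_t pos_ = 0;
        bool ok_ = true;
      };

      // Caller pattern: loop while Valid(), then check the status to tell
      // "reached the end" apart from "stopped on an error".
      std::size_t CountReadable(ToyIterator& it, bool* hit_error) {
        std::size_t n = 0;
        for (it.SeekToFirst(); it.Valid(); it.Next()) {
          ++n;
        }
        *hit_error = !it.ok();
        return n;
      }
      ```

      Under this convention a caller can no longer see a positioned iterator with a failed status, so the status check after the loop is the only place where an error needs to be handled.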
      
      Overview of the changes:
       * A commit that adds missing status checks in MergingIterator. This fixes a bug that actually affects us, and we need it fixed. `DBIteratorTest.NonBlockingIterationBugRepro` explains the scenario.
       * Changes to lots of iterator types to make all of them conform to the new convention. Some bug fixes along the way. By far the biggest changes are in DBIter, which is a big messy piece of code; I tried to make it less big and messy but mostly failed.
       * A stress-test for DBIter, to gain some confidence that I didn't break it. It does a few million random operations on the iterator, while occasionally modifying the underlying data (like ForwardIterator does) and occasionally returning non-ok status from internal iterator.
      
      To find the iterator types that needed changes I searched for "public .*Iterator" in the code. Here's an overview of all 27 iterator types:
      
      Iterators that didn't need changes:
       * status() is always ok(), or Valid() is always false: MemTableIterator, ModelIter, TestIterator, KVIter (2 classes with this name in anonymous namespaces), LoggingForwardVectorIterator, VectorIterator, MockTableIterator, EmptyIterator, EmptyInternalIterator.
       * Thin wrappers that always pass through Valid() and status(): ArenaWrappedDBIter, TtlIterator, InternalIteratorFromIterator.
      
      Iterators with changes (see inline comments for details):
       * DBIter - an overhaul:
          - It used to silently skip corrupted keys (`FindParseableKey()`), which seems dangerous. This PR makes it just stop immediately after encountering a corrupted key, just like it would for other kinds of corruption. Let me know if there was actually some deeper meaning in this behavior and I should put it back.
          - It had a few code paths silently discarding subiterator's status. The stress test caught a few.
          - The backwards iteration code path was expecting the internal iterator's set of keys to be immutable. It's probably always true in practice at the moment, since ForwardIterator doesn't support backwards iteration, but this PR fixes it anyway. See added DBIteratorTest.ReverseToForwardBug for an example.
          - Some parts of backwards iteration code path even did things like `assert(iter_->Valid())` after a seek, which is never a safe assumption.
          - It used to not reset status on seek for some types of errors.
          - Some simplifications and better comments.
          - Some things got more complicated from the added error handling. I'm open to ideas for how to make it nicer.
       * MergingIterator - check status after every operation on every subiterator, and in some places assert that valid subiterators have ok status.
       * ForwardIterator - changed to the new convention, also slightly simplified.
       * ForwardLevelIterator - fixed some bugs and simplified.
       * LevelIterator - simplified.
       * TwoLevelIterator - changed to the new convention. Also fixed a bug that would make SeekForPrev() sometimes silently ignore errors from first_level_iter_.
       * BlockBasedTableIterator - minor changes.
       * BlockIter - replaced `SetStatus()` with `Invalidate()` to make sure non-ok BlockIter is always invalid.
       * PlainTableIterator - some seeks used to not reset status.
       * CuckooTableIterator - tiny code cleanup.
       * ManagedIterator - fixed some bugs.
       * BaseDeltaIterator - changed to the new convention and fixed a bug.
       * BlobDBIterator - seeks used to not reset status.
       * KeyConvertingIterator - some small change.
      Closes https://github.com/facebook/rocksdb/pull/3810
      
      Differential Revision: D7888019
      
      Pulled By: al13n321
      
      fbshipit-source-id: 4aaf6d3421c545d16722a815b2fa2e7912bc851d
    • Fix race condition between log_.erase and log_.back · 46fde6b6
      Maysam Yabandeh committed
      Summary:
      log_ contract specifies that it should not be modified unless both mutex_ and log_write_mutex_ are held. log_.erase, however, modifies it while holding only mutex_. This causes a race condition with two_write_queues, since logs_.back is read while holding only log_write_mutex_ (which is correct according to the logs_ contract) while logs_.erase is called concurrently. This is probably the cause of logs_.back returning nullptr in https://github.com/facebook/rocksdb/issues/3852, although I could not reproduce it.
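
      The locking contract described above can be sketched in miniature. This is an illustration under assumptions, not the RocksDB code: `LogList` and its methods are hypothetical, and the container holds plain integers instead of log writers.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <deque>
      #include <mutex>

      // Contract: every modification holds BOTH mutexes, so a reader that
      // holds EITHER one of them sees a stable container.
      class LogList {
       public:
        void Append(int log_number) {
          std::lock_guard<std::mutex> a(mutex_);
          std::lock_guard<std::mutex> b(log_write_mutex_);
          logs_.push_back(log_number);
        }

        // Erasing is a modification too, so it must also take both locks.
        // Taking only mutex_ here would race with Newest().
        void EraseOldest() {
          std::lock_guard<std::mutex> a(mutex_);
          std::lock_guard<std::mutex> b(log_write_mutex_);
          if (!logs_.empty()) logs_.pop_front();
        }

        // Write-path reader: holding only log_write_mutex_ is sufficient.
        int Newest() {
          std::lock_guard<std::mutex> b(log_write_mutex_);
          assert(!logs_.empty());
          return logs_.back();
        }

        std::size_t Size() {
          std::lock_guard<std::mutex> a(mutex_);
          return logs_.size();
        }

       private:
        std::mutex mutex_;            // always acquired first
        std::mutex log_write_mutex_;  // always acquired second
        std::deque<int> logs_;
      };
      ```

      Both writers acquire the mutexes in the same order, which avoids introducing a lock-order inversion between them.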
      Fixes https://github.com/facebook/rocksdb/issues/3852
      Closes https://github.com/facebook/rocksdb/pull/3859
      
      Differential Revision: D8026103
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: ee394e00fe4aa520d884c5ef87981e9d6b5ccb28
  5. 15 May, 2018 5 commits
    • Fix geo_db may seek an error key when they have the same quadkey · 42cb4775
      acelyc111 committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3832
      
      Differential Revision: D7994326
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 84a81b35b97750360423a9d4eca5b5a14d002134
    • Suppress tsan lock-order-inversion on FlushWAL · 12ad7112
      Maysam Yabandeh committed
      Summary:
      TSAN reports a false alarm for lock-order-inversion in DBWriteTest.IOErrorOnWALWritePropagateToWriteThreadFollower but Open and FlushWAL are not run concurrently. Suppressing the error by skipping FlushWAL in the test until TSAN is fixed.
      
      The alternative would be to use
      ```
      TSAN_OPTIONS="suppressions=tsan-suppressions.txt" ./db_write_test
      ```
      but it does not seem straightforward to integrate it to our test infra.
      Closes https://github.com/facebook/rocksdb/pull/3854
      
      Differential Revision: D8000202
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: fde33483d963a7ad84d3145123821f64960a4802
    • Bottommost level-based compactions in bottom-pri pool · 3d7dc75b
      Andrew Kryczka committed
      Summary:
      This feature was introduced for universal compaction in cc01985d. At that point we thought it'd be used only to prevent long-running universal full compactions from blocking short-lived upper-level compactions. Now we have a level compaction user who could benefit from it since they use a more expensive compression algorithm in the bottom level. So enable it for level compaction.
      Closes https://github.com/facebook/rocksdb/pull/3835
      
      Differential Revision: D7957179
      
      Pulled By: ajkr
      
      fbshipit-source-id: 177285d2cef3b650b6a4d81dc5db84bc441c9fe4
    • Fix db_stress build on mac · ebb823f7
      Sagar Vemuri committed
      Summary:
      I noticed, while debugging an unrelated issue, that db_stress is failing to build on mac, leading to a failed `make all`.
      ```
      $ make db_stress -j4
      ...
      tools/db_stress.cc:862:69: error: cannot initialize a parameter of type 'uint64_t *' (aka 'unsigned long long *') with an rvalue of type 'size_t *' (aka 'unsigned long *')
              status = FLAGS_env->GetFileSize(FLAGS_expected_values_path, &size);
                                                                          ^~~~~
      ./include/rocksdb/env.h:277:66: note: passing argument to parameter 'file_size' here
        virtual Status GetFileSize(const std::string& fname, uint64_t* file_size) = 0;
                                                                       ^
      1 error generated.
      make: *** [tools/db_stress.o] Error 1
      make: *** Waiting for unfinished jobs....
      ```
      Closes https://github.com/facebook/rocksdb/pull/3839
      
      Differential Revision: D7979236
      
      Pulled By: sagar0
      
      fbshipit-source-id: 0615e7bb5405bade71e4203803bf723720422d62
    • Pass manual_wal_flush also to the first wal file · 718c1c9c
      Maysam Yabandeh committed
      Summary:
      Currently manual_wal_flush, if set in the options, is used only for the WAL files created during a WAL switch. The configuration thus does not affect the first WAL file. The patch fixes that and also updates the related unit tests.
      This PR is built on top of https://github.com/facebook/rocksdb/pull/3756
      Closes https://github.com/facebook/rocksdb/pull/3824
      
      Differential Revision: D7909153
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 024ed99d2555db06bf096c902b998e432bb7b9ce
  6. 12 May, 2018 2 commits
  7. 10 May, 2018 5 commits
    • Apply use_direct_io_for_flush_and_compaction to writes only · 072ae671
      Andrew Kryczka committed
      Summary:
      Previously `DBOptions::use_direct_io_for_flush_and_compaction=true` combined with `DBOptions::use_direct_reads=false` could cause RocksDB to simultaneously read from two file descriptors for the same file, where background reads used direct I/O and foreground reads used buffered I/O. Our measurements found this mixed-mode I/O negatively impacted foreground read perf, compared to when only buffered I/O was used.
      
      This PR makes the mixed-mode I/O situation impossible by repurposing `DBOptions::use_direct_io_for_flush_and_compaction` to only apply to background writes, and `DBOptions::use_direct_reads` to apply to all reads. There is no risk of direct background writes happening simultaneously with buffered reads since we never read from and write to the same file simultaneously.
      Closes https://github.com/facebook/rocksdb/pull/3829
      
      Differential Revision: D7915443
      
      Pulled By: ajkr
      
      fbshipit-source-id: 78bcbf276449b7e7766ab6b0db246f789fb1b279
    • Refactor argument handling in db_crashtest.py · d19f568a
      Andrew Kryczka committed
      Summary:
      - Any options unknown to `db_crashtest.py` are now passed directly to `db_stress`. This way, we won't need to update `db_crashtest.py` every time `db_stress` gets a new option.
      - Remove `db_crashtest.py` redundant arguments where the value is the same as `db_stress`'s default
      - Remove `db_crashtest.py` redundant arguments where the value is the same in a previously applied options map. For example, default_params are always applied before whitebox_default_params, so if they require the same value for an argument, that value only needs to be provided in default_params.
      - Made the simple option maps applied in addition to the regular option maps. Previously they were exclusive which led to lots of duplication
      Closes https://github.com/facebook/rocksdb/pull/3809
      
      Differential Revision: D7885779
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3a3243b55724d6d5bff36e939b582b9b62c538a8
    • Disallow to open RandomRW file if the file doesn't exist · 3690276e
      Siying Dong committed
      Summary:
      The only use of RandomRW is to change seqno when bulkloading, and in this use case, the file should exist. We should fail the file opening in this case.
      Closes https://github.com/facebook/rocksdb/pull/3827
      
      Differential Revision: D7913719
      
      Pulled By: siying
      
      fbshipit-source-id: 62cf6734f1a6acb9e14f715b927da388131c3492
    • Make BlockIter final · ddfd2525
      Siying Dong committed
      Summary:
      Now BlockBasedTableIterator directly uses BlockIter. By making BlockIter final, we can prevent unintended virtual function overriding.
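
      As a small illustration of what `final` buys here, the sketch below uses hypothetical classes (`IteratorBase`, `LeafIter`), not the RocksDB ones. Marking the class `final` makes any attempt to derive from it, and therefore to override its virtual functions, a compile error.

      ```cpp
      #include <cassert>
      #include <type_traits>

      // Hypothetical base interface with a virtual function.
      class IteratorBase {
       public:
        virtual ~IteratorBase() = default;
        virtual bool Valid() const = 0;
      };

      // `final` forbids further derivation, so Valid() cannot be
      // overridden again by accident.
      class LeafIter final : public IteratorBase {
       public:
        bool Valid() const override { return valid_; }
        void Invalidate() {
          valid_ = false;
          assert(!Valid());
        }

       private:
        bool valid_ = true;
      };

      // Compile-time check (std::is_final requires C++14 or later).
      static_assert(std::is_final<LeafIter>::value, "LeafIter must be final");

      // The following would fail to compile:
      //   class Derived : public LeafIter {};
      ```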
      Closes https://github.com/facebook/rocksdb/pull/3828
      
      Differential Revision: D7933816
      
      Pulled By: siying
      
      fbshipit-source-id: 026a08cb5c5b6d3d6f44743152b4251da4756f2c
    • Introduce and use the option to disable stall notifications structures · f92cd2fe
      Dmitri Smirnov committed
      Summary:
      and code. Removing this helps with insert performance.
      Closes https://github.com/facebook/rocksdb/pull/3830
      
      Differential Revision: D7921030
      
      Pulled By: siying
      
      fbshipit-source-id: 84e80d50a7ef96f5441c51c9a0d089c50217cce2
  8. 09 May, 2018 2 commits
    • Add missing options in BuildColumnfamilyOptions · cee138c7
      Huachao Huang committed
      Summary:
      soft_pending_compaction_bytes_limit and hard_pending_compaction_bytes_limit are added to BuildColumnfamilyOptions.
      Closes https://github.com/facebook/rocksdb/pull/3823
      
      Differential Revision: D7909246
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 89032efbf6b5bd302ea50cbd7a234977984a1fca
    • Disable readahead when using mmap for reads · 4bf169f0
      Andrew Kryczka committed
      Summary:
      `ReadaheadRandomAccessFile` had an unwritten assumption, which was that its wrapped file's `Read()` function always copies into the provided scratch buffer. Actually this was not true when the wrapped file was `PosixMmapReadableFile`, whose `Read()` implementation does no copying and instead returns a `Slice` pointing directly into the  `mmap`'d memory region. This PR:
      
      - prevents `ReadaheadRandomAccessFile` from ever wrapping mmap readable files
      - adds an assert for the assumption `ReadaheadRandomAccessFile` makes about the wrapped file's use of scratch buffer
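
      The unwritten assumption can be made concrete with a self-contained sketch. It uses an in-memory `std::string` as the "file" and a hypothetical `View` struct in place of `Slice`; none of these names come from RocksDB.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <cstring>
      #include <string>

      // Minimal stand-in for a slice: pointer plus length, no ownership.
      struct View {
        const char* data;
        std::size_t size;
      };

      std::size_t ClampLength(const std::string& file, std::size_t offset,
                              std::size_t n) {
        assert(offset <= file.size());
        return std::min(n, file.size() - offset);
      }

      // Buffered-style read: copies bytes into the caller's scratch buffer
      // and returns a view of that buffer.
      View CopyingRead(const std::string& file, std::size_t offset,
                       std::size_t n, char* scratch) {
        n = ClampLength(file, offset, n);
        std::memcpy(scratch, file.data() + offset, n);
        return View{scratch, n};
      }

      // mmap-style read: returns a view straight into the mapped region and
      // leaves the scratch buffer untouched.
      View MappedRead(const std::string& file, std::size_t offset,
                      std::size_t n, char* /*scratch*/) {
        n = ClampLength(file, offset, n);
        return View{file.data() + offset, n};
      }

      // The property a readahead wrapper relies on when it treats its own
      // buffer as holding the data that was just read.
      bool ResultIsInScratch(const View& v, const char* scratch) {
        return v.size == 0 || v.data == scratch;
      }
      ```

      A wrapper that caches its scratch buffer would find it empty after `MappedRead`, which is why the two fixes listed above keep such a wrapper away from mmap-backed files and assert the property explicitly.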
      Closes https://github.com/facebook/rocksdb/pull/3813
      
      Differential Revision: D7891513
      
      Pulled By: ajkr
      
      fbshipit-source-id: dc64a55222d6af280c39a1852ee39e9e9d7cde7d
  9. 08 May, 2018 4 commits
    • Link jemalloc · 1d9f24dc
      Tongliang Liao committed
      Summary:
      Fix undefined reference to `malloc_*` linking errors on Linux.
      Closes https://github.com/facebook/rocksdb/pull/3817
      
      Differential Revision: D7899066
      
      Pulled By: ajkr
      
      fbshipit-source-id: 18c46569a59608388d6240f1b8ec20c2d2557dec
    • Allows other cmake-specific "true" for USE_RTTI. · 9470ee45
      Tongliang Liao committed
      Summary:
      People also use ON/OFF, TRUE/FALSE and other switch options that are allowed by cmake.
      Closes https://github.com/facebook/rocksdb/pull/3814
      
      Differential Revision: D7899032
      
      Pulled By: ajkr
      
      fbshipit-source-id: b71511af59e0a78eedafb639b5002c47050bf3c2
    • Search paths provided by intel's "tbbvars.sh". · 6d6e01cd
      Tongliang Liao committed
      Summary:
      TBBROOT and LIBRARY_PATH are set in env by the script.
      
      With TBB 2018 the library path is $TBBROOT/lib/intel64/gcc4.7 for anything above gcc 4.7, which is both compiler and architecture related. We cannot simply do ${TBB_ROOT_DIR}/lib.
      Closes https://github.com/facebook/rocksdb/pull/3815
      
      Differential Revision: D7899006
      
      Pulled By: ajkr
      
      fbshipit-source-id: 159ab1f6a5c40452ed6aa8d79300206953d916c2
    • Split FaultInjectionTest.FaultTest to avoid timeout · d72a51e9
      Maysam Yabandeh committed
      Summary:
      The tsan flavor of this test occasionally times out in our test infra. The patch splits the test into two, each working on half of the option range.
      Before:
      [       OK ] FaultTest/FaultInjectionTest.FaultTest/0 (5918 ms)
      [       OK ] FaultTest/FaultInjectionTest.FaultTest/1 (5336 ms)
      After:
      [       OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/0 (2930 ms)
      [       OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/1 (2676 ms)
      [       OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/2 (2759 ms)
      [       OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/3 (2546 ms)
      Closes https://github.com/facebook/rocksdb/pull/3819
      
      Differential Revision: D7894975
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 809f1411cbcc27f8aa71a6b29a16b039f51b67c9
  10. 05 May, 2018 6 commits
    • Recommit "Avoid adding tombstones of the same file to RangeDelAggregator multiple times" · 72942ad7
      daheiantian committed
      Summary:
      The original commit #3635 hurt performance for users who aren't using range deletions because of unneeded std::set operations, so it was reverted by commit 44653c7b (see #3672).
      
      To fix this, move the set to  and add a check in , i.e., file will be added only if  is non-nullptr.
      
      The db_bench command that exposed the performance regression:
      > ./db_bench --benchmarks=fillrandom,seekrandomwhilewriting --threads=1 --num=1000000 --reads=150000 --key_size=66 --value_size=1262 --statistics=0 --compression_ratio=0.5 --histogram=1 --seek_nexts=1 --stats_per_interval=1 --stats_interval_seconds=600 --max_background_flushes=4 --num_multi_db=1 --max_background_compactions=16 --seed=1522388277 -write_buffer_size=1048576 --level0_file_num_compaction_trigger=10000 --compression_type=none

      I re-ran this command on the machine before and after the modification. The results are as follows:
      
      **fillrandom**

      | Table | P50 | P75 | P99 | P99.9 | P99.99 |
      | ---- | --- | --- | --- | ----- | ------ |
      | before commit | 5.92 | 8.57 | 19.63 | 980.97 | 12196.00 |
      | after commit | 5.91 | 8.55 | 19.34 | 965.56 | 13513.56 |

      **seekrandomwhilewriting**

      | Table | P50 | P75 | P99 | P99.9 | P99.99 |
      | ---- | --- | --- | --- | ----- | ------ |
      | before commit | 1418.62 | 1867.01 | 3823.28 | 4980.99 | 9240.00 |
      | after commit | 1450.54 | 1880.61 | 3962.87 | 5429.60 | 7542.86 |
      Closes https://github.com/facebook/rocksdb/pull/3800
      
      Differential Revision: D7874245
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2e8bec781b3f7399246babd66395c88619534a17
    • Fix db_stress memory leak ASAN error · 4c5a3232
      Andrew Kryczka committed
      Summary:
      In case `--expected_values_path` is unset, we allocate a buffer internally to hold the expected DB state. This PR makes sure it is freed.
      Closes https://github.com/facebook/rocksdb/pull/3804
      
      Differential Revision: D7874694
      
      Pulled By: ajkr
      
      fbshipit-source-id: a8f7655e009507c4e639ceebfc3525d69c856e3b
    • Evenly split HarnessTest.Randomized · fc522bdb
      Maysam Yabandeh committed
      Summary:
      Currently HarnessTest.Randomized is already split, but some of the splits are faster than others. The reason is that each split takes a contiguous range of the generated args, and the tests with later args take longer to finish. The patch evenly splits the args among the splits in a round-robin fashion.
      Before:
      ```
      [       OK ] HarnessTest.Randomized1n2 (2278 ms)
      [       OK ] HarnessTest.Randomized3n4 (1095 ms)
      [       OK ] HarnessTest.Randomized5 (658 ms)
      [       OK ] HarnessTest.Randomized6 (1258 ms)
      [       OK ] HarnessTest.Randomized7 (6476 ms)
      [       OK ] HarnessTest.Randomized8 (8182 ms)
      ```
      After
      ```
      [       OK ] HarnessTest.Randomized1 (2649 ms)
      [       OK ] HarnessTest.Randomized2 (2645 ms)
      [       OK ] HarnessTest.Randomized3 (2577 ms)
      [       OK ] HarnessTest.Randomized4 (2490 ms)
      [       OK ] HarnessTest.Randomized5 (2553 ms)
      [       OK ] HarnessTest.Randomized6 (2560 ms)
      [       OK ] HarnessTest.Randomized7 (2501 ms)
      [       OK ] HarnessTest.Randomized8 (2574 ms)
      ```
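
      The difference between the two splitting strategies can be sketched as follows. This is an illustration only: the function names are hypothetical and integers stand in for the generated test args, with larger values representing later and slower args.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <vector>

      // Contiguous split: each shard gets one consecutive range of args, so
      // the last shard ends up with all of the slowest ones.
      std::vector<std::vector<int>> SplitContiguous(const std::vector<int>& args,
                                                    std::size_t shards) {
        assert(shards > 0);
        std::vector<std::vector<int>> out(shards);
        const std::size_t per_shard = (args.size() + shards - 1) / shards;
        for (std::size_t i = 0; i < args.size(); ++i) {
          out[i / per_shard].push_back(args[i]);
        }
        return out;
      }

      // Round-robin split: arg i goes to shard i % shards, so cheap and
      // expensive args are interleaved across all shards.
      std::vector<std::vector<int>> SplitRoundRobin(const std::vector<int>& args,
                                                    std::size_t shards) {
        assert(shards > 0);
        std::vector<std::vector<int>> out(shards);
        for (std::size_t i = 0; i < args.size(); ++i) {
          out[i % shards].push_back(args[i]);
        }
        return out;
      }
      ```

      With args `1..8` and four shards, the contiguous split yields `{1,2} {3,4} {5,6} {7,8}`, while round-robin yields `{1,5} {2,6} {3,7} {4,8}`, whose per-shard totals are much closer to each other.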
      Closes https://github.com/facebook/rocksdb/pull/3808
      
      Differential Revision: D7882663
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 09b749a9684b6d7d65466aa4b00c5334a49e833e
      fc522bdb
    • M
      Rename vars to satisfy unity build · 171f415b
      Maysam Yabandeh committed
      Summary:
      Tested by "make unity_test"
      Closes https://github.com/facebook/rocksdb/pull/3807
      
      Differential Revision: D7882657
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 84862c18d7f2fc762bd96ad070eaeb6936e45159
      171f415b
    • F
      Add USE_RTTI and default behavior to CMakeLists · 4d40b10e
      Fosco Marotto committed
      Summary:
      Proposed fix for #3701
      Closes https://github.com/facebook/rocksdb/pull/3801
      
      Differential Revision: D7868264
      
      Pulled By: gfosco
      
      fbshipit-source-id: 013963ed3d172c8dc2abd1dd5982580082ca5d2d
      4d40b10e
    • A
      Fix crash test allocation error under TSAN · 6fc1bcce
      Andrew Kryczka committed
      Summary:
      We were seeing the following error: "ThreadSanitizer: DenseSlabAllocator overflow. Dying."
      
      It is fixable by mmap'ing a smaller region for keys' expected values, which this PR achieves by reducing the number of keys.
      Closes https://github.com/facebook/rocksdb/pull/3803
      
      Differential Revision: D7874478
      
      Pulled By: ajkr
      
      fbshipit-source-id: 433939f5cb92410ab4777d540cb0cc2ee0fe6c2e
      6fc1bcce
  11. 04 May, 2018 3 commits
    • Z
      MaxFileSizeForLevel: adjust max_file_size for dynamic level compaction · a7034328
      Zhongyi Xie committed
      Summary:
      `MutableCFOptions::RefreshDerivedOptions` always assumes the base level is L1, which is not true when `level_compaction_dynamic_level_bytes=true` and level-based compaction is used.
      This PR fixes this by recomputing `max_file_size` at query time (in `MaxFileSizeForLevel`).
      Fixes https://github.com/facebook/rocksdb/issues/3229
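
A simplified model of why the assumed base level matters, not RocksDB's implementation: the function names are hypothetical, and the model assumes the target file size starts at `target_file_size_base` on the base level and is multiplied by `target_file_size_multiplier` once per level below it. Overflow handling is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Target file size when levels are counted from a fixed base of L1
// (the assumption the summary describes as wrong for dynamic levels).
uint64_t TargetSizeAssumingL1(uint64_t base_size, uint64_t multiplier,
                              int level) {
  uint64_t size = base_size;
  for (int l = 2; l <= level; ++l) size *= multiplier;
  return size;
}

// Target file size when levels are counted from the actual base level,
// which is only known at query time under dynamic level bytes.
uint64_t TargetSizeFromBaseLevel(uint64_t base_size, uint64_t multiplier,
                                 int level, int base_level) {
  uint64_t size = base_size;
  for (int l = base_level + 1; l <= level; ++l) size *= multiplier;
  return size;
}
```

Under this model, with `target_file_size_base=2097152` and `target_file_size_multiplier=2`, L6 gets a 64 MB target if the base level is assumed to be L1, but a 2 MB target if L6 itself is the base level.
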
      
      In master:
      
      ```
      Level Files Size(MB)
      --------------------
        0       14      846
        1        0        0
        2        0        0
        3        0        0
        4        0        0
        5       15      366
        6       11      481
      Cumulative compaction: 3.83 GB write, 2.27 GB read
      ```
      In branch:
      ```
      Level Files Size(MB)
      --------------------
        0        9      544
        1        0        0
        2        0        0
        3        0        0
        4        0        0
        5        0        0
        6      445      935
      Cumulative compaction: 2.91 GB write, 1.46 GB read
      ```
      
      db_bench command used:
      ```
      ./db_bench --benchmarks="fillrandom,deleterandom,fillrandom,levelstats,stats" --statistics -deletes=5000 -db=tmp -compression_type=none --num=20000 -value_size=100000 -level_compaction_dynamic_level_bytes=true -target_file_size_base=2097152 -target_file_size_multiplier=2
      ```
      Closes https://github.com/facebook/rocksdb/pull/3755
      
      Differential Revision: D7721381
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 39afb8503190bac3b466adf9bbf2a9b3655789f8
      a7034328
    • D
      Better destroydb · 934f96de
      Dmitri Smirnov committed
      Summary:
      Delete the archive directory before the WAL folder, since the archive may be contained in it as a subfolder. Also improve loop readability.
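
The ordering constraint can be sketched with `std::filesystem` (C++17). This is an illustration under stated assumptions, not the `DestroyDB` code: the helper names are hypothetical, and both directories are assumed to contain only regular files apart from the possible nesting of the archive inside the WAL folder.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Removes the regular files directly inside `dir`, then `dir` itself.
// fs::remove refuses to delete a non-empty directory, so this fails
// if a subdirectory is still present.
bool RemoveFlatDir(const fs::path& dir) {
  std::error_code ec;
  if (!fs::exists(dir, ec)) return true;  // nothing to do
  std::vector<fs::path> files;            // collect first, then delete
  for (const auto& entry : fs::directory_iterator(dir, ec)) {
    if (entry.is_regular_file(ec)) files.push_back(entry.path());
  }
  for (const auto& file : files) fs::remove(file, ec);
  return fs::remove(dir, ec) && !ec;
}

// Deletes the archive first because it may live at <wal_dir>/archive.
bool DestroyWalAndArchive(const fs::path& wal_dir,
                          const fs::path& archive_dir) {
  bool archive_ok = RemoveFlatDir(archive_dir);
  bool wal_ok = RemoveFlatDir(wal_dir);
  return archive_ok && wal_ok;
}
```

If the two calls in `DestroyWalAndArchive` were swapped and the archive were nested, removing the WAL folder would fail because it would still contain the archive subdirectory.
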
      Closes https://github.com/facebook/rocksdb/pull/3797
      
      Differential Revision: D7866378
      
      Pulled By: riversand963
      
      fbshipit-source-id: 0c45d97677ce6fbefa3f8d602ef5e2a2a925e6f5
      934f96de
    • M
      Speed up ManualCompactionTest.Test · a8d77ca3
      Maysam Yabandeh committed
      Summary:
      ManualCompactionTest.Test occasionally times out in the TSAN flavor of our test infra. The patch reduces the number of keys to make the test run faster. The change does not seem to negatively impact the coverage of the test.
      Closes https://github.com/facebook/rocksdb/pull/3802
      
      Differential Revision: D7865596
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: b4f60e32c3ae1677e25506f71c766e33fa985785
      a8d77ca3