1. 08 Nov, 2018 1 commit
  2. 07 Nov, 2018 1 commit
    • S
      Black list some valgrind tests (#4642) · 566fc8b9
      Authored by Siying Dong
      Summary:
      Valgrind tests with 1 thread run too long. To make the run shorter, blacklist some long tests. These are already blacklisted in the parallel valgrind tests, but not in non-parallel mode.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4642
      
      Differential Revision: D12945237
      
      Pulled By: siying
      
      fbshipit-source-id: 04cf977d435996480fe87aa09f14b17975b74f7d
      566fc8b9
  3. 06 Nov, 2018 1 commit
    • A
      Add DB property for SST files kept from deletion (#4618) · fffac43c
      Authored by Andrew Kryczka
      Summary:
      This property can help debug why SST files aren't being deleted. Previously we only had the property "rocksdb.is-file-deletions-enabled". However, even when that returned true, obsolete SSTs may still not be deleted due to the coarse-grained mechanism we use to prevent newly created SSTs from being accidentally deleted. That coarse-grained mechanism uses a lower bound file number for SSTs that should not be deleted, and this property exposes that lower bound.
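      The lower-bound mechanism reduces to a simple check; the sketch below is illustrative, with hypothetical names, not RocksDB's actual internal code:

      ```cpp
      #include <cstdint>

      // Sketch of the coarse-grained protection described above: SSTs with
      // a file number at or above a lower bound ("min pending output") are
      // never deleted, even if they look obsolete, because they may be
      // newly created files still in flight.
      bool CanDeleteObsoleteFile(uint64_t file_number,
                                 uint64_t min_pending_output) {
        // Only files strictly below the lower bound are safe to delete.
        return file_number < min_pending_output;
      }
      ```

      The new property exposes this lower bound so users can see why a particular obsolete SST is still on disk.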
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4618
      
      Differential Revision: D12898179
      
      Pulled By: ajkr
      
      fbshipit-source-id: fe68acc041ddbcc9276bbd48976524d95aafc776
      fffac43c
  4. 03 Nov, 2018 2 commits
    • S
      Try to fix ExternalSSTFileTest.IngestNonExistingFile flakines (#4625) · c3105aa5
      Authored by Siying Dong
      Summary:
      ExternalSSTFileTest.IngestNonExistingFile occasionally fails because the number of SST files doesn't go down as expected after manual compaction. Although I haven't found how this can happen, adding extra waiting to make sure obsolete file purging has finished before we check the files doesn't hurt.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4625
      
      Differential Revision: D12910586
      
      Pulled By: siying
      
      fbshipit-source-id: 2a5ddec6908c99cf3bcc78431c6f93151c2cab59
      c3105aa5
    • Z
      exclude get db property calls from rocksdb_lite (#4619) · 61311157
      Authored by Zhongyi Xie
      Summary:
      Fix the currently failing LITE test:
      > In file included from ./util/testharness.h:15:0,
                       from ./table/mock_table.h:23,
                       from ./db/db_test_util.h:44,
                       from db/db_flush_test.cc:10:
      db/db_flush_test.cc: In member function ‘virtual void rocksdb::DBFlushTest_ManualFlushFailsInReadOnlyMode_Test::TestBody()’:
      db/db_flush_test.cc:250:35: error: ‘Properties’ is not a member of ‘rocksdb::DB’
         ASSERT_TRUE(db_->GetIntProperty(DB::Properties::kBackgroundErrors,
                                         ^
      make: *** [db/db_flush_test.o] Error 1
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4619
      
      Differential Revision: D12898319
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 72de603b1f2e972fc8caa88611798c4e98e348c6
      61311157
  5. 02 Nov, 2018 3 commits
  6. 01 Nov, 2018 1 commit
    • A
      Prevent manual compaction hanging in read-only mode (#4611) · b8f68bac
      Authored by Andrew Kryczka
      Summary:
      A background compaction with pre-picked files (i.e., either a manual compaction or a bottom-pri compaction) fails when the DB is in read-only mode. In the failure handling, we forgot to unregister the compaction and the files it covered. Then subsequent manual compactions could conflict with this zombie compaction (possibly Halloween related) and wait forever for it to finish.
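      The failure-handling bug can be illustrated with a minimal registry sketch (hypothetical types, not RocksDB's real data structures):

      ```cpp
      #include <cstdint>
      #include <set>
      #include <vector>

      // A compaction registers its input files; a later manual compaction
      // over the same files must wait until they are unregistered. The fix
      // ensures Unregister() also runs on the read-only failure path.
      struct CompactionRegistry {
        std::set<uint64_t> files_being_compacted;

        void Register(const std::vector<uint64_t>& inputs) {
          files_being_compacted.insert(inputs.begin(), inputs.end());
        }
        void Unregister(const std::vector<uint64_t>& inputs) {
          for (uint64_t f : inputs) files_being_compacted.erase(f);
        }
        bool Conflicts(const std::vector<uint64_t>& inputs) const {
          for (uint64_t f : inputs) {
            if (files_being_compacted.count(f) > 0) return true;
          }
          return false;
        }
      };
      ```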
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4611
      
      Differential Revision: D12871217
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9d24e921d5bbd2ee8c2c9536a30abfa42a220c6e
      b8f68bac
  7. 31 Oct, 2018 3 commits
    • Y
      Add test to check if DB can handle atomic group (#4433) · d1118f6f
      Authored by Yanqin Jin
      Summary:
      Add unit tests to demonstrate that `VersionSet::Recover` is able to detect and handle cases in which the MANIFEST has a valid atomic group, an incomplete trailing atomic group, an atomic group mixed with normal version edits, or an atomic group with an incorrect size.
      With this capability, RocksDB identifies invalid groups of version edits and does not apply them, thus guaranteeing that the DB is restored to a state consistent with the most recent successful atomic flush before applying the WAL.
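      The completeness check can be sketched as below; `remaining` mirrors the countdown each edit in an atomic group carries, but this is a simplified illustration, not the actual `VersionSet::Recover` logic:

      ```cpp
      #include <cstdint>
      #include <vector>

      // A complete atomic group of n edits carries remaining-entry counts
      // n-1, n-2, ..., 0. Anything else (truncated group, wrong size) is
      // rejected and not applied during recovery.
      bool IsCompleteAtomicGroup(const std::vector<uint32_t>& remaining) {
        if (remaining.empty()) return false;
        uint32_t n = static_cast<uint32_t>(remaining.size());
        for (uint32_t i = 0; i < n; ++i) {
          if (remaining[i] != n - 1 - i) return false;
        }
        return true;
      }
      ```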
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4433
      
      Differential Revision: D10079202
      
      Pulled By: riversand963
      
      fbshipit-source-id: a0e0b8bf4da1cf68e044d397588c121b66c68876
      d1118f6f
    • A
      Promote rocksdb.{deleted.keys,merge.operands} to main table properties (#4594) · eaaf1a6f
      Authored by Abhishek Madan
      Summary:
      Since the number of range deletions is reported in
      TableProperties, it is confusing to not report the number of merge
      operands and point deletions as top-level properties; they are
      accessible through the public API, but since they are not the "main"
      properties, they do not appear in aggregated table properties, or the
      string representation of table properties.
      
      This change promotes those two property keys to
      `rocksdb/table_properties.h`, adds corresponding uint64 members for
      them, deprecates the old access methods `GetDeletedKeys()` and
      `GetMergeOperands()` (though they are still usable for now), and removes
      `InternalKeyPropertiesCollector`. The property key strings are the same
      as before this change, so DBs written by older versions should still be
      readable (though I haven't tested this yet).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4594
      
      Differential Revision: D12826893
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 9e4e4fbdc5b0da161c89582566d184101ba8eb68
      eaaf1a6f
    • S
      Remove info logging in db mutex inside EnableFileDeletions() (#4604) · 9da88a83
      Authored by Siying Dong
      Summary:
      EnableFileDeletions() does info logging inside the DB mutex. This is not recommended in the code base, since I/O could be involved. Move the logging outside the DB mutex.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4604
      
      Differential Revision: D12834432
      
      Pulled By: siying
      
      fbshipit-source-id: ffe5c2626fcfdb4c54a661a3c3b0bc95054816cf
      9da88a83
  8. 30 Oct, 2018 3 commits
    • A
      Fix range tombstones written to more files than necessary (#4592) · cae540eb
      Authored by Andrew Kryczka
      Summary:
      When there's a gap between files, we do not need to output tombstones starting at the next output file's begin key to the current output file.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4592
      
      Differential Revision: D12808627
      
      Pulled By: ajkr
      
      fbshipit-source-id: 77c8b2e7523a95b1cd6611194144092c06acb505
      cae540eb
    • Y
      Disable DBIOFailureTest.NoSpaceCompactRange in LITE (#4596) · 806ff34b
      Authored by Yanqin Jin
      Summary:
      Since ErrorHandler::RecoverFromNoSpace is a no-op in LITE mode, we should
      not have this test in LITE mode. If we do keep it, it will cause the test
      thread to wait on bg_cv_, which will never be signalled.
      
      How to reproduce
      ```
      $make clean && git checkout a27fce40
      $OPT="-DROCKSDB_LITE -g" make -j20
      $./db_io_failure_test --gtest_filter=DBIOFailureTest.NoSpaceCompactRange
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4596
      
      Differential Revision: D12818516
      
      Pulled By: riversand963
      
      fbshipit-source-id: bc83524f40fff1e29506979017f7f4c2b70322f3
      806ff34b
    • Y
      Avoid memtable cut when active memtable is empty (#4595) · 92b44015
      Authored by Yanqin Jin
      Summary:
      For a flush triggered by RocksDB due to memory usage approaching a certain
      threshold (WriteBufferManager or memtable full), we should cut the memtable
      only when the current active memtable is not empty, i.e. contains data. This is
      what we do for non-atomic flush. If we always cut the memtable even when the
      active memtable is empty, we will generate extra, empty immutable memtables.
      This is not ideal since it may cause write stalls. It also causes some
      DBAtomicFlushTest cases to fail because cfd->imm()->NumNotFlushed() differs
      from the expectation.
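      The rule reduces to a simple guard; the types below are illustrative stand-ins, not RocksDB's memtable classes:

      ```cpp
      #include <cstddef>

      // Only switch (cut) the active memtable when it actually holds data,
      // so memory-pressure-triggered flushes don't create empty immutable
      // memtables that could contribute to write stalls.
      struct MemtableStub {
        size_t num_entries = 0;
      };

      bool ShouldCutMemtable(const MemtableStub& active) {
        return active.num_entries > 0;
      }
      ```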
      
      Test plan
      ```
      $make clean && make J=1 -j32 all check
      $make clean && OPT="-DROCKSDB_LITE -g" make J=1 -j32 all check
      $make clean && TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make J=1 -j32 valgrind_test
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4595
      
      Differential Revision: D12818520
      
      Pulled By: riversand963
      
      fbshipit-source-id: d867bdbeacf4199fdd642debb085f94703c41a18
      92b44015
  9. 27 Oct, 2018 2 commits
  10. 26 Oct, 2018 1 commit
    • A
      Cache fragmented range tombstones in BlockBasedTableReader (#4493) · 7528130e
      Authored by Abhishek Madan
      Summary:
      This allows tombstone fragmenting to only be performed when the table is opened, and cached for subsequent accesses.
      
      On the same DB used in #4449, running `readrandom` results in the following:
      ```
      readrandom   :       0.983 micros/op 1017076 ops/sec;   78.3 MB/s (63103 of 100000 found)
      ```
      
      Now that Get performance in the presence of range tombstones is reasonable, I also compared the performance between a DB with range tombstones, "expanded" range tombstones (several point tombstones that cover the same keys the equivalent range tombstone would cover, a common workaround for DeleteRange), and no range tombstones. The created DBs had 5 million keys each, and DeleteRange was called at regular intervals (depending on the total number of range tombstones being written) after 4.5 million Puts. The table below summarizes the results of a `readwhilewriting` benchmark (in order to provide somewhat more realistic results):
      ```
         Tombstones?    | avg micros/op | stddev micros/op |  avg ops/s   | stddev ops/s
      ----------------- | ------------- | ---------------- | ------------ | ------------
      None              |        0.6186 |          0.04637 | 1,625,252.90 | 124,679.41
      500 Expanded      |        0.6019 |          0.03628 | 1,666,670.40 | 101,142.65
      500 Unexpanded    |        0.6435 |          0.03994 | 1,559,979.40 | 104,090.52
      1k Expanded       |        0.6034 |          0.04349 | 1,665,128.10 | 125,144.57
      1k Unexpanded     |        0.6261 |          0.03093 | 1,600,457.50 |  79,024.94
      5k Expanded       |        0.6163 |          0.05926 | 1,636,668.80 | 154,888.85
      5k Unexpanded     |        0.6402 |          0.04002 | 1,567,804.70 | 100,965.55
      10k Expanded      |        0.6036 |          0.05105 | 1,667,237.70 | 142,830.36
      10k Unexpanded    |        0.6128 |          0.02598 | 1,634,633.40 |  72,161.82
      25k Expanded      |        0.6198 |          0.04542 | 1,620,980.50 | 116,662.93
      25k Unexpanded    |        0.5478 |          0.0362  | 1,833,059.10 | 121,233.81
      50k Expanded      |        0.5104 |          0.04347 | 1,973,107.90 | 184,073.49
      50k Unexpanded    |        0.4528 |          0.03387 | 2,219,034.50 | 170,984.32
      ```
      
      After a large enough quantity of range tombstones are written, range tombstone Gets can become faster than reading from an equivalent DB with several point tombstones.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4493
      
      Differential Revision: D10842844
      
      Pulled By: abhimadan
      
      fbshipit-source-id: a7d44534f8120e6aabb65779d26c6b9df954c509
      7528130e
  11. 25 Oct, 2018 4 commits
    • Z
      Fix two contrun job failures (#4587) · fe0d2305
      Authored by Zhongyi Xie
      Summary:
      Currently there are two contrun test failures:
      * rocksdb-contrun-lite:
      > tools/db_bench_tool.cc: In function ‘int rocksdb::db_bench_tool(int, char**)’:
      tools/db_bench_tool.cc:5814:5: error: ‘DumpMallocStats’ is not a member of ‘rocksdb’
           rocksdb::DumpMallocStats(&stats_string);
           ^
      make: *** [tools/db_bench_tool.o] Error 1
      * rocksdb-contrun-unity:
      > In file included from unity.cc:44:0:
      db/range_tombstone_fragmenter.cc: In member function ‘void rocksdb::FragmentedRangeTombstoneIterator::FragmentTombstones(std::unique_ptr<rocksdb::InternalIteratorBase<rocksdb::Slice> >, rocksdb::SequenceNumber)’:
      db/range_tombstone_fragmenter.cc:90:14: error: reference to ‘ParsedInternalKeyComparator’ is ambiguous
         auto cmp = ParsedInternalKeyComparator(icmp_);
      
      This PR fixes them.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4587
      
      Differential Revision: D10846554
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 8d3358879e105060197b1379c84aecf51b352b93
      fe0d2305
    • Y
      Remove unused variable · eb8c9918
      Authored by Yanqin Jin
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4585
      
      Differential Revision: D10841983
      
      Pulled By: riversand963
      
      fbshipit-source-id: 6a7e0b40065bcfbb10a2cac0cec1e8da0750a617
      eb8c9918
    • A
      Use only "local" range tombstones during Get (#4449) · 8c78348c
      Authored by Abhishek Madan
      Summary:
      Previously, range tombstones were accumulated from every level, which
      was necessary if a range tombstone in a higher level covered a key in a lower
      level. However, RangeDelAggregator::AddTombstones's complexity is based on
      the number of tombstones that are currently stored in it, which is wasteful in
      the Get case, where we only need to know the highest sequence number of range
      tombstones that cover the key from higher levels, and compute the highest covering
      sequence number at the current level. This change introduces this optimization, and
      removes the use of RangeDelAggregator from the Get path.
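      The per-level bookkeeping can be sketched as follows (hypothetical types; the real change works on RocksDB's tombstone iterators): while descending levels, Get only needs the running maximum sequence number of tombstones covering the key:

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <string>
      #include <vector>

      // A range tombstone covers [start, end) at a sequence number. Instead
      // of accumulating all tombstones in an aggregator, keep only the
      // highest covering seqnum seen so far while walking down the levels.
      struct RangeTombstoneStub {
        std::string start, end;
        uint64_t seq;
      };

      uint64_t MaxCoveringSeq(const std::vector<RangeTombstoneStub>& tombstones,
                              const std::string& key, uint64_t max_so_far) {
        for (const auto& t : tombstones) {
          if (key >= t.start && key < t.end) {
            max_so_far = std::max(max_so_far, t.seq);
          }
        }
        return max_so_far;
      }
      ```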
      
      In the benchmark results, the following command was used to initialize the database:
      ```
      ./db_bench -db=/dev/shm/5k-rts -use_existing_db=false -benchmarks=filluniquerandom -write_buffer_size=1048576 -compression_type=lz4 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -value_size=112 -key_size=16 -block_size=4096 -level_compaction_dynamic_level_bytes=true -num=5000000 -max_background_jobs=12 -benchmark_write_rate_limit=20971520 -range_tombstone_width=100 -writes_per_range_tombstone=100 -max_num_range_tombstones=50000 -bloom_bits=8
      ```
      
      ...and the following command was used to measure read throughput:
      ```
      ./db_bench -db=/dev/shm/5k-rts/ -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=5000000 -reads=100000 -threads=32
      ```
      
      The filluniquerandom command was only run once, and the resulting database was used
      to measure read performance before and after the PR. Both binaries were compiled with
      `DEBUG_LEVEL=0`.
      
      Readrandom results before PR:
      ```
      readrandom   :       4.544 micros/op 220090 ops/sec;   16.9 MB/s (63103 of 100000 found)
      ```
      
      Readrandom results after PR:
      ```
      readrandom   :      11.147 micros/op 89707 ops/sec;    6.9 MB/s (63103 of 100000 found)
      ```
      
      So it's actually slower right now, but this PR paves the way for future optimizations (see #4493).
      
      ----
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4449
      
      Differential Revision: D10370575
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 9a2e152be1ef36969055c0e9eb4beb0d96c11f4d
      8c78348c
    • Z
      use per-level perf context for bloom filter related counters (#4581) · 21bf7421
      Authored by Zhongyi Xie
      Summary:
      PR https://github.com/facebook/rocksdb/pull/4226 introduced per-level perf context, which allows breaking down perf context by level.
      This PR takes advantage of that feature to populate a few counters related to bloom filters.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4581
      
      Differential Revision: D10518010
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 011244561783ec860d32d5b0fa6bce6e78d70ef8
      21bf7421
  12. 24 Oct, 2018 2 commits
    • N
      Adapt three unit tests with newer compiler/libraries (#4562) · 43dbd441
      Authored by Neil Mayhew
      Summary:
      This fixes three tests that fail with relatively recent tools and libraries:
      
      The tests are:
      
      * `spatial_db_test`
      * `table_test`
      * `db_universal_compaction_test`
      
      I'm using:
      
      * `gcc` 7.3.0
      * `glibc` 2.27
      * `snappy` 1.1.7
      * `gflags` 2.2.1
      * `zlib` 1.2.11
      * `bzip2` 1.0.6.0.1
      * `lz4` 1.8.2
      * `jemalloc` 5.0.1
      
      The versions used in the Travis environment (which is two Ubuntu LTS versions behind the current one and doesn't use `lz4` or `jemalloc`) don't seem to have a problem. However, to be safe, I verified that these tests pass with and without my changes in a trusty Docker container without `lz4` and `jemalloc`.
      
      However, I do get an unrelated set of other failures when using a trusty Docker container that uses `lz4` and `jemalloc`:
      
      ```
      db/db_universal_compaction_test.cc:506: Failure
      Value of: num + 1
        Actual: 3
      Expected: NumSortedRuns(1)
      Which is: 4
      [  FAILED  ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/0, where GetParam() = (1, false) (1189 ms)
      [ RUN      ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/1
      db/db_universal_compaction_test.cc:506: Failure
      Value of: num + 1
        Actual: 3
      Expected: NumSortedRuns(1)
      Which is: 4
      [  FAILED  ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/1, where GetParam() = (1, true) (1246 ms)
      [ RUN      ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/2
      db/db_universal_compaction_test.cc:506: Failure
      Value of: num + 1
        Actual: 3
      Expected: NumSortedRuns(1)
      Which is: 4
      [  FAILED  ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/2, where GetParam() = (3, false) (1237 ms)
      [ RUN      ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/3
      db/db_universal_compaction_test.cc:506: Failure
      Value of: num + 1
        Actual: 3
      Expected: NumSortedRuns(1)
      Which is: 4
      [  FAILED  ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/3, where GetParam() = (3, true) (1195 ms)
      [ RUN      ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/4
      db/db_universal_compaction_test.cc:506: Failure
      Value of: num + 1
        Actual: 3
      Expected: NumSortedRuns(1)
      Which is: 4
      [  FAILED  ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/4, where GetParam() = (5, false) (1161 ms)
      [ RUN      ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/5
      db/db_universal_compaction_test.cc:506: Failure
      Value of: num + 1
        Actual: 3
      Expected: NumSortedRuns(1)
      Which is: 4
      [  FAILED  ] UniversalCompactionNumLevels/DBTestUniversalCompaction.DynamicUniversalCompactionReadAmplification/5, where GetParam() = (5, true) (1229 ms)
      ```
      
      I haven't attempted to fix these since I'm not using trusty and Travis doesn't use `lz4` and `jemalloc`. However, the final commit in this PR does at least fix the compilation errors that occur when using trusty's version of `lz4`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4562
      
      Differential Revision: D10510917
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 59534042015ec339270e5fc2f6ac4d859370d189
      43dbd441
    • Z
      fix clang analyzer error (#4583) · f6b151f1
      Authored by Zhongyi Xie
      Summary:
      clang analyzer currently fails with the following warnings:
      > db/log_reader.cc:323:9: warning: Undefined or garbage value returned to caller
              return r;
              ^~~~~~~~
      db/log_reader.cc:344:11: warning: Undefined or garbage value returned to caller
                return r;
                ^~~~~~~~
      db/log_reader.cc:369:11: warning: Undefined or garbage value returned to caller
                return r;
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4583
      
      Differential Revision: D10523517
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 0cc8b8f27657b202bead148bbe7c4aa84fed095b
      f6b151f1
  13. 23 Oct, 2018 2 commits
    • M
      Fix user comparator receiving internal key (#4575) · c34cc404
      Authored by Maysam Yabandeh
      Summary:
      There was a bug where the user comparator would receive the internal key instead of the user key. The bug was due to RangeMightExistAfterSortedRun expecting a user key but receiving an internal key when called in GenerateBottommostFiles. The patch augments an existing unit test to reproduce the bug and fixes it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4575
      
      Differential Revision: D10500434
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 858346d2fd102cce9e20516d77338c112bdfe366
      c34cc404
    • S
      Dynamic level to adjust level multiplier when write is too heavy (#4338) · 70242636
      Authored by Siying Dong
      Summary:
      Level compaction usually performs poorly when writes are so heavy that the level targets can't be guaranteed. With this improvement, we improve the behavior of level_compaction_dynamic_level_bytes = true so that in write-heavy cases the level multiplier can be slightly adjusted based on the size of L0.

      We keep the behavior the same if the number of L0 files is under 2X the compaction trigger and the total size is less than options.max_bytes_for_level_base, so that unless writes are so heavy that compaction cannot keep up, the behavior doesn't change.
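      As a rough illustration (this is not the exact formula the PR implements), the effective base target only departs from the configured value once L0 outgrows it:

      ```cpp
      #include <algorithm>
      #include <cstdint>

      // When L0's size exceeds max_bytes_for_level_base, raise the effective
      // base level target so the per-level multiplier stays achievable under
      // heavy writes; otherwise behavior is unchanged.
      uint64_t EffectiveBaseBytes(uint64_t l0_size,
                                  uint64_t max_bytes_for_level_base) {
        return std::max(l0_size, max_bytes_for_level_base);
      }
      ```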
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4338
      
      Differential Revision: D9636782
      
      Pulled By: siying
      
      fbshipit-source-id: e27fc17a7c29c84b00064cc17536a01dacef7595
      70242636
  14. 22 Oct, 2018 1 commit
    • Y
      Fix RepeatableThreadTest::MockEnvTest hang (#4560) · 933250e3
      Authored by Yi Wu
      Summary:
      When `MockTimeEnv` is used in a test to mock time methods, we cannot use `CondVar::TimedWait` because it uses real time, not the mocked time, for the wait timeout. On Mac the method can return immediately without awaking other waiting threads if the real time is larger than `wait_until` (which is a mocked time). When that happens, the `wait()` method falls into an infinite loop.
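      A sketch of the safe pattern, assuming a mocked clock (`now_micros` is an illustrative stand-in for `MockTimeEnv::NowMicros`, not the actual fix): the loop re-checks the mocked clock and the predicate itself rather than trusting a real-time timeout:

      ```cpp
      #include <chrono>
      #include <condition_variable>
      #include <cstdint>
      #include <functional>
      #include <mutex>

      // Wait until pred() holds or the *mocked* clock passes wait_until.
      // The short real-time wait only paces the loop; correctness comes
      // from re-checking the mocked clock on every iteration.
      bool WaitUntil(std::mutex& mu, std::condition_variable& cv,
                     const std::function<uint64_t()>& now_micros,
                     uint64_t wait_until, const std::function<bool()>& pred) {
        std::unique_lock<std::mutex> lock(mu);
        while (!pred() && now_micros() < wait_until) {
          cv.wait_for(lock, std::chrono::milliseconds(1));
        }
        return pred();
      }
      ```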
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4560
      
      Differential Revision: D10472851
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 898902546ace7db7ac509337dd8677a527209d19
      933250e3
  15. 20 Oct, 2018 1 commit
    • Y
      Add read retry support to log reader (#4394) · da4aa59b
      Authored by Yanqin Jin
      Summary:
      The current `log::Reader` does not retry after encountering `EOF`. In the future, we need the log reader to be able to retry tailing the log even after `EOF`.

      The current implementation is simple. It does not provide more advanced retry policies; these will be addressed in the future.
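      The retry idea can be sketched as a plain loop (hypothetical names; the summary notes the real implementation is deliberately simple, without advanced retry policies):

      ```cpp
      #include <functional>
      #include <string>

      // read returns false on EOF; the writer may still be appending, so
      // tailing simply tries again up to a bounded number of retries.
      using ReadFn = std::function<bool(std::string*)>;

      bool ReadWithRetry(const ReadFn& read, std::string* record,
                         int max_retries) {
        for (int attempt = 0; attempt <= max_retries; ++attempt) {
          if (read(record)) return true;
        }
        return false;
      }
      ```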
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4394
      
      Differential Revision: D9926508
      
      Pulled By: riversand963
      
      fbshipit-source-id: d86d145792a41bd64a72f642a2a08c7b7b5201e1
      da4aa59b
  16. 19 Oct, 2018 1 commit
  17. 18 Oct, 2018 2 commits
    • A
      Lazily initialize RangeDelAggregator stripe map entries (#4497) · 45f213b5
      Authored by Abhishek Madan
      Summary:
      When there are no range deletions, flush and compaction perform a binary search
      on an effectively empty map every time they call ShouldDelete. This PR lazily
      initializes each stripe map entry so that the binary search can be elided in
      these cases.
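      A simplified illustration of the elision (end keys are omitted for brevity and the types are hypothetical): when the stripe has no tombstones, ShouldDelete returns before any binary search happens:

      ```cpp
      #include <cstdint>
      #include <map>
      #include <string>

      // Each stripe maps a tombstone start key to its sequence number.
      // An empty map short-circuits ShouldDelete, eliding the search that
      // flush/compaction would otherwise perform on every key.
      struct StripeStub {
        std::map<std::string, uint64_t> tombstone_starts;

        bool ShouldDelete(const std::string& key, uint64_t seq) const {
          if (tombstone_starts.empty()) return false;  // the fast path
          auto it = tombstone_starts.upper_bound(key);
          if (it == tombstone_starts.begin()) return false;
          --it;  // largest start key <= key
          return seq < it->second;
        }
      };
      ```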
      
      After this PR, the total amount of time spent in compactions is 52.541331s, and the total amount of time spent in flush is 5.532608s, the former of which is a significant improvement from the results after #4495.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4497
      
      Differential Revision: D10428610
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 6f7e1ce3698fac3ef86d1197955e6b72e0931a0f
      45f213b5
    • Z
      Add PerfContextByLevel to provide per level perf context information (#4226) · d6ec2887
      Authored by Zhongyi Xie
      Summary:
      The current implementation of perf context is level-agnostic, making it hard to do performance evaluation for the LSM tree. This PR adds `PerfContextByLevel` to decompose the counters by level.
      This will be helpful when analyzing point and range query performance as well as tuning the bloom filter.
      Also replaced `__thread` with the `thread_local` keyword for perf_context.
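      The thread-local, per-level layout can be sketched as follows (illustrative names and counters, not the actual PerfContextByLevel fields):

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // One counter struct per LSM level, stored per thread so counting
      // needs no synchronization.
      struct PerLevelCounters {
        uint64_t bloom_filter_useful = 0;
      };

      thread_local std::vector<PerLevelCounters> per_level_perf_context;

      void ReportBloomUseful(size_t level) {
        if (per_level_perf_context.size() <= level) {
          per_level_perf_context.resize(level + 1);
        }
        per_level_perf_context[level].bloom_filter_useful++;
      }
      ```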
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4226
      
      Differential Revision: D10369509
      
      Pulled By: miasantreble
      
      fbshipit-source-id: f1ced4e0de5fcebdb7f9cff36164516bc6382d82
      d6ec2887
  18. 16 Oct, 2018 3 commits
    • A
      Properly determine a truncated CompactRange stop key (#4496) · 1e384580
      Authored by anand1976
      Summary:
      When a CompactRange() call for a level is truncated before the end key
      is reached, because it exceeds max_compaction_bytes, we need to properly
      set the compaction_end parameter to indicate the stop key. The next
      CompactRange will use that as the begin key. We set it to the smallest
      key of the next file in the level after expanding inputs to get a clean
      cut.
      
      Previously, we were setting it before expanding inputs. So we could end
      up recompacting some files. In a pathological case, where a single key
      has many entries spanning all the files in the level (possibly due to
      merge operands without a partial merge operator, thus resulting in
      compaction output identical to the input), this would result in
      an endless loop over the same set of files.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4496
      
      Differential Revision: D10395026
      
      Pulled By: anand1976
      
      fbshipit-source-id: f0c2f89fee29b4b3be53b6467b53abba8e9146a9
      1e384580
    • Y
      Add support to flush multiple CFs atomically (#4262) · e633983c
      Authored by Yanqin Jin
      Summary:
      Leverage the existing `FlushJob` to implement atomic flush of multiple column families.

      This PR depends on other PRs and is a subset of #3752. This PR by itself is not sufficient to fulfill atomic flush.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4262
      
      Differential Revision: D9283109
      
      Pulled By: riversand963
      
      fbshipit-source-id: 65401f913e4160b0a61c0be6cd02adc15dad28ed
      e633983c
    • A
      Avoid per-key linear scan over snapshots in compaction (#4495) · 32b4d4ad
      Authored by Andrew Kryczka
      Summary:
      `CompactionIterator::snapshots_` is ordered by ascending seqnum, just like `DBImpl`'s linked list of snapshots from which it was copied. This PR exploits this ordering to make `findEarliestVisibleSnapshot` do binary search rather than linear scan. This can make flush/compaction significantly faster when many snapshots exist since that function is called on every single key.
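      The core of the optimization is a standard lower_bound over the ascending seqnum list; a minimal sketch, where `kMaxSequenceNumber` stands in for "visible to no snapshot":

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <vector>

      const uint64_t kMaxSequenceNumber = UINT64_MAX;

      // snapshots is sorted ascending; the earliest snapshot that can see a
      // key written at seq is the first snapshot seqnum >= seq, found in
      // O(log n) instead of a linear scan per key.
      uint64_t FindEarliestVisibleSnapshot(const std::vector<uint64_t>& snapshots,
                                           uint64_t seq) {
        auto it = std::lower_bound(snapshots.begin(), snapshots.end(), seq);
        return it == snapshots.end() ? kMaxSequenceNumber : *it;
      }
      ```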
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4495
      
      Differential Revision: D10386470
      
      Pulled By: ajkr
      
      fbshipit-source-id: 29734991631227b6b7b677e156ac567690118a8b
      32b4d4ad
  19. 13 Oct, 2018 3 commits
    • Y
      Add listener to sample file io (#3933) · 729a617b
      Authored by Yanqin Jin
      Summary:
      We would like to collect file-system-level statistics, including file name, offset, length, return code, latency, etc., which requires adding callbacks to intercept file IO function calls while RocksDB is running.
      To collect these statistics, users can inherit the class `EventListener`, as in `TestFileOperationListener`. Note that `TestFileOperationListener::ShouldBeNotifiedOnFileIO()` returns true.
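      The callback shape can be sketched as below. This mirrors the idea of an opt-in listener with `ShouldBeNotifiedOnFileIO()`, but the interface and record fields here are illustrative, not RocksDB's exact API:

      ```cpp
      #include <cstdint>
      #include <string>
      #include <vector>

      // Per-IO record: the statistics the summary says we want to collect.
      struct FileIoRecord {
        std::string file_name;
        uint64_t offset;
        uint64_t length;
        uint64_t latency_micros;
        bool ok;
      };

      class FileIoListener {
       public:
        virtual ~FileIoListener() = default;
        // Opt-in flag, like TestFileOperationListener returning true.
        virtual bool ShouldBeNotifiedOnFileIO() { return false; }
        virtual void OnFileIoFinish(const FileIoRecord& record) = 0;
      };

      // Example listener that just accumulates the samples.
      class RecordingListener : public FileIoListener {
       public:
        bool ShouldBeNotifiedOnFileIO() override { return true; }
        void OnFileIoFinish(const FileIoRecord& record) override {
          records_.push_back(record);
        }
        std::vector<FileIoRecord> records_;
      };
      ```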
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3933
      
      Differential Revision: D10219571
      
      Pulled By: riversand963
      
      fbshipit-source-id: 7acc577a2d31097766a27adb6f78eaf8b1e8ff15
      729a617b
    • Y
      Fix compile error with jemalloc (#4488) · 6f8d4bdf
      Authored by Yi Wu
      Summary:
      The "je_" prefix of jemalloc APIs is present only when the macro `JEMALLOC_NO_RENAME` from jemalloc.h is defined.

      With the patch I'm also adding the -DROCKSDB_JEMALLOC flag to the buck TARGETS.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4488
      
      Differential Revision: D10355971
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 03a2d69790a44ac89219c7525763fa937a63d95a
      6f8d4bdf
    • C
      Acquire lock on DB LOCK file before starting repair. (#4435) · 6422356a
      Authored by Chinmay Kamat
      Summary:
      This commit adds code to acquire a lock on the DB LOCK file
      before starting the repair process. This will prevent
      multiple processes from performing repair on the same DB
      simultaneously. It also fixes repair_test to work with this change.
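      The locking step can be sketched with raw POSIX flock(2); RocksDB actually locks through its Env abstraction, so the function below is only an illustration of the idea:

      ```cpp
      #include <fcntl.h>
      #include <string>
      #include <sys/file.h>
      #include <unistd.h>

      // Take an exclusive, non-blocking lock on <dbname>/LOCK before repair
      // starts. A concurrent holder of the lock makes this fail fast
      // instead of letting two repairs run on the same DB.
      int LockDbForRepair(const std::string& dbname) {
        std::string lock_path = dbname + "/LOCK";
        int fd = open(lock_path.c_str(), O_RDWR | O_CREAT, 0644);
        if (fd < 0) return -1;
        if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
          close(fd);
          return -1;
        }
        return fd;  // caller holds the lock; release with close(fd)
      }
      ```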
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4435
      
      Differential Revision: D10361499
      
      Pulled By: riversand963
      
      fbshipit-source-id: 3c512c48b7193d383b2279ccecabdb660ac1cf22
      6422356a
  20. 12 Oct, 2018 1 commit
    • A
      Use vector in UncollapsedRangeDelMap (#4487) · 7dd16410
      Authored by Abhishek Madan
      Summary:
      Using `./range_del_aggregator_bench --use_collapsed=false
      --num_range_tombstones=5000 --num_runs=1000`, here are the results before and
      after this change:
      
      Before:
      ```
      =========================
      Results:
      =========================
      AddTombstones:           1822.61 us
      ShouldDelete (first):    94.5286 us
      ```
      
      After:
      ```
      =========================
      Results:
      =========================
      AddTombstones:           199.26 us
      ShouldDelete (first):    38.9344 us
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4487
      
      Differential Revision: D10347288
      
      Pulled By: abhimadan
      
      fbshipit-source-id: d44efe3a166d583acfdc3ec1199e0892f34dbfb7
      7dd16410
  21. 11 Oct, 2018 2 commits