1. 28 Jan, 2020 4 commits
  2. 25 Jan, 2020 2 commits
    • Update version for next release, 6.7.0 (#6320) · bd698e4f
      Fosco Marotto authored
      Summary:
      Adjusted history for 6.6.1 and 6.6.2, switched master version to 6.7.0.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6320
      
      Differential Revision: D19499272
      
      Pulled By: gfosco
      
      fbshipit-source-id: 2bafb2456951f231e411e9c03aaa4c044f497684
    • Implement PinnableSlice::remove_prefix (#6330) · c4bc30e1
      Maysam Yabandeh authored
      Summary:
      The function was left unimplemented. Although we currently don't have a use for it, it was declared with an assert(0) to prevent mistakenly using the remove_prefix of the parent class. However, a function body with only assert(0) causes issues with some compilers' warning levels. The patch implements the function to avoid the warning.
      It also piggybacks fixes for some minor code warnings about unnecessary semicolons after function definitions.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6330
      
      Differential Revision: D19559062
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 3a022484f688c9abd4556e5412bcc2628ab96a00
  3. 24 Jan, 2020 3 commits
    • Fix the "records dropped" statistics (#6325) · f34782a6
      Levi Tamasi authored
      Summary:
      The earlier code used two conflicting definitions for the number of
      input records going into a compaction, one based on the
      `rocksdb.num.entries` table property and one based on
      `CompactionIterationStats`. The first one is correct and in line
      with how output records are counted, while the second one incorrectly
      ignores input records in various cases when the `CompactionIterator`
      advances or reseeks the input iterator (this can happen, amongst other
      cases, when dealing with `SingleDelete`s, regular `Delete`s, `Merge`s,
      and compaction filters). This can result in the code undercounting the
      input records and computing an incorrect value for "records dropped"
      during the compaction. The patch fixes this by switching over to the
      correct (table property based) input record count for "records dropped".
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6325
      
      Test Plan: Tested using `make check` and `db_bench`.
      
      Differential Revision: D19525491
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 4340b0b2f41546db8e356db70ca02199e48fa636
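
The accounting described in this summary can be sketched as follows. This is a minimal illustration in Python rather than RocksDB's C++ code; the function name and the `num_entries` key (a stand-in for the `rocksdb.num.entries` table property) are assumptions made for the example.

```python
def records_dropped(input_files, num_output_records):
    """Return input records minus output records for one compaction.

    input_files: list of dicts, each carrying the file's entry count as
    recorded in its table properties. The key "num_entries" is a
    hypothetical stand-in for `rocksdb.num.entries`.
    """
    # Count inputs from table properties, not from iterator statistics,
    # so that entries skipped by seeks or reseeks are still included.
    num_input_records = sum(f["num_entries"] for f in input_files)
    return num_input_records - num_output_records


# Two input files with 100 and 50 entries, compacted into 120 records.
inputs = [{"num_entries": 100}, {"num_entries": 50}]
dropped = records_dropped(inputs, 120)
```

Counting inputs the same way outputs are counted keeps the difference meaningful even when the iterator never visits some input entries.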
    • Fix queue manipulation in WriteThread::BeginWriteStall() (#6322) · 0672a6db
      anand76 authored
      Summary:
      When there is a write stall, the active write group leader calls `BeginWriteStall()` to walk the queue of writers and remove any with the `no_slowdown` option set. There was a bug in the code which updated the back pointer but not the forward pointer (`link_newer`), corrupting the list and causing some threads to wait forever. This PR fixes it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6322
      
      Test Plan: Add a unit test in db_write_test
      
      Differential Revision: D19538313
      
      Pulled By: anand1976
      
      fbshipit-source-id: 6fbed819e594913f435886606f5d36f74f235c3a
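
The class of bug described in this summary can be illustrated with a small doubly linked queue. This is a sketch in Python under stated assumptions, not the actual `WriteThread` code: the `Writer` class and the helper functions are hypothetical, and only the field names `link_older`, `link_newer` and `no_slowdown` echo the summary.

```python
class Writer:
    def __init__(self, name, no_slowdown=False):
        self.name = name
        self.no_slowdown = no_slowdown
        self.link_older = None  # toward older writers
        self.link_newer = None  # toward newer writers


def link(writers):
    """Chain writers oldest-to-newest and return the newest one."""
    for older, newer in zip(writers, writers[1:]):
        older.link_newer = newer
        newer.link_older = older
    return writers[-1]


def remove_no_slowdown(newest):
    """Unlink every writer with no_slowdown set; return (newest, removed)."""
    removed = []
    w = newest
    while w is not None:
        older = w.link_older
        if w.no_slowdown:
            # Fix BOTH directions; updating only one leaves a dangling link.
            if w.link_older is not None:
                w.link_older.link_newer = w.link_newer
            if w.link_newer is not None:
                w.link_newer.link_older = w.link_older
            else:
                newest = w.link_older  # the tail itself was removed
            w.link_older = w.link_newer = None
            removed.append(w.name)
        w = older
    return newest, removed


def names_forward(newest):
    """Walk back to the oldest writer, then list names via link_newer."""
    w = newest
    while w is not None and w.link_older is not None:
        w = w.link_older
    out = []
    while w is not None:
        out.append(w.name)
        w = w.link_newer
    return out


ws = [Writer("a"), Writer("b", no_slowdown=True),
      Writer("c"), Writer("d", no_slowdown=True)]
newest, removed = remove_no_slowdown(link(ws))
```

Traversing the list in the forward direction after removal, as `names_forward` does, is one way a test could catch a pointer that was left stale.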
    • Revert "crash_test to enable block-based table hash index (#6310)" (#6327) · 967a2d95
      Maysam Yabandeh authored
      Summary:
      This reverts commit 8e309b35.
      The stress tests are failing. Revert it until we figure out the root cause.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6327
      
      Differential Revision: D19537657
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bf34a5dd720825957729e136e9a5a729a240e61a
  4. 23 Jan, 2020 1 commit
  5. 22 Jan, 2020 3 commits
    • Correct pragma once problem with Bazel on Windows (#6321) · e6e8b9e8
      matthewvon authored
      Summary:
      This is a simple edit to make two #include file paths in range_del_aggregator.{h,cc} consistent with everywhere else.
      
      The impact of this inconsistency is that it actually breaks a Bazel-based build on the Windows platform. The same pragma once failure occurs with both Windows Visual C++ 2019 and clang for Windows 9.0. Bazel's "sandboxing" of the builds causes both compilers to fail to recognize "rocksdb/types.h" and "include/rocksdb/types.h" as the same file (likewise for comparator.h). My guess is that the mixing of backslashes and forward slashes within path names is the underlying issue.
      
      Everything builds fine once the include paths in these two source files are consistent with the rest of the repository.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6321
      
      Differential Revision: D19506585
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 294c346607edc433ab99eaabc9c880ee7426817a
    • Make DBCompactionTest.SkipStatsUpdateTest more robust (#6306) · d305f13e
      Levi Tamasi authored
      Summary:
      Currently, this test case tries to infer whether
      `VersionStorageInfo::UpdateAccumulatedStats` was called during open by
      checking the number of files opened against an arbitrary threshold (10).
      This makes the test brittle and results in sporadic failures. The patch
      changes the test case to use sync points to directly test whether
      `UpdateAccumulatedStats` was called.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6306
      
      Test Plan: `make check`
      
      Differential Revision: D19439544
      
      Pulled By: ltamasi
      
      fbshipit-source-id: ceb7adf578222636a0f51740872d0278cd1a914f
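
The testing approach in this summary, observing a call directly instead of inferring it from a side effect, can be sketched as follows. This is a loose Python model of the idea only; the `SyncPoints` class, `open_db` and the callback name are hypothetical and do not reproduce RocksDB's sync point API.

```python
class SyncPoints:
    """Tiny registry of named callbacks, loosely modeled on test sync points."""

    def __init__(self):
        self._callbacks = {}

    def set_callback(self, name, fn):
        self._callbacks[name] = fn

    def process(self, name):
        # Called by the code under test at the named point; no-op if unset.
        fn = self._callbacks.get(name)
        if fn is not None:
            fn()


def open_db(sync_points, skip_stats_update):
    """Hypothetical stand-in for a DB open that may update stats."""
    if not skip_stats_update:
        sync_points.process("UpdateAccumulatedStats")


def stats_updated_during_open(skip_stats_update):
    """Return True if the stats update ran, observed via a callback."""
    sp = SyncPoints()
    calls = []
    sp.set_callback("UpdateAccumulatedStats", lambda: calls.append(1))
    open_db(sp, skip_stats_update)
    return len(calls) > 0
```

Because the check depends only on whether the callback fired, it does not rely on an arbitrary threshold such as the number of files opened.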
    • crash_test to enable block-based table hash index (#6310) · 8e309b35
      sdong authored
      Summary:
      The block-based table hash index has been disabled in the crash test due to bugs. We fixed a bug and are re-enabling it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6310
      
      Test Plan: Finished one round of the "crash_test_with_atomic_flush" test successfully while exclusively running the hash index. Another run also ran for several hours without failure.
      
      Differential Revision: D19455856
      
      fbshipit-source-id: 1192752d2c1e81ed7e5c5c7a9481c841582d5274
  6. 21 Jan, 2020 1 commit
    • Warn on excessive keys for legacy Bloom filter with 32-bit hash (#6317) · 8aa99fc7
      Peter Dillinger authored
      Summary:
      With many millions of keys, the old Bloom filter implementation
      for the block-based table (format_version <= 4) would have excessive FP
      rate due to the limitations of feeding the Bloom filter with a 32-bit hash.
      This change computes an estimated inflated FP rate due to this effect
      and warns in the log whenever an SST filter is constructed (almost
      certainly a "full" not "partitioned" filter) that exceeds 1.5x FP rate
      due to this effect. The detailed condition is only checked if 3 million
      keys or more have been added to a filter, as this should be a lower
      bound for common bits/key settings (< 20).
      
      Recommended remedies include smaller SST file size, using
      format_version >= 5 (for new Bloom filter), or using partitioned
      filters.
      
      This does not change behavior other than generating warnings for some
      constructed filters using the old implementation.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6317
      
      Test Plan:
      Example with warning, 15M keys @ 15 bits / key: (working_mem_size_mb is just to stop after building one filter if it's large)
      
          $ ./filter_bench -quick -impl=0 -working_mem_size_mb=1 -bits_per_key=15 -average_keys_per_filter=15000000 2>&1 | grep 'FP rate'
          [WARN] [/block_based/filter_policy.cc:292] Using legacy SST/BBT Bloom filter with excessive key count (15.0M @ 15bpk), causing estimated 1.8x higher filter FP rate. Consider using new Bloom with format_version>=5, smaller SST file size, or partitioned filters.
          Predicted FP rate %: 0.766702
          Average FP rate %: 0.66846
      
      Example without warning (150K keys):
      
          $ ./filter_bench -quick -impl=0 -working_mem_size_mb=1 -bits_per_key=15 -average_keys_per_filter=150000 2>&1 | grep 'FP rate'
          Predicted FP rate %: 0.422857
          Average FP rate %: 0.379301
          $
      
      With more samples at 15 bits/key:
        150K keys -> no warning; actual: 0.379% FP rate (baseline)
        1M keys -> no warning; actual: 0.396% FP rate, 1.045x
        9M keys -> no warning; actual: 0.563% FP rate, 1.485x
        10M keys -> warning (1.5x); actual: 0.564% FP rate, 1.488x
        15M keys -> warning (1.8x); actual: 0.668% FP rate, 1.76x
        25M keys -> warning (2.4x); actual: 0.880% FP rate, 2.32x
      
      At 10 bits/key:
        150K keys -> no warning; actual: 1.17% FP rate (baseline)
        1M keys -> no warning; actual: 1.16% FP rate
        10M keys -> no warning; actual: 1.32% FP rate, 1.13x
        25M keys -> no warning; actual: 1.63% FP rate, 1.39x
        35M keys -> warning (1.6x); actual: 1.81% FP rate, 1.55x
      
      At 5 bits/key:
        150K keys -> no warning; actual: 9.32% FP rate (baseline)
        25M keys -> no warning; actual: 9.62% FP rate, 1.03x
        200M keys -> no warning; actual: 12.2% FP rate, 1.31x
        250M keys -> warning (1.5x); actual: 12.8% FP rate, 1.37x
        300M keys -> warning (1.6x); actual: 13.4% FP rate, 1.43x
      
      The reason for the modest inaccuracy at low bits/key is that the assumption of independence between a collision between 32-bit hash values feeding the filter and an FP in the filter is not quite true for implementations using "simple" logic to compute indices from the stock hash result. There's math on this in my dissertation, but I don't think it's worth the effort just for these extreme cases (> 100 million keys and low-ish bits/key).
      
      Differential Revision: D19471715
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f80c96893a09bf1152630ff0b964e5cdd7e35c68
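
The effect described in this summary can be approximated with a short model. This is a sketch under the independence assumption mentioned in the test plan, written in Python; it is not the code in filter_policy.cc, and the function names are hypothetical. The 1.5x threshold and the 3 million key minimum come from the summary; the baseline rate is an input the caller must supply.

```python
import math

HASH_BITS = 32
WARN_RATIO = 1.5               # warn when the estimate exceeds 1.5x the base rate
MIN_KEYS_TO_CHECK = 3_000_000  # detailed check only at 3 million keys or more


def estimated_fp_rate(base_fp_rate, num_keys):
    """Combine the filter's own FP rate with 32-bit hash collisions.

    Assumes a hash collision and a filter false positive are independent
    events, which the summary notes is not quite true in practice.
    """
    # Probability that a queried key's 32-bit hash matches some added key:
    # 1 - (1 - 2^-32)^n, approximated as 1 - exp(-n / 2^32).
    collision = -math.expm1(-num_keys / 2.0 ** HASH_BITS)
    return base_fp_rate + (1.0 - base_fp_rate) * collision


def should_warn(base_fp_rate, num_keys):
    """Return True if the estimated inflation exceeds the warning ratio."""
    if num_keys < MIN_KEYS_TO_CHECK:
        return False
    return estimated_fp_rate(base_fp_rate, num_keys) / base_fp_rate > WARN_RATIO
```

Taking the predicted rate from the 150K-key run (about 0.42% at 15 bits/key) as an approximate baseline, this model crosses the 1.5x threshold between 9M and 10M keys and gives roughly 1.8x at 15M keys, in line with the samples listed in the test plan.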
  7. 18 Jan, 2020 3 commits
    • Log warning for high bits/key in legacy Bloom filter (#6312) · 4b86fe11
      Peter Dillinger authored
      Summary:
      Help users who would benefit most from the new Bloom filter
      implementation by logging a warning that recommends using
      format_version >= 5.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6312
      
      Test Plan:
      $ (for BPK in 10 13 14 19 20 50; do ./filter_bench -quick -impl=0 -bits_per_key=$BPK -m_queries=1 2>&1; done) | grep 'its/key'
          Bits/key actual: 10.0647
          Bits/key actual: 13.0593
          [WARN] [/block_based/filter_policy.cc:546] Using legacy Bloom filter with high (14) bits/key. Significant filter space and/or accuracy improvement is available with format_verion>=5.
          Bits/key actual: 14.0581
          [WARN] [/block_based/filter_policy.cc:546] Using legacy Bloom filter with high (19) bits/key. Significant filter space and/or accuracy improvement is available with format_verion>=5.
          Bits/key actual: 19.0542
          [WARN] [/block_based/filter_policy.cc:546] Using legacy Bloom filter with high (20) bits/key. Dramatic filter space and/or accuracy improvement is available with format_verion>=5.
          Bits/key actual: 20.0584
          [WARN] [/block_based/filter_policy.cc:546] Using legacy Bloom filter with high (50) bits/key. Dramatic filter space and/or accuracy improvement is available with format_verion>=5.
          Bits/key actual: 50.0577
      
      Differential Revision: D19457191
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 073d94cde5c70e03a160f953e1100c15ea83eda4
    • Separate enable-WAL and disable-WAL writer to avoid unwanted data in log files (#6290) · 931876e8
      chenyou-fdu authored
      Summary:
      When writes are issued concurrently, some write operations may have the WAL enabled while others have it disabled.
      However, data from write operations with the WAL disabled is still written to the log files, which leads to extra disk writes/syncs even though we do not want any guarantee for this part of the data.
      
      Details can be found in https://github.com/facebook/rocksdb/issues/6280. This PR avoids mixing the two types in a write group. The advantage is simpler reasoning about the write group content.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6290
      
      Differential Revision: D19448598
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 3d990a0f79a78ea1bfc90773f6ebafc1884c20de
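
One way to picture the grouping rule in this summary is a leader that only batches followers sharing its WAL setting. This Python sketch is an assumption about the shape of the logic, not the actual `WriteThread` implementation; `form_write_group` and the dict fields are hypothetical.

```python
def form_write_group(pending):
    """Take the leader plus following writers that match its WAL setting.

    pending: list of dicts, oldest first, each with a boolean "disable_wal".
    Returns (group, remaining); remaining writers wait for a later group.
    """
    if not pending:
        return [], []
    leader = pending[0]
    group = [leader]
    for i, writer in enumerate(pending[1:], start=1):
        if writer["disable_wal"] != leader["disable_wal"]:
            # Stop at the first mismatch so queue order is preserved.
            return group, pending[i:]
        group.append(writer)
    return group, []


queue = [
    {"id": 1, "disable_wal": False},
    {"id": 2, "disable_wal": False},
    {"id": 3, "disable_wal": True},
    {"id": 4, "disable_wal": False},
]
group, rest = form_write_group(queue)
```

With this rule every group is either entirely WAL-enabled or entirely WAL-disabled, so a group's WAL handling can be decided once from its leader.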
    • Expose atomic flush option in C API (#6307) · 7e5b04d0
      Matt Bell authored
      Summary:
      This PR adds a `rocksdb_options_set_atomic_flush` function to the C API.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6307
      
      Differential Revision: D19451313
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 750495642ef55b1ea7e13477f85c38cd6574849c
  8. 17 Jan, 2020 5 commits
  9. 16 Jan, 2020 2 commits
    • Access Maven Central over HTTPS (#6301) · b7f1b3e5
      Levi Tamasi authored
      Summary:
      As of 1/15/2020, Maven Central does not support plain HTTP. Because of
      this, our Travis and AppVeyor builds have started failing during the
      assertj download step. This patch will hopefully fix these issues.
      
      See https://blog.sonatype.com/central-repository-moving-to-https
      for more info.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6301
      
      Test Plan:
      Will monitor the builds. ("I don't always test my changes but when I do,
      I do it in production.")
      
      Differential Revision: D19422923
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 76f9a8564a5b66ddc721d705f9cbfc736bf7a97d
    • Fix kHashSearch bug with SeekForPrev (#6297) · d2b4d42d
      sdong authored
      Summary:
      When prefix is enabled and the prefix of the target does not exist, the expected behavior is for Seek to seek to any key larger than the target and for SeekForPrev to seek to any key less than the target.
      Currently, the prefix index (kHashSearch) returns OK status but sets Invalid() to indicate two cases: i) a prefix of the searched key does not exist, ii) the key is beyond the range of the keys in the SST file. The SeekForPrev implementation in BlockBasedTable thus does not have enough information to know when it should set the index key to first (to return a key smaller than target). The patch fixes that by returning NotFound status for cases where the prefix does not exist. SeekForPrev in BlockBasedTable accordingly calls SeekToFirst instead of SeekToLast on the index iterator.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6297
      
      Test Plan: SeekForPrev of a non-existing prefix is added to block_test.cc, and a test case is added in db_test2, which fails without the fix.
      
      Differential Revision: D19404695
      
      fbshipit-source-id: cafbbf95f8f60ff9ede9ccc99d25bfa1cf6fcdc3
  10. 15 Jan, 2020 3 commits
  11. 14 Jan, 2020 1 commit
    • Bug when multiple files at one level contains the same smallest key (#6285) · 894c6d21
      sdong authored
      Summary:
      The fractional cascading index is not correctly generated when two files at the same level contain the same smallest or largest user key.
      As a result, an assertion would be hit in debug mode, and lower level files might be skipped.
      This might cause wrong results when the same user keys are merge operands and Get() is called using the exact user key. In that case, the lower files would need to be further checked.
      The fix is to correct the generation of the fractional cascading index.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6285
      
      Test Plan: Add a unit test that would hit the assertion without the fix.
      
      Differential Revision: D19358426
      
      fbshipit-source-id: 39b2b1558075fd95e99491d462a67f9f2298c48e
  12. 11 Jan, 2020 3 commits
    • More const pointers in C API (#6283) · 6733be03
      Qinfan Wu authored
      Summary:
      This makes it easier to call the functions from Rust as otherwise they require mutable types.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6283
      
      Differential Revision: D19349991
      
      Pulled By: wqfish
      
      fbshipit-source-id: e8da7a75efe8cd97757baef8ca844a054f2519b4
    • Consider all compaction input files to compute the oldest ancestor time (#6279) · cfa58561
      Sagar Vemuri authored
      Summary:
      Look at all compaction input files to compute the oldest ancestor time.
      
      In https://github.com/facebook/rocksdb/issues/5992 we changed how the creation_time (aka oldest-ancestor-time) table property of compaction output files is computed, from max(creation-time-of-all-compaction-inputs) to min(creation-time-of-all-inputs). This exposed a bug where, during compaction, only the creation_time values of the L0 compaction inputs were being looked at, and all other input levels were being ignored. This PR fixes the issue.
      Because of this bug, some TTL compactions might not have run when using level-style compaction.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6279
      
      Test Plan: Enhanced the unit tests to validate that the correct time is propagated to the compaction outputs.
      
      Differential Revision: D19337812
      
      Pulled By: sagar0
      
      fbshipit-source-id: edf8a72f11e405e93032ff5f45590816debe0bb4
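
The difference between the fixed and the buggy computation can be shown with a few lines. This is a Python sketch with hypothetical function names and made-up timestamps; it models only the min-over-inputs rule from the summary, not RocksDB's compaction code.

```python
def oldest_ancestor_time(inputs_by_level):
    """Minimum creation time across every input file at every level.

    inputs_by_level: dict mapping level number -> list of creation times
    (seconds since epoch) of that level's compaction input files.
    """
    times = [t for files in inputs_by_level.values() for t in files]
    return min(times) if times else None


def l0_only_oldest_ancestor_time(inputs_by_level):
    """The behavior the summary describes as the bug: only L0 is consulted."""
    l0 = inputs_by_level.get(0, [])
    return min(l0) if l0 else None


# Illustrative values: the oldest data sits in an L1 input file.
inputs = {0: [1000, 1200], 1: [400, 900]}
fixed = oldest_ancestor_time(inputs)
buggy = l0_only_oldest_ancestor_time(inputs)
```

In this example the L0-only variant reports 1000 instead of 400, so the output file looks younger than its oldest data, which is consistent with the summary's note that some TTL compactions might not have run.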
    • unordered_write incompatible with max_successive_merges (#6284) · eff5e076
      Maysam Yabandeh authored
      Summary:
      unordered_write is incompatible with non-zero max_successive_merges. Although we check this at runtime, we currently don't prevent the user from setting this combination in options. This has caused stress tests to fail when this combination is tried in ::SetOptions.
      The patch fixes that and also reverts the changes performed by https://github.com/facebook/rocksdb/pull/6254, in which max_successive_merges was mistakenly declared incompatible with unordered_write.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6284
      
      Differential Revision: D19356115
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f06dadec777622bd75f267361c022735cf8cecb6
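
Rejecting the combination when options are set, rather than only at runtime, might look like the following. This is a Python sketch of the validation idea only; `validate_options`, the dict-based options and the error text are hypothetical and are not RocksDB's option-checking code.

```python
def validate_options(options):
    """Return an error string for an unsupported combination, else None.

    options: dict with optional keys "unordered_write" (bool) and
    "max_successive_merges" (int). The error text is illustrative.
    """
    if options.get("unordered_write") and options.get("max_successive_merges", 0) != 0:
        return "max_successive_merges > 0 is incompatible with unordered_write"
    return None
```

Running the same check both when a DB is opened and when options are changed dynamically would keep the two paths from disagreeing about what is allowed.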
  13. 10 Jan, 2020 2 commits
  14. 09 Jan, 2020 6 commits
  15. 08 Jan, 2020 1 commit
    • JMH microbenchmarks for RocksJava (#6241) · 6477075f
      Adam Retter authored
      Summary:
      This is the start of some JMH microbenchmarks for RocksJava.
      
      Such benchmarks can help us decide on performance improvements of the Java API.
      
      At the moment, I have only added benchmarks for various Comparator options, as that is one of the first areas where I want to improve performance. I plan to expand this to many more tests.
      
      Details of how to compile and run the benchmarks are in the `README.md`.
      
      A run of these on a XEON 3.5 GHz 4vCPU (QEMU Virtual CPU version 2.5+) / 8GB RAM KVM with Ubuntu 18.04, OpenJDK 1.8.0_232, and gcc 8.3.0 produced the following:
      
      ```
      # Run complete. Total time: 01:43:17
      
      REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
      why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
      experiments, perform baseline and negative tests that provide experimental control, make sure
      the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
      Do not assume the numbers tell you what you want them to tell.
      
      Benchmark                                         (comparatorName)   Mode  Cnt       Score       Error  Units
      ComparatorBenchmarks.put                           native_bytewise thrpt   25   122373.920 ±  2200.538  ops/s
      ComparatorBenchmarks.put              java_bytewise_adaptive_mutex thrpt   25    17388.201 ±  1444.006  ops/s
      ComparatorBenchmarks.put          java_bytewise_non-adaptive_mutex thrpt   25    16887.150 ±  1632.204  ops/s
      ComparatorBenchmarks.put       java_direct_bytewise_adaptive_mutex thrpt   25    15644.572 ±  1791.189  ops/s
      ComparatorBenchmarks.put   java_direct_bytewise_non-adaptive_mutex thrpt   25    14869.601 ±  2252.135  ops/s
      ComparatorBenchmarks.put                   native_reverse_bytewise thrpt   25   116528.735 ±  4168.797  ops/s
      ComparatorBenchmarks.put      java_reverse_bytewise_adaptive_mutex thrpt   25    10651.975 ±   545.998  ops/s
      ComparatorBenchmarks.put  java_reverse_bytewise_non-adaptive_mutex thrpt   25    10514.224 ±   930.069  ops/s
      ```
      
      This indicates a ~7x difference between comparators implemented natively (C++) and those implemented in Java. Let's see if we can't improve on that in the near future...
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6241
      
      Differential Revision: D19290410
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 25d44bf3a31de265502ed0c5d8a28cf4c7cb9c0b
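
The native-versus-Java ratio quoted in this summary can be recomputed from the benchmark table. The snippet below is a small Python calculation using four scores copied from that table; the variable names are arbitrary.

```python
# Throughput scores (ops/s) copied from the JMH table in the summary.
scores = {
    "native_bytewise": 122373.920,
    "java_bytewise_adaptive_mutex": 17388.201,
    "native_reverse_bytewise": 116528.735,
    "java_reverse_bytewise_adaptive_mutex": 10651.975,
}

# How many times faster the native comparator is than its Java counterpart.
bytewise_ratio = (scores["native_bytewise"]
                  / scores["java_bytewise_adaptive_mutex"])
reverse_ratio = (scores["native_reverse_bytewise"]
                 / scores["java_reverse_bytewise_adaptive_mutex"])
```

The bytewise pair works out to about 7x, matching the figure in the summary, while the reverse bytewise pair is closer to 11x.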