1. 05 Feb 2020 (1 commit)
  2. 04 Feb 2020 (6 commits)
    • Add an option to prevent DB::Open() from querying sizes of all sst files (#6353) · 637e64b9
      Committed by Mike Kolupaev
      Summary:
      When paranoid_checks is on, DBImpl::CheckConsistency() iterates over all sst files and calls Env::GetFileSize() for each of them. As far as I could understand, this is pretty arbitrary and doesn't affect correctness - if the filesystem doesn't corrupt fsynced files, the file sizes will always match; if it does, it may as well corrupt contents as well as sizes, and rocksdb doesn't check contents on open.
      
      If there are thousands of sst files, getting all their sizes takes a while. If, on top of that, Env is overridden to use some remote storage instead of local filesystem, it can be *really* slow and overload the remote storage service. This PR adds an option to not do GetFileSize(); instead it does GetChildren() for parent directory to check that all the expected sst files are at least present, but doesn't check their sizes.
      
      We can't just disable paranoid_checks instead because paranoid_checks does a few other important things: making the DB read-only on write errors, printing error messages on read errors, etc.
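      A minimal sketch of the directory-listing check described above (the helper name and structure are illustrative, not the PR's verbatim code):
      
      ```cpp
      #include <set>
      #include <string>
      #include <vector>
      
      #include "rocksdb/env.h"
      #include "rocksdb/status.h"
      
      // Hypothetical sketch: verify the expected SST files exist by listing the
      // parent directory once, instead of calling Env::GetFileSize() per file.
      rocksdb::Status CheckSstFilesPresent(
          rocksdb::Env* env, const std::string& db_dir,
          const std::vector<std::string>& expected_files) {
        std::vector<std::string> children;
        rocksdb::Status s = env->GetChildren(db_dir, &children);
        if (!s.ok()) {
          return s;
        }
        std::set<std::string> present(children.begin(), children.end());
        for (const auto& f : expected_files) {
          if (present.find(f) == present.end()) {
            return rocksdb::Status::Corruption("Missing sst file " + f);
          }
        }
        return rocksdb::Status::OK();  // sizes intentionally not checked
      }
      ```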
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6353
      
      Test Plan: ran the added sanity check unit test. Will try it out in a LogDevice test cluster where the GetFileSize() calls are causing a lot of trouble.
      
      Differential Revision: D19656425
      
      Pulled By: al13n321
      
      fbshipit-source-id: c2c421b367633033760d1f56747bad206d1fbf82
    • Fix a test failure in error_handler_test (#6367) · 7330ec0f
      Committed by anand76
      Summary:
      Fix an intermittent failure in
      DBErrorHandlingTest.CompactionManifestWriteError due to a race between
      background error recovery and the main test thread calling
      TEST_WaitForCompact().
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6367
      
      Test Plan: Run the test using gtest_parallel
      
      Differential Revision: D19713802
      
      Pulled By: anand1976
      
      fbshipit-source-id: 29e35dc26e0984fe8334c083e059f4fa1f335d68
    • Use ReadFileToString() to get content from IDENTITY file (#6365) · f195d8d5
      Committed by sdong
      Summary:
      Right now when reading IDENTITY file, we use a very similar logic as ReadFileToString() while it does an extra file size check, which may be expensive in some file systems. There is no reason to duplicate the logic. Use ReadFileToString() instead.
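      For reference, a sketch of the simplified read path (function shape illustrative; `ReadFileToString()` is the existing Env utility):
      
      ```cpp
      #include <string>
      
      #include "rocksdb/env.h"
      
      // Sketch: rocksdb::ReadFileToString() loads a whole file into a string
      // without a separate file size check.
      rocksdb::Status GetDbIdentitySketch(rocksdb::Env* env,
                                          const std::string& dbname,
                                          std::string* identity) {
        rocksdb::Status s =
            rocksdb::ReadFileToString(env, dbname + "/IDENTITY", identity);
        if (s.ok() && !identity->empty() && identity->back() == '\n') {
          identity->pop_back();  // strip the trailing newline, if any
        }
        return s;
      }
      ```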
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6365
      
      Test Plan: Run all existing tests.
      
      Differential Revision: D19709399
      
      fbshipit-source-id: 3bac31f3b2471f98a0d2694278b41e9cd34040fe
    • Avoid creating a directory for every column family (#6358) · 36c504be
      Committed by sdong
      Summary:
      A relatively recent regression causes create-and-open of the DB directory to be performed once per column family, unless the CF has a private directory. This doesn't scale well with a large number of column families.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6358
      
      Test Plan: Run all existing tests and see them pass. strace db_bench with --num_column_families and observe that it no longer opens the directory once per column family.
      
      Differential Revision: D19675141
      
      fbshipit-source-id: da01d9216f1dae3f03d4064fbd88ce71245bd9be
    • Error handler test fix (#6266) · eb4d6af5
      Committed by Huisheng Liu
      Summary:
      MultiDBCompactionError fails when it verifies the number of files on level 0 and level 1 without waiting for compaction to finish.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6266
      
      Differential Revision: D19701639
      
      Pulled By: riversand963
      
      fbshipit-source-id: e96d511bcde705075f073e0b550cebcd2ecfccdc
    • Improve RocksJava Comparator (#6252) · 7242dae7
      Committed by Adam Retter
      Summary:
      This is a redesign of the API for RocksJava comparators with the aim of improving performance. It also simplifies the class hierarchy.
      
      **NOTE**: This breaks backwards compatibility for existing 3rd party Comparators implemented in Java... so we need to consider carefully which release branches this goes into.
      
      Previously, when implementing a comparator in Java, the developer had a choice of subclassing either `DirectComparator` or `Comparator`, which would use direct and non-direct byte buffers respectively (via `DirectSlice` and `Slice`).
      
      In this redesign we have eliminated the overhead of using the Java Slice classes, and just use `ByteBuffer`s. The `ComparatorOptions` supplied when constructing a Comparator allow you to choose between direct and non-direct byte buffers by setting `useDirect`.
      
      In addition, the `ComparatorOptions` now allow you to choose whether a ByteBuffer is reused over multiple comparator calls, by setting `maxReusedBufferSize > 0`. When buffers are reused, ComparatorOptions provides a choice of mutex type by setting `useAdaptiveMutex`.
      
       ---
      [JMH benchmarks previously indicated](https://github.com/facebook/rocksdb/pull/6241#issue-356398306) that the difference between C++ and Java for implementing a comparator was ~7x slowdown in Java.
      
      With these changes, when reusing buffers and guarding access to them via mutexes, the slowdown is approximately the same. However, these changes offer a new facility to not reuse buffers, which reduces the slowdown to ~5.5x in Java. We also offer a `thread_local` mechanism for reusing buffers, which reduces the slowdown to ~5.2x in Java (closes https://github.com/facebook/rocksdb/pull/4425).
      
      These changes also form a good base for further optimisation work such as further JNI lookup caching, and JNI critical.
      
       ---
      These numbers were captured without jemalloc. With jemalloc, the performance improves for all tests, and the Java slowdown reduces to between 4.8x and 5.x.
      
      ```
      ComparatorBenchmarks.put                                                native_bytewise  thrpt   25  124483.795 ± 2032.443  ops/s
      ComparatorBenchmarks.put                                        native_reverse_bytewise  thrpt   25  114414.536 ± 3486.156  ops/s
      ComparatorBenchmarks.put              java_bytewise_non-direct_reused-64_adaptive-mutex  thrpt   25   17228.250 ± 1288.546  ops/s
      ComparatorBenchmarks.put          java_bytewise_non-direct_reused-64_non-adaptive-mutex  thrpt   25   16035.865 ± 1248.099  ops/s
      ComparatorBenchmarks.put                java_bytewise_non-direct_reused-64_thread-local  thrpt   25   21571.500 ±  871.521  ops/s
      ComparatorBenchmarks.put                  java_bytewise_direct_reused-64_adaptive-mutex  thrpt   25   23613.773 ± 8465.660  ops/s
      ComparatorBenchmarks.put              java_bytewise_direct_reused-64_non-adaptive-mutex  thrpt   25   16768.172 ± 5618.489  ops/s
      ComparatorBenchmarks.put                    java_bytewise_direct_reused-64_thread-local  thrpt   25   23921.164 ± 8734.742  ops/s
      ComparatorBenchmarks.put                              java_bytewise_non-direct_no-reuse  thrpt   25   17899.684 ±  839.679  ops/s
      ComparatorBenchmarks.put                                  java_bytewise_direct_no-reuse  thrpt   25   22148.316 ± 1215.527  ops/s
      ComparatorBenchmarks.put      java_reverse_bytewise_non-direct_reused-64_adaptive-mutex  thrpt   25   11311.126 ±  820.602  ops/s
      ComparatorBenchmarks.put  java_reverse_bytewise_non-direct_reused-64_non-adaptive-mutex  thrpt   25   11421.311 ±  807.210  ops/s
      ComparatorBenchmarks.put        java_reverse_bytewise_non-direct_reused-64_thread-local  thrpt   25   11554.005 ±  960.556  ops/s
      ComparatorBenchmarks.put          java_reverse_bytewise_direct_reused-64_adaptive-mutex  thrpt   25   22960.523 ± 1673.421  ops/s
      ComparatorBenchmarks.put      java_reverse_bytewise_direct_reused-64_non-adaptive-mutex  thrpt   25   18293.317 ± 1434.601  ops/s
      ComparatorBenchmarks.put            java_reverse_bytewise_direct_reused-64_thread-local  thrpt   25   24479.361 ± 2157.306  ops/s
      ComparatorBenchmarks.put                      java_reverse_bytewise_non-direct_no-reuse  thrpt   25    7942.286 ±  626.170  ops/s
      ComparatorBenchmarks.put                          java_reverse_bytewise_direct_no-reuse  thrpt   25   11781.955 ± 1019.843  ops/s
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6252
      
      Differential Revision: D19331064
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1f3b794e6a14162b2c3ffb943e8c0e64a0c03738
  3. 01 Feb 2020 (4 commits)
  4. 31 Jan 2020 (5 commits)
    • Disable recycle_log_file_num when it is incompatible with recovery mode (#6351) · 3316d292
      Committed by Maysam Yabandeh
      Summary:
      Non-zero recycle_log_file_num is incompatible with kPointInTimeRecovery and kAbsoluteConsistency recovery modes. Currently SanitizeOptions changes the recovery mode to kTolerateCorruptedTailRecords, while to resolve this option conflict it makes more sense to compromise recycle_log_file_num, which is a performance feature, instead of wal_recovery_mode, which is a safety feature.
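      A minimal sketch of the resolution order, assuming the usual SanitizeOptions shape (not the verbatim patch):
      
      ```cpp
      #include "rocksdb/options.h"
      
      // Sketch: keep the safety feature (wal_recovery_mode) and drop the
      // performance feature (recycle_log_file_num) when the two conflict.
      void SanitizeRecycleLogs(rocksdb::DBOptions* options) {
        using rocksdb::WALRecoveryMode;
        if (options->recycle_log_file_num > 0 &&
            (options->wal_recovery_mode ==
                 WALRecoveryMode::kPointInTimeRecovery ||
             options->wal_recovery_mode ==
                 WALRecoveryMode::kAbsoluteConsistency)) {
          options->recycle_log_file_num = 0;  // rather than changing the mode
        }
      }
      ```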
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6351
      
      Differential Revision: D19648931
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: dd0bf78349edc007518a00c4d63931fd69294ad7
    • Shorten certain test names to avoid infra failure (#6352) · f2fbc5d6
      Committed by Yanqin Jin
      Summary:
      Unit test names, together with other components, are used to create log files
      during some internal testing. Overly long names cause infra failure due to file
      names being too long.
      
      Look for internal tests.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6352
      
      Differential Revision: D19649307
      
      Pulled By: riversand963
      
      fbshipit-source-id: 6f29de096e33c0eaa87d9c8702f810eda50059e7
    • fix build warnings on MSVC (#6309) · c9a5e487
      Committed by Burton Li
      Summary:
      Fix build warnings on MSVC. siying
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6309
      
      Differential Revision: D19455012
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 940739f2c92de60e47cc2bed8dd7f921459545a9
    • Don't download from (unreliable) maven.org (#6348) · 90c71aa5
      Committed by Peter Dillinger
      Summary:
      I set up a mirror of our Java deps on github so we can download
      them through github URLs rather than maven.org, which is proving
      terribly unreliable from Travis builds.
      
      Also sanitized calls to curl, so they are easier to read and
      appropriately fail on download failure.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6348
      
      Test Plan: CI
      
      Differential Revision: D19633621
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 7eb3f730953db2ead758dc94039c040f406790f3
    • Force a new manifest file if append to current one fails (#6331) · fb05b5a6
      Committed by anand76
      Summary:
      Fix for issue https://github.com/facebook/rocksdb/issues/6316
      
      When an append/sync of the manifest file fails due to an IO error such
      as NoSpace, we don't always put the DB in read-only mode. This is true
      for flushes and compactions, as well as foreground operations such as column family
      add/drop, CompactFiles etc. Subsequent changes to the DB will be
      recorded in the same manifest file, which would have a corrupted record
      in the middle due to the previous failure. On next DB::Open(), it will
      fail to process the full manifest and data will be lost.
      
      To fix this, we reset VersionSet::descriptor_log_ on append/sync
      failure, which will force a new manifest file to be written on the next
      append.
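      A toy sketch of the mechanism (class and names illustrative; in the real code the writer is `VersionSet::descriptor_log_`):
      
      ```cpp
      #include <memory>
      #include <string>
      
      #include "rocksdb/status.h"
      
      // Toy sketch (not the verbatim patch): dropping the current manifest
      // writer on failure forces the next append to open a brand-new MANIFEST
      // instead of appending after a corrupted record.
      class ManifestWriterSketch {
       public:
        rocksdb::Status Append(const std::string& record) {
          if (!log_) {
            OpenNewManifest();  // a prior failure (or first use) lands here
          }
          rocksdb::Status s = AppendAndSync(record);
          if (!s.ok()) {
            log_.reset();  // mirrors resetting VersionSet::descriptor_log_
          }
          return s;
        }
      
       private:
        struct LogWriter {};  // stands in for log::Writer
        void OpenNewManifest() { log_.reset(new LogWriter()); }
        rocksdb::Status AppendAndSync(const std::string& /*record*/) {
          return rocksdb::Status::OK();  // IO elided in this sketch
        }
        std::unique_ptr<LogWriter> log_;
      };
      ```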
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6331
      
      Test Plan: Add new unit tests in error_handler_test.cc
      
      Differential Revision: D19632951
      
      Pulled By: anand1976
      
      fbshipit-source-id: 68d527cb6e59a94cbbbf9f5a17a7f464381d51e3
  5. 30 Jan 2020 (6 commits)
    • Add statistics for BlobDB GC (#6296) · 9e3ace42
      Committed by Levi Tamasi
      Summary:
      The patch adds statistics support to the new BlobDB garbage collection implementation;
      namely, it adds support for the following (pre-existing) tickers:
      
      `BLOB_DB_GC_NUM_FILES`: the number of blob files obsoleted by the GC logic.
      `BLOB_DB_GC_NUM_NEW_FILES`: the number of new blob files generated by the GC logic.
      `BLOB_DB_GC_FAILURES`: the number of failed GC passes (where a GC pass is
      equivalent to a (sub)compaction).
      `BLOB_DB_GC_NUM_KEYS_RELOCATED`: the number of blobs relocated to new blob
      files by the GC logic.
      `BLOB_DB_GC_BYTES_RELOCATED`: the total size of blobs relocated to new blob files.
      
      The tickers `BLOB_DB_GC_NUM_KEYS_OVERWRITTEN`, `BLOB_DB_GC_NUM_KEYS_EXPIRED`,
      `BLOB_DB_GC_BYTES_OVERWRITTEN`, `BLOB_DB_GC_BYTES_EXPIRED`, and
      `BLOB_DB_GC_MICROS` are not relevant for the new GC logic, and are thus marked
      deprecated.
      
      The patch also adds a couple of log messages that log the number and total size of
      blobs encountered and relocated during a GC pass, as well as the number of blob
      files created and obsoleted.
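      On the application side, the new tickers are read back like any others; a usage sketch (function name illustrative):
      
      ```cpp
      #include <cstdint>
      #include <memory>
      
      #include "rocksdb/options.h"
      #include "rocksdb/statistics.h"
      
      // Sketch: with options.statistics set (e.g. via CreateDBStatistics()),
      // the GC tickers can be read after some GC passes have run.
      void ReportBlobGcStats(const rocksdb::Options& options) {
        const std::shared_ptr<rocksdb::Statistics>& stats = options.statistics;
        if (stats == nullptr) {
          return;  // statistics were not enabled
        }
        uint64_t obsoleted_files =
            stats->getTickerCount(rocksdb::BLOB_DB_GC_NUM_FILES);
        uint64_t new_files =
            stats->getTickerCount(rocksdb::BLOB_DB_GC_NUM_NEW_FILES);
        uint64_t keys_relocated =
            stats->getTickerCount(rocksdb::BLOB_DB_GC_NUM_KEYS_RELOCATED);
        uint64_t bytes_relocated =
            stats->getTickerCount(rocksdb::BLOB_DB_GC_BYTES_RELOCATED);
        // ... log or export the four counters ...
        (void)obsoleted_files; (void)new_files;
        (void)keys_relocated; (void)bytes_relocated;
      }
      ```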
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6296
      
      Test Plan: Extended unit tests and used the BlobDB mode of `db_bench`.
      
      Differential Revision: D19402513
      
      Pulled By: ltamasi
      
      fbshipit-source-id: d53d2bfbf4928a1db1e9346c67ebb9007b8932ec
    • Fix LITE build with DBTest2.AutoPrefixMode1 (#6346) · 71874c5a
      Committed by sdong
      Summary:
      DBTest2.AutoPrefixMode1 doesn't pass because auto prefix mode is not supported there.
      Fix it by disabling the test.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6346
      
      Test Plan: Run DBTest2.AutoPrefixMode1 in lite mode
      
      Differential Revision: D19627486
      
      fbshipit-source-id: fbde75260aeecb7e6fc406e09c19a71a95aa5f08
    • Upload DB dir for all crash tests (#6344) · 23dcf275
      Committed by Peter Dillinger
      Summary:
      Difficult to root cause crash test failures without archiving
      db dir. Now all crash test configurations should save the db dir.
      
      Also exit with error code on bad command.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6344
      
      Test Plan:
      Hmm, how about this:
      
          for TARGET in stress_crash asan_crash ubsan_crash tsan_crash; do EMAIL=email ONCALL=oncall TRIGGER=all SUBSCRIBER=sub build_tools/rocksdb-lego-determinator $TARGET > tmp && node -c tmp && grep -q Upload tmp || echo Bad; done
      
      Differential Revision: D19625605
      
      Pulled By: pdillinger
      
      fbshipit-source-id: cb84aa93ee80b4534f4c61b90f0e0f99a41155d5
    • Fix db_bloom_filter_test clang LITE build (#6340) · 02ac6c9a
      Committed by sdong
      Summary:
      db_bloom_filter_test breaks in the clang LITE build with the following message:
      
      ```
      db/db_bloom_filter_test.cc:23:29: error: unused variable 'kPlainTable' [-Werror,-Wunused-const-variable]
      static constexpr PseudoMode kPlainTable = -1;
                                  ^
      ```
      
      Fix it by moving the declaration out of the LITE build.
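      The shape of the fix, as a sketch (`PseudoMode` stands in for the test's own alias):
      
      ```cpp
      // Sketch: the only use of kPlainTable is in non-LITE code, so the
      // declaration moves under the same guard.
      using PseudoMode = int;  // placeholder for the test's alias
      
      #ifndef ROCKSDB_LITE
      static constexpr PseudoMode kPlainTable = -1;
      #endif  // !ROCKSDB_LITE
      ```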
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6340
      
      Test Plan:
      USE_CLANG=1 LITE=1 make db_bloom_filter_test
      and without LITE=1
      
      Differential Revision: D19609834
      
      fbshipit-source-id: 0e88f5c6759238a94f9880d84c785ac18e7cdd7e
    • Double Crash in kPointInTimeRecovery with TransactionDB (#6313) · 2f973ca9
      Committed by Maysam Yabandeh
      Summary:
      In WritePrepared there could be gaps in sequence numbers. This breaks the trick we use in kPointInTimeRecovery, which assumes that the first seq in the log right after the corrupted log is one larger than the last seq we read from the logs. To let this trick keep working, we add a dummy entry with the expected sequence to the first log right after recovery.
      Also in WriteCommitted, if the log right after the corrupted log is empty, it is treated as unexpected behavior, since it has no sequence number to let the sequential trick work. This is, however, expected to happen if we close the db after recovering from a corruption and before writing anything new to it. To remedy that, we apply the same technique by writing a dummy entry to the log that is created after the corrupted log.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6313
      
      Differential Revision: D19458291
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 09bc49e574690085df45b034ca863ff315937e2d
    • Reduce the need to re-download dependencies (#6318) · a07a9dc9
      Committed by Adam Retter
      Summary:
      Both changes are related to RocksJava:
      
      1. Allow dependencies that are already present on the host system due to Maven to be reused in Docker builds.
      
      2. Extend the `make clean-not-downloaded` target to RocksJava, so that libraries needed as dependencies for the test suite are not deleted and re-downloaded unnecessarily.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6318
      
      Differential Revision: D19608742
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 25e25649e3e3212b537ac4512b40e2e53dc02ae7
  6. 29 Jan 2020 (2 commits)
    • Add ReadOptions.auto_prefix_mode (#6314) · 8f2bee67
      Committed by sdong
      Summary:
      Add a new option, ReadOptions.auto_prefix_mode. When set to true, the iterator should return the same result as total order seek, but may choose to do prefix seek internally, based on the iterator upper bound. Also fix two previous bugs in handling prefix extractor changes: (1) a reverse iterator should not rely on the upper bound to determine the prefix; fix it by skipping the prefix check. (2) the block-based filter case was not handled properly.
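      Usage is a one-line opt-in on ReadOptions; a minimal sketch:
      
      ```cpp
      #include <memory>
      
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"
      #include "rocksdb/slice.h"
      
      // Sketch: with auto_prefix_mode, the iterator must match total order
      // seek results, while RocksDB may internally use prefix seek when the
      // upper bound allows it.
      void ScanRange(rocksdb::DB* db, const rocksdb::Slice& start,
                     rocksdb::Slice upper_bound) {
        rocksdb::ReadOptions ro;
        ro.auto_prefix_mode = true;
        ro.iterate_upper_bound = &upper_bound;  // must outlive the iterator
        std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
        for (it->Seek(start); it->Valid(); it->Next()) {
          // process it->key() / it->value()
        }
      }
      ```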
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6314
      
      Test Plan: (1) add a unit test; (2) add the check to the stress test and run to see whether it can pass at least one run.
      
      Differential Revision: D19458717
      
      fbshipit-source-id: 51c1bcc5cdd826c2469af201979a39600e779bce
    • Add Google Group to Issue Template · 431fb6c0
      Committed by Siying Dong
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6339
      
      Differential Revision: D19608457
      
      fbshipit-source-id: 2adea28b1bd20b85ccafca1aa567030115220ea6
  7. 28 Jan 2020 (5 commits)
    • Use the same oldest ancestor time in table properties and manifest · 4f6c8622
      Committed by Sagar Vemuri
      Summary:
      ./db_compaction_test DBCompactionTest.LevelTtlCascadingCompactions passed 96 / 100 times.
      
      With the fix: all runs (tried 100, 1000, 10000) succeed.
      
      ```
      $ TEST_TMPDIR=/dev/shm ~/gtest-parallel/gtest-parallel ./db_compaction_test --gtest_filter=DBCompactionTest.LevelTtlCascadingCompactions --repeat=1000
      [1000/1000] DBCompactionTest.LevelTtlCascadingCompactions (1895 ms)
      ```
      
      Test Plan:
      Build:
      ```
      COMPILE_WITH_TSAN=1 make db_compaction_test -j100
      ```
      Without the fix: a few runs out of 100 fail:
      ```
      $ TEST_TMPDIR=/dev/shm KEEP_DB=1 ~/gtest-parallel/gtest-parallel ./db_compaction_test --gtest_filter=DBCompactionTest.LevelTtlCascadingCompactions --repeat=100
      ...
      ...
      Note: Google Test filter = DBCompactionTest.LevelTtlCascadingCompactions
      [==========] Running 1 test from 1 test case.
      [----------] Global test environment set-up.
      [----------] 1 test from DBCompactionTest
      [ RUN      ] DBCompactionTest.LevelTtlCascadingCompactions
      db/db_compaction_test.cc:3687: Failure
      Expected equality of these values:
        oldest_time
          Which is: 1580155869
        level_to_files[6][0].oldest_ancester_time
          Which is: 1580155870
      DB is still at /dev/shm//db_compaction_test_6337001442947696266
      [  FAILED  ] DBCompactionTest.LevelTtlCascadingCompactions (1432 ms)
      [----------] 1 test from DBCompactionTest (1432 ms total)
      
      [----------] Global test environment tear-down
      [==========] 1 test from 1 test case ran. (1433 ms total)
      [  PASSED  ] 0 tests.
      [  FAILED  ] 1 test, listed below:
      [  FAILED  ] DBCompactionTest.LevelTtlCascadingCompactions
      
       1 FAILED TEST
      [80/100] DBCompactionTest.LevelTtlCascadingCompactions returned/aborted with exit code 1 (1489 ms)
      [100/100] DBCompactionTest.LevelTtlCascadingCompactions (1522 ms)
      FAILED TESTS (4/100):
          1419 ms: ./db_compaction_test DBCompactionTest.LevelTtlCascadingCompactions (try https://github.com/facebook/rocksdb/issues/90)
          1434 ms: ./db_compaction_test DBCompactionTest.LevelTtlCascadingCompactions (try https://github.com/facebook/rocksdb/issues/84)
          1457 ms: ./db_compaction_test DBCompactionTest.LevelTtlCascadingCompactions (try https://github.com/facebook/rocksdb/issues/82)
          1489 ms: ./db_compaction_test DBCompactionTest.LevelTtlCascadingCompactions (try https://github.com/facebook/rocksdb/issues/74)
      ```
      
      Differential Revision: D19587040
      
      Pulled By: sagar0
      
      fbshipit-source-id: 11191ae9940837643bff47ebe18b299b4be3d950
    • Move HISTORY.md entry of hash index fix from 6.7 to unreleased (#6337) · 7aa66c70
      Committed by sdong
      Summary:
      Commits related to hash index fix have been reverted in 6.7.fb branch. Update HISTORY.md to keep it in sync.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6337
      
      Differential Revision: D19593717
      
      fbshipit-source-id: 466178dc6205c9e41ccced41bf281a0952bdc2ca
    • fix `WriteBufferManager` flush log message (#6335) · 5b33cfa1
      Committed by Andrew Kryczka
      Summary:
      It chooses the oldest memtable, not the largest one. This is an
      important difference for users whose CFs receive non-uniform write
      rates.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6335
      
      Differential Revision: D19588865
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 62ad4325b0182f5f27858584cd73fd5978fb2cec
    • Fix regression bug of hash index with iterator total order seek (#6328) · f10f1359
      Committed by sdong
      Summary:
      https://github.com/facebook/rocksdb/pull/6028 introduces a bug for hash index in SST files. If a table reader is created when total order seek is used, prefix_extractor might be passed into the table reader as null. Later, when prefix seek is used with the same table reader, the hash index is checked but the prefix extractor is null, and the program crashes.
      Fix the issue by fixing http://github.com/facebook/rocksdb/pull/6028 in such a way that prefix_extractor is preserved but ReadOptions.total_order_seek is checked.
      
      Also, a null pointer check is added so that a bug like this won't cause a segfault in the future.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6328
      
      Test Plan: Add a unit test that would fail without the fix. Stress test that reproduces the crash would pass.
      
      Differential Revision: D19586751
      
      fbshipit-source-id: 8de77690167ddf5a77a01e167cf89430b1bfba42
    • Clean up PartitionedFilterBlockBuilder (#6299) · 986df371
      Committed by Peter Dillinger
      Summary:
      Remove the redundant PartitionedFilterBlockBuilder::num_added_ and ::NumAdded since the parent class, FullFilterBlockBuilder, already provides them.
      Also rename filters_in_partition_ and filters_per_partition_ to keys_added_to_partition_ and keys_per_partition_ to improve readability.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6299
      
      Test Plan: make check
      
      Differential Revision: D19413278
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 04926ee7874477d659cb2b6ae03f2d995fb747e5
  8. 25 Jan 2020 (2 commits)
    • Update version for next release, 6.7.0 (#6320) · bd698e4f
      Committed by Fosco Marotto
      Summary:
      Adjusted history for 6.6.1 and 6.6.2, switched master version to 6.7.0.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6320
      
      Differential Revision: D19499272
      
      Pulled By: gfosco
      
      fbshipit-source-id: 2bafb2456951f231e411e9c03aaa4c044f497684
    • Implement PinnableSlice::remove_prefix (#6330) · c4bc30e1
      Committed by Maysam Yabandeh
      Summary:
      The function was left unimplemented. Although we currently don't have a use for it, it was declared with an assert(0) to prevent mistakenly using the remove_prefix of the parent class. A function body containing only assert(0), however, causes issues at some compilers' warning levels. The patch implements the function to avoid the warning.
      It also piggybacks fixes for some minor code warnings about unnecessary semicolons after function definitions.
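      A hedged sketch of the new definition, based on the summary above; `pinned_`, `buf_`, and `PinSelf()` are existing PinnableSlice internals, though the verbatim patch may differ:
      
      ```cpp
      #include <cassert>
      
      #include "rocksdb/slice.h"
      
      // Sketch of the member definition: when the data is pinned externally,
      // the view simply advances; when it lives in the self-owned buffer, the
      // first n bytes are erased and the slice re-pointed at the buffer.
      void rocksdb::PinnableSlice::remove_prefix(size_t n) {
        assert(n <= size());
        if (pinned_) {
          data_ += n;
          size_ -= n;
        } else {
          buf_->erase(0, n);
          PinSelf();
        }
      }
      ```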
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6330
      
      Differential Revision: D19559062
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 3a022484f688c9abd4556e5412bcc2628ab96a00
  9. 24 Jan 2020 (3 commits)
    • Fix the "records dropped" statistics (#6325) · f34782a6
      Committed by Levi Tamasi
      Summary:
      The earlier code used two conflicting definitions for the number of
      input records going into a compaction, one based on the
      `rocksdb.num.entries` table property and one based on
      `CompactionIterationStats`. The first one is correct and in line
      with how output records are counted, while the second one incorrectly
      ignores input records in various cases when the `CompactionIterator`
      advances or reseeks the input iterator (this can happen, amongst other
      cases, when dealing with `SingleDelete`s, regular `Delete`s, `Merge`s,
      and compaction filters). This can result in the code undercounting the
      input records and computing an incorrect value for "records dropped"
      during the compaction. The patch fixes this by switching over to the
      correct (table property based) input record count for "records dropped".
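      Schematically, the corrected accounting looks like this (names illustrative):
      
      ```cpp
      #include <cstdint>
      #include <vector>
      
      #include "rocksdb/table_properties.h"
      
      // Sketch: count compaction inputs from each input file's table
      // properties (rocksdb.num.entries), the same basis used for counting
      // outputs, and derive "records dropped" from the difference.
      uint64_t RecordsDropped(
          const std::vector<rocksdb::TableProperties>& input_file_props,
          uint64_t output_records) {
        uint64_t input_records = 0;
        for (const auto& props : input_file_props) {
          input_records += props.num_entries;
        }
        return input_records - output_records;
      }
      ```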
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6325
      
      Test Plan: Tested using `make check` and `db_bench`.
      
      Differential Revision: D19525491
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 4340b0b2f41546db8e356db70ca02199e48fa636
    • Fix queue manipulation in WriteThread::BeginWriteStall() (#6322) · 0672a6db
      Committed by anand76
      Summary:
      When there is a write stall, the active write group leader calls `BeginWriteStall()` to walk the queue of writers and remove any with the `no_slowdown` option set. There was a bug in the code which updated the back pointer but not the forward pointer (`link_newer`), corrupting the list and causing some threads to wait forever. This PR fixes it.
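      The underlying invariant is the usual one for doubly-linked lists: unlinking a node must patch both neighbors. A generic sketch (field names follow the summary; types simplified):
      
      ```cpp
      // Toy writer node with the two links named as in WriteThread.
      struct WriterNode {
        WriterNode* link_older = nullptr;  // toward the queue's tail
        WriterNode* link_newer = nullptr;  // toward the queue's head
      };
      
      // Sketch: unlinking w requires updating the back pointer of the newer
      // neighbor AND the forward pointer of the older neighbor; the bug
      // updated only one of the two, corrupting the list.
      void Unlink(WriterNode* w) {
        if (w->link_newer != nullptr) {
          w->link_newer->link_older = w->link_older;
        }
        if (w->link_older != nullptr) {
          w->link_older->link_newer = w->link_newer;  // the missed update
        }
        w->link_older = w->link_newer = nullptr;
      }
      ```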
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6322
      
      Test Plan: Add a unit test in db_write_test
      
      Differential Revision: D19538313
      
      Pulled By: anand1976
      
      fbshipit-source-id: 6fbed819e594913f435886606f5d36f74f235c3a
    • Revert "crash_test to enable block-based table hash index (#6310)" (#6327) · 967a2d95
      Committed by Maysam Yabandeh
      Summary:
      This reverts commit 8e309b35.
      The stress tests are failing. Revert it until we figure out the root cause.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6327
      
      Differential Revision: D19537657
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bf34a5dd720825957729e136e9a5a729a240e61a
  10. 23 Jan 2020 (1 commit)
  11. 22 Jan 2020 (3 commits)
    • Correct pragma once problem with Bazel on Windows (#6321) · e6e8b9e8
      Committed by matthewvon
      Summary:
      This is a simple edit to have two #include file paths be consistent within range_del_aggregator.{h,cc} with everywhere else.
      
      The impact of this inconsistency is that it actually breaks a Bazel-based build on the Windows platform. The same pragma once failure occurs with both Windows Visual C++ 2019 and clang for Windows 9.0. Bazel's "sandboxing" of the builds causes both compilers to not recognize "rocksdb/types.h" and "include/rocksdb/types.h" as the same file (also comparator.h). My guess is that the mixing of backslashes and forward slashes within path names is the underlying issue.
      
      But, everything builds fine once the include paths in these two source files are consistent with the rest of the repository.
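      The change itself is just normalizing the include form, e.g.:
      
      ```cpp
      // Before (breaks under Bazel's sandboxing on Windows):
      #include "include/rocksdb/types.h"
      #include "include/rocksdb/comparator.h"
      
      // After (consistent with the rest of the repository):
      #include "rocksdb/types.h"
      #include "rocksdb/comparator.h"
      ```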
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6321
      
      Differential Revision: D19506585
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 294c346607edc433ab99eaabc9c880ee7426817a
    • Make DBCompactionTest.SkipStatsUpdateTest more robust (#6306) · d305f13e
      Committed by Levi Tamasi
      Summary:
      Currently, this test case tries to infer whether
      `VersionStorageInfo::UpdateAccumulatedStats` was called during open by
      checking the number of files opened against an arbitrary threshold (10).
      This makes the test brittle and results in sporadic failures. The patch
      changes the test case to use sync points to directly test whether
      `UpdateAccumulatedStats` was called.
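      A sketch of the sync-point pattern (the sync point name and helper shape are illustrative):
      
      ```cpp
      #include <functional>
      
      #include "test_util/sync_point.h"
      
      // Sketch: set a callback on a named sync point to observe directly
      // whether UpdateAccumulatedStats runs during open, instead of inferring
      // it from the number of files opened.
      bool WasUpdateAccumulatedStatsCalled(
          const std::function<void()>& reopen_db) {
        bool called = false;
        rocksdb::SyncPoint::GetInstance()->SetCallBack(
            "VersionStorageInfo::UpdateAccumulatedStats",  // name illustrative
            [&called](void* /*arg*/) { called = true; });
        rocksdb::SyncPoint::GetInstance()->EnableProcessing();
        reopen_db();  // e.g. Reopen(options) inside the test
        rocksdb::SyncPoint::GetInstance()->DisableProcessing();
        rocksdb::SyncPoint::GetInstance()->ClearAllCallBacks();
        return called;
      }
      ```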
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6306
      
      Test Plan: `make check`
      
      Differential Revision: D19439544
      
      Pulled By: ltamasi
      
      fbshipit-source-id: ceb7adf578222636a0f51740872d0278cd1a914f
    • crash_test to enable block-based table hash index (#6310) · 8e309b35
      Committed by sdong
      Summary:
      Block-based table hash index has been disabled in the crash test due to bugs. We fixed a bug and re-enable it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6310
      
      Test Plan: Finish one round of "crash_test_with_atomic_flush" test successfully while exclusively running hash index. Another run also ran for several hours without failure.
      
      Differential Revision: D19455856
      
      fbshipit-source-id: 1192752d2c1e81ed7e5c5c7a9481c841582d5274
  12. 21 Jan 2020 (1 commit)
    • Warn on excessive keys for legacy Bloom filter with 32-bit hash (#6317) · 8aa99fc7
      Committed by Peter Dillinger
      Summary:
      With many millions of keys, the old Bloom filter implementation
      for the block-based table (format_version <= 4) would have excessive FP
      rate due to the limitations of feeding the Bloom filter with a 32-bit hash.
      This change computes an estimated inflated FP rate due to this effect
      and warns in the log whenever an SST filter is constructed (almost
      certainly a "full" not "partitioned" filter) that exceeds 1.5x FP rate
      due to this effect. The detailed condition is only checked if 3 million
      keys or more have been added to a filter, as this should be a lower
      bound for common bits/key settings (< 20).
      
      Recommended remedies include smaller SST file size, using
      format_version >= 5 (for new Bloom filter), or using partitioned
      filters.
      
      This does not change behavior other than generating warnings for some
      constructed filters using the old implementation.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6317
      
      Test Plan:
      Example with warning, 15M keys @ 15 bits / key: (working_mem_size_mb is just to stop after building one filter if it's large)
      
          $ ./filter_bench -quick -impl=0 -working_mem_size_mb=1 -bits_per_key=15 -average_keys_per_filter=15000000 2>&1 | grep 'FP rate'
          [WARN] [/block_based/filter_policy.cc:292] Using legacy SST/BBT Bloom filter with excessive key count (15.0M @ 15bpk), causing estimated 1.8x higher filter FP rate. Consider using new Bloom with format_version>=5, smaller SST file size, or partitioned filters.
          Predicted FP rate %: 0.766702
          Average FP rate %: 0.66846
      
      Example without warning (150K keys):
      
          $ ./filter_bench -quick -impl=0 -working_mem_size_mb=1 -bits_per_key=15 -average_keys_per_filter=150000 2>&1 | grep 'FP rate'
          Predicted FP rate %: 0.422857
          Average FP rate %: 0.379301
          $
      
      With more samples at 15 bits/key:
        150K keys -> no warning; actual: 0.379% FP rate (baseline)
        1M keys -> no warning; actual: 0.396% FP rate, 1.045x
        9M keys -> no warning; actual: 0.563% FP rate, 1.485x
        10M keys -> warning (1.5x); actual: 0.564% FP rate, 1.488x
        15M keys -> warning (1.8x); actual: 0.668% FP rate, 1.76x
        25M keys -> warning (2.4x); actual: 0.880% FP rate, 2.32x
      
      At 10 bits/key:
        150K keys -> no warning; actual: 1.17% FP rate (baseline)
        1M keys -> no warning; actual: 1.16% FP rate
        10M keys -> no warning; actual: 1.32% FP rate, 1.13x
        25M keys -> no warning; actual: 1.63% FP rate, 1.39x
        35M keys -> warning (1.6x); actual: 1.81% FP rate, 1.55x
      
      At 5 bits/key:
        150K keys -> no warning; actual: 9.32% FP rate (baseline)
        25M keys -> no warning; actual: 9.62% FP rate, 1.03x
        200M keys -> no warning; actual: 12.2% FP rate, 1.31x
        250M keys -> warning (1.5x); actual: 12.8% FP rate, 1.37x
        300M keys -> warning (1.6x); actual: 13.4% FP rate, 1.43x
      
      The reason for the modest inaccuracy at low bits/key is that the assumption of independence between a collision between 32-hash values feeding the filter and an FP in the filter is not quite true for implementations using "simple" logic to compute indices from the stock hash result. There's math on this in my dissertation, but I don't think it's worth the effort just for these extreme cases (> 100 million keys and low-ish bits/key).
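      As a back-of-the-envelope check (an illustration, not part of the PR): treating a 32-bit hash collision and a base filter FP as independent, the combined rate is roughly base + n/2^32. For the 15M-key example:
      
      ```cpp
      #include <cstdio>
      
      // Back-of-the-envelope check of the 15M-key example above.
      int main() {
        double base_fp = 0.00422857;        // 150K-key baseline at 15 bits/key
        double n = 15e6;                    // keys in one filter
        double collide = n / 4294967296.0;  // chance a query key matches one of
                                            // the n stored 32-bit hashes, ~n/2^32
        double est = base_fp + collide * (1 - base_fp);
        std::printf("estimated FP: %.4f%%\n", est * 100);  // ~0.77%, close to
                                                           // the 0.7667% above
        return 0;
      }
      ```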
      
      Differential Revision: D19471715
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f80c96893a09bf1152630ff0b964e5cdd7e35c68
  13. 18 Jan 2020 (1 commit)
    • Log warning for high bits/key in legacy Bloom filter (#6312) · 4b86fe11
      Committed by Peter Dillinger
      Summary:
      Help users that would benefit most from the new Bloom filter
      implementation by logging a warning that recommends using
      format_version >= 5.
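      For users acting on the warning, opting in is a table-options change; a minimal sketch:
      
      ```cpp
      #include "rocksdb/filter_policy.h"
      #include "rocksdb/options.h"
      #include "rocksdb/table.h"
      
      // Sketch: select the new (format_version >= 5) Bloom filter
      // implementation via BlockBasedTableOptions.
      rocksdb::Options MakeOptionsWithNewBloom(double bits_per_key) {
        rocksdb::Options options;
        rocksdb::BlockBasedTableOptions table_options;
        table_options.format_version = 5;  // enables the new filter
        table_options.filter_policy.reset(rocksdb::NewBloomFilterPolicy(
            bits_per_key, /*use_block_based_builder=*/false));
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_options));
        return options;
      }
      ```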
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6312
      
      Test Plan:
      $ (for BPK in 10 13 14 19 20 50; do ./filter_bench -quick -impl=0 -bits_per_key=$BPK -m_queries=1 2>&1; done) | grep 'its/key'
          Bits/key actual: 10.0647
          Bits/key actual: 13.0593
          [WARN] [/block_based/filter_policy.cc:546] Using legacy Bloom filter with high (14) bits/key. Significant filter space and/or accuracy improvement is available with format_verion>=5.
          Bits/key actual: 14.0581
          [WARN] [/block_based/filter_policy.cc:546] Using legacy Bloom filter with high (19) bits/key. Significant filter space and/or accuracy improvement is available with format_verion>=5.
          Bits/key actual: 19.0542
          [WARN] [/block_based/filter_policy.cc:546] Using legacy Bloom filter with high (20) bits/key. Dramatic filter space and/or accuracy improvement is available with format_verion>=5.
          Bits/key actual: 20.0584
          [WARN] [/block_based/filter_policy.cc:546] Using legacy Bloom filter with high (50) bits/key. Dramatic filter space and/or accuracy improvement is available with format_verion>=5.
          Bits/key actual: 50.0577
      
      Differential Revision: D19457191
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 073d94cde5c70e03a160f953e1100c15ea83eda4