1. 09 May 2020 (2 commits)
    • Improve ldb consistency checks (#6802) · a50ea71c
      Committed by sdong
      Summary:
      When using ldb, users cannot turn on force consistency checks in most commands, and they cannot use checkconsistency together with --try_load_options. This change fixes both:
      1. checkconsistency now calls OpenDB() so that it goes through the full options loading and sanitization logic.
      2. Use options.force_consistency_checks = true by default, and add a --disable_consistency_checks flag to turn it off.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6802
      
      Test Plan: Add a new unit test. Some manual tests with corrupted DBs.
      
      Reviewed By: pdillinger
      
      Differential Revision: D21388051
      
      fbshipit-source-id: 8d122732d391b426e3982a1c3232a8e3763ffad0
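
      For context, force_consistency_checks can also be enabled directly when opening a DB from C++; a minimal sketch (the path is a placeholder, and a database is assumed to already exist there):

          #include <cassert>
          #include "rocksdb/db.h"
          #include "rocksdb/options.h"

          int main() {
            rocksdb::Options options;
            // The same check that checkconsistency now enables by default.
            options.force_consistency_checks = true;
            rocksdb::DB* db = nullptr;
            rocksdb::Status s = rocksdb::DB::Open(options, "/path/to/db", &db);
            assert(s.ok());  // fails if the LSM file metadata is inconsistent
            delete db;
            return 0;
          }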
    • Fix a few bugs in best-efforts recovery (#6824) · e72e2167
      Committed by Yanqin Jin
      Summary:
      1. Update column_family_memtables_ to point to latest column_family_set in
         version_set after recovery.
      2. Normalize file paths passed by application so that directories end with '/'
         or '\\'.
      3. In addition to missing files, corrupted files are also ignored in
         best-efforts recovery.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6824
      
      Test Plan: COMPILE_WITH_ASAN=1 make check
      
      Reviewed By: anand1976
      
      Differential Revision: D21463905
      
      Pulled By: riversand963
      
      fbshipit-source-id: c48db8843cc93c8c1c7139c474b64e6f775307d2
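
      Best-efforts recovery itself is opted into through DBOptions; a minimal sketch (the path is a placeholder):

          #include "rocksdb/db.h"

          int main() {
            rocksdb::Options options;
            // Recover to the most recent consistent point even if some
            // newer SST files are missing or corrupted.
            options.best_efforts_recovery = true;
            rocksdb::DB* db = nullptr;
            rocksdb::Status s = rocksdb::DB::Open(options, "/path/to/db", &db);
            if (s.ok()) {
              delete db;
            }
            return 0;
          }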
  2. 08 May 2020 (4 commits)
    • Fix race due to delete triggered compaction in Universal compaction mode (#6799) · 94265234
      Committed by anand76
      Summary:
      Delete triggered compaction in universal compaction mode was causing a corruption when scheduled in parallel with other compactions.
      1. When num_levels = 1, a file marked for compaction may be picked along with all older files in L0, without checking whether any of them are already being compacted. This can cause unpredictable results, like resurrection of older versions of keys or of deleted keys.
      2. When num_levels > 1, a delete triggered compaction would not get scheduled if it overlaps with a running regular compaction. However, the reverse is not true. This is because ```UniversalCompactionBuilder::CalculateSortedRuns``` assumes that entire sorted runs are picked for compaction and only checks the first file in a sorted run to determine conflicts. A delete triggered compaction violates this assumption, as it works on a subset of a sorted run.
      
      Fix the bug for num_levels > 1, and disable the feature for now when num_levels = 1. After disabling this feature, files would still get marked for compaction, but no compaction would get scheduled.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6799
      
      Reviewed By: pdillinger
      
      Differential Revision: D21431286
      
      Pulled By: anand1976
      
      fbshipit-source-id: ae9f0bdb1d6ae2f10284847db731c23f43af164a
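
      Delete triggered compaction is the mechanism that marks files for compaction when they accumulate many tombstones; a minimal sketch of how it is typically enabled (the path and the window/trigger values are arbitrary placeholders):

          #include <cassert>
          #include "rocksdb/db.h"
          #include "rocksdb/utilities/table_properties_collectors.h"

          int main() {
            rocksdb::Options options;
            options.create_if_missing = true;
            options.compaction_style = rocksdb::kCompactionStyleUniversal;
            // Mark a file for compaction once 64 deletes are observed
            // within any sliding window of 128 entries.
            options.table_properties_collector_factories.push_back(
                rocksdb::NewCompactOnDeletionCollectorFactory(
                    /*sliding_window_size=*/128, /*deletion_trigger=*/64));
            rocksdb::DB* db = nullptr;
            rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/udb", &db);
            assert(s.ok());
            delete db;
            return 0;
          }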
    • Fixup HISTORY.md for e9ba4ba3 "validate range tombstone covers positive range" (#6825) · 3730b05d
      Committed by Andrew Kryczka
      Summary:
      Moved it from the wrong section (6.10) to the right section (Unreleased).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6825
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D21464577
      
      Pulled By: ajkr
      
      fbshipit-source-id: a836b4ab10be2464182826f9411c9c424c933b70
    • Fix false NotFound from batched MultiGet with kHashSearch (#6821) · b27a1448
      Committed by Peter Dillinger
      Summary:
      The bug was assigning NotFound status to KeyContext::s in a
      table reader for a "not found in this table" case, which skips searching
      in later tables, as only a delete should. (The hash search index iterator
      is the only one that can return status NotFound even if Valid() == false.)
      
      This was detected by intermittent failure in
      MultiThreadedDBTest.MultiThreaded/5, a kHashSearch configuration.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6821
      
      Test Plan: modified existing unit test to reproduce problem
      
      Reviewed By: anand1976
      
      Differential Revision: D21450469
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 7478003684d637dbd491cdac81468041a791be2c
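
      The affected configuration combines the hash search index with batched MultiGet; a minimal sketch (the prefix length, path, and keys are placeholders):

          #include <cassert>
          #include <string>
          #include <vector>
          #include "rocksdb/db.h"
          #include "rocksdb/slice_transform.h"
          #include "rocksdb/table.h"

          int main() {
            rocksdb::Options options;
            options.create_if_missing = true;
            // kHashSearch requires a prefix extractor.
            options.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(3));
            rocksdb::BlockBasedTableOptions table_opts;
            table_opts.index_type = rocksdb::BlockBasedTableOptions::kHashSearch;
            options.table_factory.reset(
                rocksdb::NewBlockBasedTableFactory(table_opts));
            rocksdb::DB* db = nullptr;
            rocksdb::Status open_s = rocksdb::DB::Open(options, "/tmp/hashdb", &db);
            assert(open_s.ok());
            std::vector<rocksdb::Slice> keys = {"foo1", "foo2"};
            std::vector<std::string> values;
            // A key absent from one SST file must keep being looked up in
            // older files rather than short-circuiting to NotFound.
            std::vector<rocksdb::Status> statuses =
                db->MultiGet(rocksdb::ReadOptions(), keys, &values);
            assert(statuses.size() == keys.size());
            delete db;
            return 0;
          }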
    • validate range tombstone covers positive range (#6788) · e9ba4ba3
      Committed by Andrew Kryczka
      Summary:
      We found some files containing nothing but negative range tombstones,
      and unsurprisingly their metadata specified a negative range, which made
      things crash. Time to add a bit of user input validation.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6788
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D21343719
      
      Pulled By: ajkr
      
      fbshipit-source-id: f1c16e4c3e9fa150958c8c866176632a3206fb74
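
      Range tombstones come from DeleteRange, whose begin key must sort before its end key; a minimal usage sketch (path and keys are placeholders):

          #include <cassert>
          #include "rocksdb/db.h"

          int main() {
            rocksdb::Options options;
            options.create_if_missing = true;
            rocksdb::DB* db = nullptr;
            rocksdb::Status open_s = rocksdb::DB::Open(options, "/tmp/rangedb", &db);
            assert(open_s.ok());
            // Valid: "a" < "m", so the tombstone covers a positive range.
            rocksdb::Status s = db->DeleteRange(
                rocksdb::WriteOptions(), db->DefaultColumnFamily(), "a", "m");
            assert(s.ok());
            // A negative range such as ("m", "a") is the kind of input the
            // new validation guards against instead of crashing later.
            delete db;
            return 0;
          }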
  3. 07 May 2020 (2 commits)
  4. 05 May 2020 (2 commits)
    • Fix db_stress when GetLiveFiles() flushes dropped CF (#6805) · 5a61e786
      Committed by Yanqin Jin
      Summary:
      The current implementation of db_stress will abort verification and report
      failure if GetLiveFiles() causes a dropped column family to be flushed. This
      is not desired.
      To fix, this PR makes the following change:
      In GetLiveFiles, if the triggered flush returns a status for which
      IsColumnFamilyDropped() is true, reset the status to Status::OK().
      This is OK because dropped column families will be skipped during the rest of
      the function, and valid column families will have their live files returned to
      the caller.
      
      Test plan (dev server):
      make check
      ./db_stress -ops_per_thread=1000 -get_live_files_one_in=100 -clear_column_family_one_in=100
      ./db_stress -disable_wal=1 -reopen=0 -ops_per_thread=1000 -get_live_files_one_in=100 -clear_column_family_one_in=100
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6805
      
      Reviewed By: ltamasi
      
      Differential Revision: D21390044
      
      Pulled By: riversand963
      
      fbshipit-source-id: de67846b95a4f1b88aa0a30c3d70c43cc68625b9
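
      GetLiveFiles is the checkpoint/backup helper being exercised here; a minimal usage sketch (the path is a placeholder):

          #include <cassert>
          #include <cstdint>
          #include <string>
          #include <vector>
          #include "rocksdb/db.h"

          int main() {
            rocksdb::Options options;
            options.create_if_missing = true;
            rocksdb::DB* db = nullptr;
            rocksdb::Status open_s = rocksdb::DB::Open(options, "/tmp/livedb", &db);
            assert(open_s.ok());
            std::vector<std::string> files;
            uint64_t manifest_size = 0;
            // flush_memtable=true triggers the flush path this PR fixes.
            rocksdb::Status s =
                db->GetLiveFiles(files, &manifest_size, /*flush_memtable=*/true);
            assert(s.ok());  // no longer fails just because a dropped CF flushed
            delete db;
            return 0;
          }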
    • Avoid Swallowing Some File Consistency Checking Bugs (#6793) · 680c4163
      Committed by sdong
      Summary:
      We are swallowing some file consistency checking failures. This is not expected. This PR fixes two cases: DB reopen and manifest dump.
      More places are not fixed and need follow-up.
      
      The error from CheckConsistencyForDeletes() is also swallowed; that is not fixed in this PR.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6793
      
      Test Plan: Add a unit test to cover the reopen case.
      
      Reviewed By: riversand963
      
      Differential Revision: D21366525
      
      fbshipit-source-id: eb438a322237814e8d5125f916a3c6de97f39ded
  5. 01 May 2020 (1 commit)
  6. 29 Apr 2020 (2 commits)
    • Basic MultiGet support for partitioned filters (#6757) · bae6f586
      Committed by Peter Dillinger
      Summary:
      In MultiGet, access each applicable filter partition only once
      per batch, rather than for each applicable key. Also,
      
      * Fix Bloom stats for MultiGet
      * Fix/refactor MultiGetContext::Range::KeysLeft, including
      * Add efficient BitsSetToOne implementation
      * Assert that MultiGetContext::Range does not go beyond shift range
      
      Performance test: Generate db:
      
          $ ./db_bench --benchmarks=fillrandom --num=15000000 --cache_index_and_filter_blocks -bloom_bits=10 -partition_index_and_filters=true
          ...
      
      Before (middle performing run of three; note some missing Bloom stats):
      
          $ ./db_bench --use-existing-db --benchmarks=multireadrandom --num=15000000 --cache_index_and_filter_blocks --bloom_bits=10 --threads=16 --cache_size=20000000 -partition_index_and_filters -batch_size=32 -multiread_batched -statistics --duration=20 2>&1 | egrep 'micros/op|block.cache.filter.hit|bloom.filter.(full|use)|number.multiget'
          multireadrandom :      26.403 micros/op 597517 ops/sec; (548427 of 671968 found)
          rocksdb.block.cache.filter.hit COUNT : 83443275
          rocksdb.bloom.filter.useful COUNT : 0
          rocksdb.bloom.filter.full.positive COUNT : 0
          rocksdb.bloom.filter.full.true.positive COUNT : 7931450
          rocksdb.number.multiget.get COUNT : 385984
          rocksdb.number.multiget.keys.read COUNT : 12351488
          rocksdb.number.multiget.bytes.read COUNT : 793145000
          rocksdb.number.multiget.keys.found COUNT : 7931450
      
      After (middle performing run of three):
      
          $ ./db_bench_new --use-existing-db --benchmarks=multireadrandom --num=15000000 --cache_index_and_filter_blocks --bloom_bits=10 --threads=16 --cache_size=20000000 -partition_index_and_filters -batch_size=32 -multiread_batched -statistics --duration=20 2>&1 | egrep 'micros/op|block.cache.filter.hit|bloom.filter.(full|use)|number.multiget'
          multireadrandom :      21.024 micros/op 752963 ops/sec; (705188 of 863968 found)
          rocksdb.block.cache.filter.hit COUNT : 49856682
          rocksdb.bloom.filter.useful COUNT : 45684579
          rocksdb.bloom.filter.full.positive COUNT : 10395458
          rocksdb.bloom.filter.full.true.positive COUNT : 9908456
          rocksdb.number.multiget.get COUNT : 481984
          rocksdb.number.multiget.keys.read COUNT : 15423488
          rocksdb.number.multiget.bytes.read COUNT : 990845600
          rocksdb.number.multiget.keys.found COUNT : 9908456
      
      So that's about 25% higher throughput, even for random keys.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6757
      
      Test Plan: unit test included
      
      Reviewed By: anand1976
      
      Differential Revision: D21243256
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 5644a1468d9e8c8575be02f4e04bc5d62dbbb57f
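
      Partitioned filters, the configuration this PR optimizes, are enabled through BlockBasedTableOptions; a minimal sketch mirroring the db_bench flags above (the path is a placeholder):

          #include <cassert>
          #include "rocksdb/db.h"
          #include "rocksdb/filter_policy.h"
          #include "rocksdb/table.h"

          int main() {
            rocksdb::BlockBasedTableOptions table_opts;
            table_opts.filter_policy.reset(
                rocksdb::NewBloomFilterPolicy(/*bits_per_key=*/10));
            // Partition both the index and the filter, as
            // -partition_index_and_filters=true does in db_bench.
            table_opts.index_type =
                rocksdb::BlockBasedTableOptions::kTwoLevelIndexSearch;
            table_opts.partition_filters = true;
            table_opts.cache_index_and_filter_blocks = true;
            rocksdb::Options options;
            options.create_if_missing = true;
            options.table_factory.reset(
                rocksdb::NewBlockBasedTableFactory(table_opts));
            rocksdb::DB* db = nullptr;
            rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/pfdb", &db);
            assert(s.ok());
            delete db;
            return 0;
          }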
    • HISTORY.md update for bzip upgrade (#6767) · a7f0b27b
      Committed by Peter Dillinger
      Summary:
      See https://github.com/facebook/rocksdb/issues/6714 and https://github.com/facebook/rocksdb/issues/6703
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6767
      
      Reviewed By: riversand963
      
      Differential Revision: D21283307
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 8463bec725669d13846c728ad4b5bde43f9a84f8
  7. 28 Apr 2020 (4 commits)
  8. 25 Apr 2020 (1 commit)
    • Reduce memory copies when fetching and uncompressing blocks from SST files (#6689) · 40497a87
      Committed by Cheng Chang
      Summary:
      In https://github.com/facebook/rocksdb/pull/6455, we modified the interface of `RandomAccessFileReader::Read` to be able to get rid of memcpy in direct IO mode.
      This PR applies the new interface to `BlockFetcher` when reading blocks from SST files in direct IO mode.
      
      Without this PR, in direct IO mode, when fetching and uncompressing compressed blocks, `BlockFetcher` will first copy the raw compressed block into `BlockFetcher::compressed_buf_` or `BlockFetcher::stack_buf_` inside `RandomAccessFileReader::Read`, depending on the block size. Then, during uncompression, it will copy the uncompressed block into `BlockFetcher::heap_buf_`.
      
      In this PR, we get rid of the first memcpy and directly uncompress the block from `direct_io_buf_` to `heap_buf_`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6689
      
      Test Plan: A new unit test `block_fetcher_test` is added.
      
      Reviewed By: anand1976
      
      Differential Revision: D21006729
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 2370b92c24075692423b81277415feb2aed5d980
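
      Direct IO reads, the mode this PR optimizes, are enabled through a single DB option; a minimal sketch (the path is a placeholder):

          #include <cassert>
          #include "rocksdb/db.h"

          int main() {
            rocksdb::Options options;
            options.create_if_missing = true;
            // Bypass the OS page cache on reads, exercising the
            // BlockFetcher direct IO path this PR streamlines.
            options.use_direct_reads = true;
            rocksdb::DB* db = nullptr;
            rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/diodb", &db);
            assert(s.ok() || s.IsNotSupported());  // some filesystems lack O_DIRECT
            delete db;
            return 0;
          }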
  9. 24 Apr 2020 (1 commit)
  10. 22 Apr 2020 (1 commit)
  11. 21 Apr 2020 (1 commit)
    • Set max_background_flushes dynamically (#6701) · 03a1d95d
      Committed by Akanksha Mahajan
      Summary:
      1. Add changes so that max_background_flushes can be set dynamically.
      2. Add a testcase DBOptionsTest.SetBackgroundFlushThreads which sets
         max_background_flushes dynamically using SetDBOptions.
      
      TestPlan:
      1. make -j64 check
      2. Run the new testcase DBOptionsTest.SetBackgroundFlushThreads
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6701
      
      Reviewed By: ajkr
      
      Differential Revision: D21028010
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 5f949e4a8fd3c32537b637947b7ee09a69cfc7c1
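
      Dynamic DB option changes go through DB::SetDBOptions with string key/value pairs; a minimal sketch (path and value are placeholders):

          #include <cassert>
          #include <string>
          #include <unordered_map>
          #include "rocksdb/db.h"

          int main() {
            rocksdb::Options options;
            options.create_if_missing = true;
            rocksdb::DB* db = nullptr;
            rocksdb::Status open_s = rocksdb::DB::Open(options, "/tmp/optdb", &db);
            assert(open_s.ok());
            // After this PR, the flush thread pool size is mutable at runtime.
            rocksdb::Status s = db->SetDBOptions({{"max_background_flushes", "4"}});
            assert(s.ok());
            delete db;
            return 0;
          }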
  12. 18 4月, 2020 1 次提交
    • Add IsDirectory() to Env and FS (#6711) · 243852ec
      Committed by Yanqin Jin
      Summary:
      IsDirectory() is a common API to check whether a path is a regular file or
      directory.
      POSIX: call stat() and use S_ISDIR(st_mode)
      Windows: PathIsDirectoryA() and PathIsDirectoryW()
      HDFS: FileSystem.IsDirectory()
      Java: File.IsDirectory()
      ...
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6711
      
      Test Plan: make check
      
      Reviewed By: anand1976
      
      Differential Revision: D21053520
      
      Pulled By: riversand963
      
      fbshipit-source-id: 680aadfd8ce982b63689190cf31b3145d5a89e27
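
      A minimal usage sketch, assuming the Env-level signature Status IsDirectory(const std::string& path, bool* is_dir):

          #include <cassert>
          #include "rocksdb/env.h"

          int main() {
            rocksdb::Env* env = rocksdb::Env::Default();
            bool is_dir = false;
            rocksdb::Status s = env->IsDirectory("/tmp", &is_dir);
            assert(s.ok() && is_dir);  // "/tmp" is a directory on POSIX
            return 0;
          }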
  13. 16 Apr 2020 (1 commit)
    • Properly report IO errors when IndexType::kBinarySearchWithFirstKey is used (#6621) · e45673de
      Committed by Mike Kolupaev
      Summary:
      Context: Index type `kBinarySearchWithFirstKey` added the ability for the sst file iterator to sometimes report a key from the index without reading the corresponding data block. This is useful when sst blocks are cut at meaningful boundaries (e.g. one block per key prefix), and many seeks land between blocks (e.g. for each prefix, the ranges of keys in different sst files are nearly disjoint, so a typical seek needs to read a data block from only one file even if all files have the prefix). But this added a new error condition, which rocksdb code was really not equipped to deal with: `InternalIterator::value()` may fail with an IO error or Status::Incomplete, but it's just a method returning a Slice, with no way to report an error instead. Before this PR, this type of error wasn't handled at all (an empty slice was returned), and the kBinarySearchWithFirstKey implementation was considered a prototype.
      
      Now that we (LogDevice) have experimented with kBinarySearchWithFirstKey for a while and confirmed that it's really useful, this PR is adding the missing error handling.
      
      It's a pretty inconvenient situation implementation-wise. The error needs to be reported from InternalIterator when trying to access value. But there are ~700 call sites of `InternalIterator::value()`, most of which either can't hit the error condition (because the iterator is reading from memtable or from index or something) or wouldn't benefit from the deferred loading of the value (e.g. compaction iterator that reads all values anyway). Adding error handling to all these call sites would needlessly bloat the code. So instead I made the deferred value loading optional: only the call sites that may use deferred loading have to call the new method `PrepareValue()` before calling `value()`. The feature is enabled with a new bool argument `allow_unprepared_value` to a bunch of methods that create iterators (it wouldn't make sense to put it in ReadOptions because it's completely internal to iterators, with virtually no user-visible effect). Lmk if you have better ideas.
      
      Note that the deferred value loading only happens for *internal* iterators. The user-visible iterator (DBIter) always prepares the value before returning from Seek/Next/etc. We could go further and add an API to defer that value loading too, but that's most likely not useful for LogDevice, so it doesn't seem worth the complexity for now.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6621
      
      Test Plan: make -j5 check. Will also deploy to some logdevice test clusters and look at stats.
      
      Reviewed By: siying
      
      Differential Revision: D20786930
      
      Pulled By: al13n321
      
      fbshipit-source-id: 6da77d918bad3780522e918f17f4d5513d3e99ee
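
      The index type in question is selected through BlockBasedTableOptions; a minimal configuration sketch (the path is a placeholder):

          #include <cassert>
          #include "rocksdb/db.h"
          #include "rocksdb/table.h"

          int main() {
            rocksdb::BlockBasedTableOptions table_opts;
            // Seeks can often be answered from the index alone; the value
            // is then loaded lazily, which is what PrepareValue() supports.
            table_opts.index_type =
                rocksdb::BlockBasedTableOptions::kBinarySearchWithFirstKey;
            rocksdb::Options options;
            options.create_if_missing = true;
            options.table_factory.reset(
                rocksdb::NewBlockBasedTableFactory(table_opts));
            rocksdb::DB* db = nullptr;
            rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/fkdb", &db);
            assert(s.ok());
            delete db;
            return 0;
          }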
  14. 14 Apr 2020 (2 commits)
    • Add NewFileChecksumGenCrc32cFactory to file checksum (#6688) · 38dfa406
      Committed by Zhichao Cao
      Summary:
      Add NewFileChecksumGenCrc32cFactory to the file checksum public interface so that applications can use the built-in crc32c checksum factory.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6688
      
      Test Plan: pass make asan_check
      
      Reviewed By: riversand963
      
      Differential Revision: D21006859
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: ea8a45196a8b77c310728ab05f6cc0f49f3baef0
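
      A minimal sketch of wiring the built-in factory into DBOptions, assuming NewFileChecksumGenCrc32cFactory() returns a std::shared_ptr<FileChecksumGenFactory> (the path is a placeholder):

          #include <cassert>
          #include "rocksdb/db.h"
          #include "rocksdb/file_checksum.h"

          int main() {
            rocksdb::Options options;
            options.create_if_missing = true;
            // Every newly written SST file gets a crc32c full-file checksum.
            options.file_checksum_gen_factory =
                rocksdb::NewFileChecksumGenCrc32cFactory();
            rocksdb::DB* db = nullptr;
            rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/cksumdb", &db);
            assert(s.ok());
            delete db;
            return 0;
          }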
    • fix comparison count for format_version=3 indexes (#6650) · 9eca6d65
      Committed by Andrew Kryczka
      Summary:
      In index blocks since `format_version=3`, user keys are written
      rather than internal keys. When reading such blocks, the comparator is
      obtained via `InternalKeyComparator::user_comparator()`. That function
      must not return an unwrapped result as the wrapper class provides
      accounting logic to populate `PerfContext::user_key_comparison_count`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6650
      
      Test Plan:
      ran db_bench and verified
      `PerfContext::user_key_comparison_count` became larger.
      
      Reviewed By: cheng-chang
      
      Differential Revision: D20866325
      
      Pulled By: ajkr
      
      fbshipit-source-id: ad755d46bda31157dacc5b66e532279f19ad538c
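
      The counter in question is read per thread through PerfContext; a minimal sketch of observing user_key_comparison_count (path and key are placeholders):

          #include <cassert>
          #include <iostream>
          #include <string>
          #include "rocksdb/db.h"
          #include "rocksdb/perf_context.h"
          #include "rocksdb/perf_level.h"

          int main() {
            rocksdb::Options options;
            options.create_if_missing = true;
            rocksdb::DB* db = nullptr;
            rocksdb::Status open_s = rocksdb::DB::Open(options, "/tmp/perfdb", &db);
            assert(open_s.ok());
            rocksdb::SetPerfLevel(rocksdb::PerfLevel::kEnableCount);
            rocksdb::get_perf_context()->Reset();
            std::string value;
            db->Get(rocksdb::ReadOptions(), "some_key", &value);  // NotFound is fine
            // With the fix, comparisons made through format_version=3 index
            // blocks are reflected in this counter as well.
            std::cout << rocksdb::get_perf_context()->user_key_comparison_count
                      << std::endl;
            delete db;
            return 0;
          }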
  15. 11 Apr 2020 (1 commit)
  16. 10 Apr 2020 (1 commit)
  17. 09 Apr 2020 (2 commits)
  18. 03 Apr 2020 (1 commit)
  19. 02 Apr 2020 (2 commits)
  20. 01 Apr 2020 (2 commits)
  21. 31 Mar 2020 (2 commits)
  22. 30 Mar 2020 (1 commit)
    • Use FileChecksumGenFactory for SST file checksum (#6600) · e8d332d9
      Committed by Zhichao Cao
      Summary:
      In the current implementation, the sst file checksum is calculated by a shared checksum function object, which makes some checksum functions, such as SHA1, hard to apply here. In this implementation, each sst file will have its own checksum generator object, created by FileChecksumGenFactory. Users need to implement their own FileChecksumGenerator and factory to plug in their checksum calculation method.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6600
      
      Test Plan: tested with make asan_check
      
      Reviewed By: riversand963
      
      Differential Revision: D20717670
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 2a74c1c280ac11a07a1980185b43b671acaa71c6
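
      A minimal sketch of such a plugin, assuming the interface exposes Update/Finalize/GetChecksum/Name and a CreateFileChecksumGenerator factory hook; the XOR "checksum" is purely illustrative:

          #include <memory>
          #include <string>
          #include "rocksdb/file_checksum.h"

          class XorChecksumGen : public rocksdb::FileChecksumGenerator {
           public:
            void Update(const char* data, size_t n) override {
              for (size_t i = 0; i < n; ++i) state_ ^= data[i];
            }
            void Finalize() override { checksum_ = std::string(1, state_); }
            std::string GetChecksum() const override { return checksum_; }
            const char* Name() const override { return "XorChecksumGen"; }

           private:
            char state_ = 0;
            std::string checksum_;
          };

          class XorChecksumGenFactory : public rocksdb::FileChecksumGenFactory {
           public:
            std::unique_ptr<rocksdb::FileChecksumGenerator>
            CreateFileChecksumGenerator(
                const rocksdb::FileChecksumGenContext& /*ctx*/) override {
              return std::unique_ptr<rocksdb::FileChecksumGenerator>(
                  new XorChecksumGen());
            }
            const char* Name() const override { return "XorChecksumGenFactory"; }
          };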
  23. 29 Mar 2020 (1 commit)
    • Be able to decrease background thread's CPU priority when creating database backup (#6602) · ee50b8d4
      Committed by Cheng Chang
      Summary:
      When creating a database backup, the background threads will not only consume IO resources by copying files, but also consume CPU, for example by computing checksums. During peak times, the CPU consumption by the background threads might affect online queries.
      
      This PR makes it possible to decrease CPU priority of these threads when creating a new backup.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6602
      
      Test Plan: make check
      
      Reviewed By: siying, zhichao-cao
      
      Differential Revision: D20683216
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 9978b9ed9488e8ce135e90ca083e5b4b7221fd84
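
      A minimal sketch of the backup flow; the options struct, its two CPU-priority fields, and the CreateNewBackup overload taking it are assumptions about the knob this PR introduces, not verified names (paths are placeholders):

          #include <cassert>
          #include "rocksdb/db.h"
          #include "rocksdb/utilities/backupable_db.h"

          int main() {
            rocksdb::Options options;
            options.create_if_missing = true;
            rocksdb::DB* db = nullptr;
            rocksdb::Status open_s = rocksdb::DB::Open(options, "/tmp/srcdb", &db);
            assert(open_s.ok());
            rocksdb::BackupEngine* backup_engine = nullptr;
            rocksdb::Status be_s = rocksdb::BackupEngine::Open(
                rocksdb::Env::Default(),
                rocksdb::BackupableDBOptions("/tmp/backups"), &backup_engine);
            assert(be_s.ok());
            rocksdb::CreateBackupOptions create_opts;  // assumed struct name
            // Assumed fields: lower the copy/checksum threads' CPU priority.
            create_opts.decrease_background_thread_cpu_priority = true;
            create_opts.background_thread_cpu_priority = rocksdb::CpuPriority::kLow;
            rocksdb::Status s = backup_engine->CreateNewBackup(create_opts, db);
            assert(s.ok());
            delete backup_engine;
            delete db;
            return 0;
          }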
  24. 28 Mar 2020 (1 commit)
  25. 25 Mar 2020 (1 commit)