1. 26 April 2022 (2 commits)
    • Add 95% confidence intervals to db_bench output (#9882) · fb9a167a
      Jaromir Vanek authored
      Summary:
      Enhance `db_bench` output with 95% statistical confidence intervals for better performance evaluation. The goal is to unambiguously separate random variance from real differences when running the benchmark over multiple iterations.
      
      Output enhanced with confidence intervals exposed in brackets:
      
      ```
      $ ./db_bench --benchmarks=fillseq[-X10]
      
      Running benchmark for 10 times
      fillseq      :       4.961 micros/op 201578 ops/sec;   22.3 MB/s
      fillseq      :       5.030 micros/op 198824 ops/sec;   22.0 MB/s
      fillseq [AVG 2 runs] : 200201 (± 2698) ops/sec;   22.1 (± 0.3) MB/sec
      fillseq      :       4.963 micros/op 201471 ops/sec;   22.3 MB/s
      fillseq [AVG 3 runs] : 200624 (± 1765) ops/sec;   22.2 (± 0.2) MB/sec
      fillseq      :       5.035 micros/op 198625 ops/sec;   22.0 MB/s
      fillseq [AVG 4 runs] : 200124 (± 1586) ops/sec;   22.1 (± 0.2) MB/sec
      fillseq      :       4.979 micros/op 200861 ops/sec;   22.2 MB/s
      fillseq [AVG 5 runs] : 200272 (± 1262) ops/sec;   22.2 (± 0.1) MB/sec
      fillseq      :       4.893 micros/op 204367 ops/sec;   22.6 MB/s
      fillseq [AVG 6 runs] : 200954 (± 1688) ops/sec;   22.2 (± 0.2) MB/sec
      fillseq      :       4.914 micros/op 203502 ops/sec;   22.5 MB/s
      fillseq [AVG 7 runs] : 201318 (± 1595) ops/sec;   22.3 (± 0.2) MB/sec
      fillseq      :       4.998 micros/op 200074 ops/sec;   22.1 MB/s
      fillseq [AVG 8 runs] : 201163 (± 1415) ops/sec;   22.3 (± 0.2) MB/sec
      fillseq      :       4.946 micros/op 202188 ops/sec;   22.4 MB/s
      fillseq [AVG 9 runs] : 201277 (± 1267) ops/sec;   22.3 (± 0.1) MB/sec
      fillseq      :       5.093 micros/op 196331 ops/sec;   21.7 MB/s
      fillseq [AVG 10 runs] : 200782 (± 1491) ops/sec;   22.2 (± 0.2) MB/sec
      fillseq [AVG    10 runs] : 200782 (± 1491) ops/sec;   22.2 (± 0.2) MB/sec
      fillseq [MEDIAN 10 runs] : 201166 ops/sec;   22.3 MB/s
      ```
      
      For a more explicit interval representation, use the `--confidence_interval_only` flag:
      
      ```
      $ ./db_bench --benchmarks=fillseq[-X10] --confidence_interval_only
      
      Running benchmark for 10 times
      fillseq      :       4.935 micros/op 202648 ops/sec;   22.4 MB/s
      fillseq      :       5.078 micros/op 196943 ops/sec;   21.8 MB/s
      fillseq [CI95 2 runs] : (194205, 205385) ops/sec; (21.5, 22.7) MB/sec
      fillseq      :       5.159 micros/op 193816 ops/sec;   21.4 MB/s
      fillseq [CI95 3 runs] : (192735, 202869) ops/sec; (21.3, 22.4) MB/sec
      fillseq      :       4.947 micros/op 202158 ops/sec;   22.4 MB/s
      fillseq [CI95 4 runs] : (194721, 203061) ops/sec; (21.5, 22.5) MB/sec
      fillseq      :       4.908 micros/op 203756 ops/sec;   22.5 MB/s
      fillseq [CI95 5 runs] : (196113, 203615) ops/sec; (21.7, 22.5) MB/sec
      fillseq      :       5.063 micros/op 197528 ops/sec;   21.9 MB/s
      fillseq [CI95 6 runs] : (196319, 202631) ops/sec; (21.7, 22.4) MB/sec
      fillseq      :       5.214 micros/op 191799 ops/sec;   21.2 MB/s
      fillseq [CI95 7 runs] : (194953, 201803) ops/sec; (21.6, 22.3) MB/sec
      fillseq      :       5.260 micros/op 190095 ops/sec;   21.0 MB/s
      fillseq [CI95 8 runs] : (193749, 200937) ops/sec; (21.4, 22.2) MB/sec
      fillseq      :       5.076 micros/op 196992 ops/sec;   21.8 MB/s
      fillseq [CI95 9 runs] : (194134, 200474) ops/sec; (21.5, 22.2) MB/sec
      fillseq      :       5.388 micros/op 185603 ops/sec;   20.5 MB/s
      fillseq [CI95 10 runs] : (192487, 199781) ops/sec; (21.3, 22.1) MB/sec
      fillseq [AVG    10 runs] : 196134 (± 3647) ops/sec;   21.7 (± 0.4) MB/sec
      fillseq [MEDIAN 10 runs] : 196968 ops/sec;   21.8 MB/sec
      ```
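The interval math is simple enough to sketch. The following assumes a normal approximation with z = 1.96 and the sample standard deviation across runs (the function names are illustrative, not db_bench internals):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Ci95 {
  double mean;
  double half_width;  // the "±" value printed in brackets
};

// 95% confidence interval of the mean throughput over n runs,
// using the sample (n-1) standard deviation and z = 1.96.
Ci95 ComputeCi95(const std::vector<double>& runs) {
  const double n = static_cast<double>(runs.size());
  double sum = 0.0;
  for (double v : runs) sum += v;
  const double mean = sum / n;
  double sq = 0.0;
  for (double v : runs) sq += (v - mean) * (v - mean);
  const double stddev = std::sqrt(sq / (n - 1.0));
  return {mean, 1.96 * stddev / std::sqrt(n)};
}
```

For the first two fillseq runs above (201578 and 198824 ops/sec) this yields a mean of 200201 with a half-width of about 2699 ops/sec, in line with the `[AVG 2 runs] : 200201 (± 2698)` line.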
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9882
      
      Reviewed By: pdillinger
      
      Differential Revision: D35796148
      
      Pulled By: vanekjar
      
      fbshipit-source-id: 8313712d16728ff982b8aff28195ee56622385b8
    • Add experimental new FS API AbortIO to cancel read request (#9901) · 5bd374b3
      Akanksha Mahajan authored
      Summary:
      Add an experimental new API, AbortIO, to FileSystem to abort
      read requests submitted asynchronously through the ReadAsync API.
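As a rough illustration of the pattern (the names and signatures below are hypothetical, not the actual FileSystem interface): an async read hands back an opaque handle, and aborting marks any still-pending requests identified by those handles.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct AsyncReadRequest {
  std::atomic<bool> aborted{false};
  std::atomic<bool> completed{false};
};

class DemoFileSystem {
 public:
  // Submit a read; the opaque handle can later be passed to AbortIO.
  void* ReadAsync(AsyncReadRequest* req) {
    pending_.push_back(req);
    return req;
  }

  // Abort every handle whose request has not completed yet.
  void AbortIO(const std::vector<void*>& io_handles) {
    for (void* h : io_handles) {
      auto* req = static_cast<AsyncReadRequest*>(h);
      if (!req->completed.load()) {
        req->aborted.store(true);
      }
    }
  }

 private:
  std::vector<AsyncReadRequest*> pending_;
};
```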
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9901
      
      Test Plan: Existing tests
      
      Reviewed By: anand1976
      
      Differential Revision: D35885591
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: df3944e6e9e6e487af1fa688376b4abb6837fb02
  2. 23 April 2022 (1 commit)
  3. 22 April 2022 (1 commit)
  4. 21 April 2022 (4 commits)
    • Add rollback_deletion_type_callback to TxnDBOptions (#9873) · d13825e5
      Yanqin Jin authored
      Summary:
      This PR does not affect write-committed transactions.
      
      Add a member, `rollback_deletion_type_callback` to TransactionDBOptions
      so that a write-prepared transaction, when rolling back, can call this
      callback to decide if a `Delete` or `SingleDelete` should be used to
      cancel a prior `Put` written to the database during prepare phase.
      
      The purpose of this PR is to prevent mixing `Delete` and `SingleDelete`
      for the same key, causing undefined behaviors. Without this PR, the
      following can happen:
      
      ```
      // The application always issues SingleDelete when deleting keys.
      
      txn1->Put('a');
      txn1->Prepare(); // writes to memtable and potentially gets flushed/compacted to Lmax
      txn1->Rollback();  // inserts DELETE('a')
      
      txn2->Put('a');
      txn2->Commit();  // writes to memtable and potentially gets flushed/compacted
      ```
      
      In the database, we may have
      ```
      L0:   [PUT('a', s=100)]
      L1:   [DELETE('a', s=90)]
      Lmax: [PUT('a', s=0)]
      ```
      
      If a compaction compacts L0 and L1, then we have
      ```
      L1:    [PUT('a', s=100)]
      Lmax:  [PUT('a', s=0)]
      ```
      
      If a future transaction issues a SingleDelete, we have
      ```
      L0:    [SD('a', s=110)]
      L1:    [PUT('a', s=100)]
      Lmax:  [PUT('a', s=0)]
      ```
      
      Then, a compaction including L0, L1 and Lmax leads to
      ```
      Lmax:  [PUT('a', s=0)]
      ```
      
      which is incorrect.
      
      Similar bugs reported and addressed in
      https://github.com/cockroachdb/pebble/issues/1255. Based on our team's
      current priority, we have decided to take this approach for now. We may
      come back and revisit in the future.
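A minimal sketch of the idea (the callback signature here is simplified and hypothetical; the real option lives on TransactionDBOptions): the application registers a callback, and rollback consults it per key so the rollback marker's deletion type stays consistent with what the application issues itself.

```cpp
#include <cassert>
#include <functional>
#include <string>

enum class RollbackMarker { kDelete, kSingleDelete };

using RollbackDeletionTypeCallback = std::function<bool(const std::string&)>;

// Returning true from the callback selects SingleDelete as the rollback
// marker; false (or no registered callback) falls back to a regular Delete.
RollbackMarker PickRollbackMarker(const RollbackDeletionTypeCallback& cb,
                                  const std::string& key) {
  if (cb && cb(key)) {
    return RollbackMarker::kSingleDelete;
  }
  return RollbackMarker::kDelete;
}
```

An application that always uses SingleDelete, as in the scenario above, would register a callback that always returns true, so the rollback of `txn1->Put('a')` inserts SD('a') instead of DELETE('a').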
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9873
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D35762170
      
      Pulled By: riversand963
      
      fbshipit-source-id: b28d56eefc786b53c9844b9ef4a7807acdd82c8d
    • Mark GetLiveFilesStorageInfo ready for production use (#9868) · 1bac873f
      Peter Dillinger authored
      Summary:
      ... by filling out the remaining testing hole: handling of
      db_paths + cf_paths. (Note that while GetLiveFilesStorageInfo works
      with db_paths / cf_paths, Checkpoint and BackupEngine do not, and
      are marked appropriately.)
      
      Also improved comments for "live files" APIs, and grouped them
      together in db.h.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9868
      
      Test Plan: Adding to existing unit tests
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35752254
      
      Pulled By: pdillinger
      
      fbshipit-source-id: c70eb67748fad61826e2f554b674638700abefb2
    • Add 7.2 to compatible check (#9858) · 2ea4205a
      Jay Zhuang authored
      Summary:
      Add 7.2 to the compatibility check (this should be updated with each version bump).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9858
      
      Reviewed By: riversand963
      
      Differential Revision: D35722897
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 08c782b9344599d7296543eb0c61afcd9a869a1a
    • Add --decode_blob_index option to idump and dump commands (#9870) · 9b5790f0
      yuzhangyu authored
      Summary:
      This patch completes the first part of the task: "Extend all three commands so they can decode and print blob references if a new option --decode_blob_index is specified"
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9870
      
      Reviewed By: ltamasi
      
      Differential Revision: D35753932
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: 9d2bbba0eef2ed86b982767eba9de1b4881f35c9
  5. 20 April 2022 (4 commits)
  6. 19 April 2022 (5 commits)
    • Avoid overwriting OPTIONS file settings in db_bench (#9862) · 690f1edf
      Andrew Kryczka authored
      Summary:
      `InitializeOptionsGeneral()` was overwriting many options that were already configured by OPTIONS file, potentially with the flag default values. This PR changes that function to only overwrite options in limited scenarios, as described at the top of its definition. Block cache is still a violation.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9862
      
      Test Plan: ran under various scenarios (multi-DB, single DB, OPTIONS file, flags) and verified options are set as expected
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35736960
      
      Pulled By: ajkr
      
      fbshipit-source-id: 75b77740af37e6f5741618f8a8f5685df2417d03
    • Misc CI improvements / additions (#9859) · 1601433b
      Peter Dillinger authored
      Summary:
      * Add valgrind test to nightly CircleCI (in case it can catch something that
      ASAN/UBSAN does not)
      * Add clang13+asan+ubsan+folly test to nightly CircleCI, for broader testing
      * Consolidate many copies of ASAN_OPTIONS= while also allowing it to be
      inherited from parent environment rather than always overridden.
      * Move UBSAN exclusion from Makefile into options_settable_test.cc
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9859
      
      Test Plan: CI
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35730903
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 6f5464034e8115f9a07f6f7aec1de9219ec2837c
    • Conditionally declare and define variable that is unused in LITE mode (#9854) · e83c5543
      Hui Xiao authored
      Summary:
      Context:
      As mentioned in https://github.com/facebook/rocksdb/issues/9701, `LITE=1 make static_lib` fails as follows for v7.0.2:
      ```
        CC       file/sequence_file_reader.o
        CC       file/sst_file_manager_impl.o
        CC       file/writable_file_writer.o
      In file included from file/writable_file_writer.cc:10:
      ./file/writable_file_writer.h:163:15: error: private field 'temperature_' is not used [-Werror,-Wunused-private-field]
        Temperature temperature_;
                    ^
      1 error generated.
      make: *** [file/writable_file_writer.o] Error 1
      ```
      
       The fix, as titled, is to conditionally declare and define the variable so that it is absent in LITE mode.
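A minimal sketch of this kind of fix (the guard macro usage and class shape are illustrative, not the actual writable_file_writer.h): the member is only declared when the build actually uses it, so `-Wunused-private-field` has nothing to flag in LITE mode.

```cpp
#include <cassert>

enum class Temperature { kUnknown, kHot, kWarm, kCold };

class WriterSketch {
 public:
#ifndef ROCKSDB_LITE
  explicit WriterSketch(Temperature t) : temperature_(t) {}
  Temperature temperature() const { return temperature_; }

 private:
  Temperature temperature_;  // compiled out entirely in LITE builds
#else
  WriterSketch() = default;
#endif
};
```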
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9854
      
      Test Plan:
      - Local `LITE=1 make static_lib` reveals the same error and error is gone after this fix
      - CI
      
      Reviewed By: ajkr, jay-zhuang
      
      Differential Revision: D35706585
      
      Pulled By: hx235
      
      fbshipit-source-id: 7743310298231ad6866304ffa2225c8abdc91d9a
    • Add "no compression" job to CircleCI (#9850) · 41237dd3
      Peter Dillinger authored
      Summary:
      Since they operate at distinct abstraction layers, it seemed prudent to
      combine this with the EncryptedEnv CI test run for each PR, for efficiency
      in testing. Also added the supported compressions to the sst_dump --help output
      so that the CI job can verify that no compression support is compiled in.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9850
      
      Test Plan: CI, some manual stuff
      
      Reviewed By: riversand963
      
      Differential Revision: D35682346
      
      Pulled By: pdillinger
      
      fbshipit-source-id: be9879c1533fed304ee32c89fd9ba4b07c2b90cc
    • Update main version.h to NEXT release (7.3) (#9852) · 3d473235
      Jay Zhuang authored
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9852
      
      Reviewed By: ajkr
      
      Differential Revision: D35694753
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 729d416afc588e5db2367e899589bbb5419820d6
  7. 17 April 2022 (1 commit)
  8. 16 April 2022 (7 commits)
    • Add Aggregation Merge Operator (#9780) · 4f9c0fd0
      sdong authored
      Summary:
      Add a merge operator that allows users to register specific aggregation functions so that they can aggregate values per key using different aggregation types.
      See the comments of function CreateAggMergeOperator() for actual usage.
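The shape of the feature can be sketched standalone (names here are hypothetical, not the actual CreateAggMergeOperator() API): aggregation functions are registered by name, and a merge folds a key's operands with whichever function that key was tagged with.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

using AggFunc = std::function<int64_t(const std::vector<int64_t>&)>;

class AggRegistry {
 public:
  // Register a named aggregation function (e.g. "sum", "max").
  void Register(const std::string& name, AggFunc f) {
    funcs_[name] = std::move(f);
  }

  // Fold a key's merge operands with the function the key was tagged with.
  int64_t Merge(const std::string& name,
                const std::vector<int64_t>& operands) const {
    return funcs_.at(name)(operands);
  }

 private:
  std::map<std::string, AggFunc> funcs_;
};
```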
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9780
      
      Test Plan: Add a unit test covering various cases.
      
      Reviewed By: ltamasi
      
      Differential Revision: D35267444
      
      fbshipit-source-id: 5b02f31c4f3e17e96dd4025cdc49fca8c2868628
    • Propagate errors from UpdateBoundaries (#9851) · db536ee0
      Levi Tamasi authored
      Summary:
      In `FileMetaData`, we keep track of the lowest-numbered blob file
      referenced by the SST file in question for the purposes of BlobDB's
      garbage collection in the `oldest_blob_file_number` field, which is
      updated in `UpdateBoundaries`. However, with the current code,
      `BlobIndex` decoding errors (or invalid blob file numbers) are swallowed
      in this method. The patch changes this by propagating these errors
      and failing the corresponding flush/compaction. (Note that since blob
      references are generated by the BlobDB code and also parsed by
      `CompactionIterator`, in reality this can only happen in the case of
      memory corruption.)
      
      This change necessitated updating some unit tests that involved
      fake/corrupt `BlobIndex` objects. Some of these just used a dummy string like
      `"blob_index"` as a placeholder; these were replaced with real `BlobIndex`es.
      Some were relying on the earlier behavior to simulate corruption; these
      were replaced with `SyncPoint`-based test code that corrupts a valid
      blob reference at read time.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9851
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D35683671
      
      Pulled By: ltamasi
      
      fbshipit-source-id: f7387af9945c48e4d5c4cd864f1ba425c7ad51f6
    • Add a `fail_if_not_bottommost_level` to IngestExternalFileOptions (#9849) · be81609b
      Yanqin Jin authored
      Summary:
      This new option allows the application to specify that files must be
      ingested at the bottommost level; otherwise the ingestion fails instead
      of silently ingesting into a non-bottommost level.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9849
      
      Test Plan: make check
      
      Reviewed By: ajkr
      
      Differential Revision: D35680307
      
      Pulled By: riversand963
      
      fbshipit-source-id: 01cf54ef6c76198f7654dc06b5544631dea1be1e
    • Make initial auto readahead_size configurable (#9836) · 0c7f455f
      Akanksha Mahajan authored
      Summary:
      Make initial auto readahead_size configurable
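For context (an assumption about the surrounding prefetch behavior, not part of this patch's diff): BlockBasedTable's automatic readahead starts at an initial size and doubles on successive sequential reads up to an upper bound, so this option controls the starting point of that progression.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Sketch of the doubling progression; the starting value corresponds to
// initial_auto_readahead_size and the cap to max_auto_readahead_size.
size_t NextReadaheadSize(size_t current, size_t max_readahead) {
  return std::min(current * 2, max_readahead);
}
```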
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9836
      
      Test Plan:
      Added new unit test
      Ran regression:
      Without change:
      
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.0
      Date:       Thu Mar 17 13:11:34 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  483618.390 micros/op 2 ops/sec;  338.9 MB/s (249 of 249 found)
      ```
      
      With this change:
      ```
       ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Set seed to 1649895440554504 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.2
      Date:       Wed Apr 13 17:17:20 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      ... finished 100 ops
      seekrandom   :  476892.488 micros/op 2 ops/sec;  344.6 MB/s (252 of 252 found)
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D35632815
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: c8057a88f9294c9d03b1d434b03affe02f74d796
    • Upgrade development environment. (#9843) · d5dfa8c6
      sdong authored
      Summary:
      This is to support Meta's internal environment platform010. GCC still doesn't work, but USE_CLANG=1 should.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9843
      
      Test Plan: `make`, and `ROCKSDB_FBCODE_BUILD_WITH_PLATFORM010=1 USE_CLANG=1 make`
      
      Reviewed By: pdillinger
      
      Differential Revision: D35652507
      
      fbshipit-source-id: a4a14b2fa4a2d6ca6fbf1b65060e81c39f079363
    • Remove flaky servicelab metrics DBPut P95/P99 (#9844) · e91ec64c
      Jay Zhuang authored
      Summary:
      The P95 and P99 metrics are flaky, similar to the DBGet ones removed
      in https://github.com/facebook/rocksdb/issues/9742 .
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9844
      
      Test Plan: `$ ./buckifier/buckify_rocksdb.py`
      
      Reviewed By: ajkr
      
      Differential Revision: D35655531
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: c1409f0fba4e23d461a65f988c27ac5e2ae85d13
    • Add option --decode_blob_index to dump_live_files command (#9842) · 082eb042
      yuzhangyu authored
      Summary:
      This change only adds --decode_blob_index support to the dump_live_files command; it is part of a task to add blob support to a few commands.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9842
      
      Reviewed By: ltamasi
      
      Differential Revision: D35650167
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: a78151b98bc38ac6f52c6e01ca6927a3429ddd14
  9. 15 April 2022 (4 commits)
    • Add checks to GetUpdatesSince (#9459) · fe63899d
      Yanqin Jin authored
      Summary:
      Make `DB::GetUpdatesSince` return early if told to scan WALs generated by transactions
      with write-prepared or write-unprepared policies (`seq_per_batch` is true), as indicated by
      the API comment.
      
      Also add checks to `TransactionLogIterator` to clarify some conditions.
      
      No API change.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9459
      
      Test Plan:
      make check
      
      Closes https://github.com/facebook/rocksdb/issues/1565
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D33821243
      
      Pulled By: riversand963
      
      fbshipit-source-id: c8b155d020ce0980e2d3b3b1da40b96e65b48d79
    • CompactionIterator sees consistent view of which keys are committed (#9830) · 0bd4dcde
      Yanqin Jin authored
      Summary:
      **This PR does not affect the functionality of `DB` and write-committed transactions.**
      
      `CompactionIterator` uses `KeyCommitted(seq)` to determine if a key in the database is committed.
      As the name 'write-committed' implies, if write-committed policy is used, a key exists in the database only if
      it is committed. In fact, the implementation of `KeyCommitted()` is as follows:
      
      ```
      inline bool KeyCommitted(SequenceNumber seq) {
        // For non-txn-db and write-committed, snapshot_checker_ is always nullptr.
        return snapshot_checker_ == nullptr ||
               snapshot_checker_->CheckInSnapshot(seq, kMaxSequence) == SnapshotCheckerResult::kInSnapshot;
      }
      ```
      
      With that being said, we focus on write-prepared/write-unprepared transactions.
      
      A few notes:
      - A key can exist in the db even if it's uncommitted. Therefore, we rely on `snapshot_checker_` to determine data visibility. We also require that all writes go through transaction API instead of the raw `WriteBatch` + `Write`, thus at most one uncommitted version of one user key can exist in the database.
      - `CompactionIterator` outputs a key as long as the key is uncommitted.
      
      For the above reasons, it is possible that `CompactionIterator` decides to output an uncommitted key without
      doing further checks on the key (`NextFromInput()`). By the time the key is being prepared for output, it may have
      become committed because `snapshot_checker_(seq, kMaxSequence)` returns true in the implementation of `KeyCommitted()`.
      Then `CompactionIterator` will try to zero its sequence number and hit an assertion error if the key is a tombstone.
      
      To fix this issue, we should make the `CompactionIterator` see a consistent view of the input keys. Note that
      for write-prepared/write-unprepared, the background flush/compaction jobs already take a "job snapshot" before starting
      processing keys. The job snapshot is released only after the entire flush/compaction finishes. We can use this snapshot
      to determine whether a key is committed or not with minor change to `KeyCommitted()`.
      
      ```
      inline bool KeyCommitted(SequenceNumber sequence) {
        // For non-txn-db and write-committed, snapshot_checker_ is always nullptr.
        return snapshot_checker_ == nullptr ||
               snapshot_checker_->CheckInSnapshot(sequence, job_snapshot_) ==
                   SnapshotCheckerResult::kInSnapshot;
      }
      ```
      
      As a result, whether a key is committed or not will remain constant throughout the compaction, causing no trouble
      for `CompactionIterator`'s assertions.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9830
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D35561162
      
      Pulled By: riversand963
      
      fbshipit-source-id: 0e00d200c195240341cfe6d34cbc86798b315b9f
    • Fix minimum libzstd version that supports ZSTD_STREAMING (#9841) · 844a3510
      Jonathan Albrecht authored
      Summary:
      The minimum libzstd version that has `ZSTD_compressStream2` is
      1.4.0 so only define ZSTD_STREAMING in that case.
      
      Fixes building on Ubuntu 18.04 which has libzstd 1.3.3 as its
      repository version.
      
      Fixes https://github.com/facebook/rocksdb/issues/9795
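The version gate can be sketched with libzstd's version-number convention (`ZSTD_VERSION_NUMBER = major*100*100 + minor*100 + patch`, so 1.4.0 is 10400); the helper names below are illustrative:

```cpp
#include <cassert>

// Encode a libzstd version the way ZSTD_VERSION_NUMBER does.
constexpr int ZstdVersionNumber(int major, int minor, int patch) {
  return major * 100 * 100 + minor * 100 + patch;
}

// ZSTD_compressStream2 (needed for ZSTD_STREAMING) appeared in 1.4.0.
constexpr bool SupportsZstdStreaming(int version_number) {
  return version_number >= ZstdVersionNumber(1, 4, 0);
}
```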
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9841
      
      Test Plan:
      Build and test on Ubuntu 18.04 with:
        apt-get install libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev \
          libzstd-dev libgflags-dev g++ make curl
      
      Reviewed By: ajkr
      
      Differential Revision: D35648738
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 2a9e969bcc17a7dc10172f3817283409de885811
    • Expose `CacheEntryRole` and map keys for block cache stat collections (#9838) · d6e016be
      Andrew Kryczka authored
      Summary:
      This gives users the ability to examine the map populated by `GetMapProperty()` with property `kBlockCacheEntryStats`. It also sets us up for a possible future where cache reservations are configured according to `CacheEntryRole`s rather than flags coupled to roles.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9838
      
      Test Plan:
      - migrated test DBBlockCacheTest.CacheEntryRoleStats to use this API. That test verifies some of the contents are as expected
      - added a DBPropertiesTest to verify the public map keys are present, and nothing else
      
      Reviewed By: hx235
      
      Differential Revision: D35629493
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5c4356b8560e85d1f881fd32c44c15960b02fc68
  10. 14 April 2022 (6 commits)
  11. 13 April 2022 (4 commits)
  12. 12 April 2022 (1 commit)
    • Remove corrupted WAL files in kPointRecoveryMode with avoid_flush_during_recovery set true (#9634) · ae82d914
      Akanksha Mahajan authored
      Summary:
      1) In the case of a non-TransactionDB with avoid_flush_during_recovery = true, RocksDB won't
      flush the data from WAL to L0 for all column families if possible. As a
      result, not all column families can advance their log_numbers, and
      min_log_number_to_keep won't change.
      2) For a transaction DB (allow_2pc), even with the flush, there may be old WAL files that must not be deleted because they can contain data of uncommitted transactions, so min_log_number_to_keep won't change.
      
      If we persist a new MANIFEST with
      advanced log_numbers for some column families and then crash a second
      time after persisting the MANIFEST, RocksDB will see some column
      families' log_numbers larger than the corrupted WAL, hit the "column family inconsistency" error, and fail recovery.
      
      As a solution:
      1. WALs whose numbers are larger than the corrupted WAL's and smaller
      than the new WAL's are moved to the archive folder instead of being deleted.
      2. Currently, RocksDB DB::Open() may create and write to two new MANIFEST files even before recovery succeeds. This PR buffers the edits in a structure and writes a new MANIFEST only after recovery is successful.
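The archiving rule in step 1 can be sketched as a pure selection over WAL numbers (the function name is illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// WALs numbered after the corrupted one but before the newly created WAL
// are candidates for the archive directory rather than deletion.
std::vector<uint64_t> WalsToArchive(const std::vector<uint64_t>& wal_numbers,
                                    uint64_t corrupted_wal,
                                    uint64_t new_wal) {
  std::vector<uint64_t> to_archive;
  for (uint64_t n : wal_numbers) {
    if (n > corrupted_wal && n < new_wal) {
      to_archive.push_back(n);
    }
  }
  return to_archive;
}
```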
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9634
      
      Test Plan:
      1. Added new unit tests
      2. make crash_test -j
      
      Reviewed By: riversand963
      
      Differential Revision: D34463666
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: e233d3af0ed4e2028ca0cf051e5a334a0fdc9d19