1. 16 September 2022, 2 commits
  2. 15 September 2022, 7 commits
    • Tiered Storage feature doesn't support BlobDB yet (#10681) · 1cdc8411
      Jay Zhuang committed
      Summary:
      Disable the tiered storage + BlobDB test.
      Also enable a different hot data setting for Tiered compaction.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10681
      
      Reviewed By: ajkr
      
      Differential Revision: D39531941
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: aa0595eb38d03f17638d300d2e4cc9061429bf61
    • Refactor Compaction file cut `ShouldStopBefore()` (#10629) · 849cf1bf
      Jay Zhuang committed
      Summary:
      Consolidate compaction output cut logic into `ShouldStopBefore()` and move it inside the `CompactionOutputs` class.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10629
      
      Reviewed By: cbi42
      
      Differential Revision: D39315536
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 7d81037babbd35c276bbaad02dbc2bb555fdac18
    • Fix a bug by setting up subcompaction bounds properly (#10658) · ce2c11d8
      Yanqin Jin committed
      Summary:
      When user-defined timestamp is enabled, subcompaction bounds should be set up properly. When creating InputIterator for the compaction, the `start` and `end` should have their timestamp portions set to kMaxTimestamp, which is the highest possible timestamp. This is similar to what we do with setting up their sequence numbers to `kMaxSequenceNumber`.
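
      A minimal sketch of the idea, not the PR's code: it assumes the common fixed-width uint64 timestamps whose maximum value encodes as all-0xff bytes, and the helper name is hypothetical.
      ```cpp
      #include <cstddef>
      #include <string>

      // Form a subcompaction bound whose timestamp portion is the maximum
      // timestamp, so it compares as "newer" than every version of the user
      // key, analogous to using kMaxSequenceNumber for the sequence number.
      std::string BoundKeyWithMaxTimestamp(const std::string& user_key,
                                           size_t timestamp_size) {
        std::string bound = user_key;
        bound.append(timestamp_size, '\xff');  // assumed max-timestamp encoding
        return bound;
      }
      ```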
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10658
      
      Test Plan:
      ```bash
      make check
      rm -rf /dev/shm/rocksdb/* && mkdir
      /dev/shm/rocksdb/rocksdb_crashtest_expected && ./db_stress
      --allow_data_in_errors=True --clear_column_family_one_in=0
      --continuous_verification_interval=0 --data_block_index_type=1
      --db=/dev/shm/rocksdb//rocksdb_crashtest_blackbox --delpercent=5
      --delrangepercent=0
      --expected_values_dir=/dev/shm/rocksdb//rocksdb_crashtest_expected
      --iterpercent=0 --max_background_compactions=20
      --max_bytes_for_level_base=10485760 --max_key=25000000
      --max_write_batch_group_size_bytes=1048576 --nooverwritepercent=1
      --ops_per_thread=300000 --paranoid_file_checks=1 --partition_filters=0
      --prefix_size=8 --prefixpercent=5 --readpercent=30 --reopen=0
      --snapshot_hold_ops=100000 --subcompactions=4
      --target_file_size_base=65536 --target_file_size_multiplier=2
      --test_batches_snapshots=0 --test_cf_consistency=0 --use_multiget=1
      --user_timestamp_size=8 --value_size_mult=32 --verify_checksum=1
      --write_buffer_size=65536 --writepercent=60 -disable_wal=1
      -column_families=1
      ```
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D39393402
      
      Pulled By: riversand963
      
      fbshipit-source-id: f276e35b19fce51a175c368a502fb0718d1f3871
    • Fix data race in accessing `cached_range_tombstone_` (#10680) · be04a3b6
      Changyu Bi committed
      Summary:
      Fix a data race introduced in https://github.com/facebook/rocksdb/issues/10547 (P5295241720), first reported by pdillinger. The race is between the `std::atomic_load_explicit` in NewRangeTombstoneIteratorInternal and the `std::atomic_store_explicit` in MemTable::Add() that operate on `cached_range_tombstone_`. P5295241720 shows that `atomic_store_explicit` initializes some mutex which `atomic_load_explicit` could be trying to call `lock()` on at the same time. This fix moves the initialization to the memtable constructor.
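
      A simplified sketch of the access pattern involved, with placeholder type and member names rather than MemTable's actual fields: readers atomically load the shared_ptr-based cache while writers atomically publish replacements, and constructing the shared_ptr in the owner's constructor avoids the race between the first store and a concurrent load.
      ```cpp
      #include <atomic>
      #include <memory>
      #include <string>
      #include <vector>

      struct FragmentedTombstoneCache {
        std::vector<std::string> fragments;  // placeholder payload
      };

      class MemTableLike {
       public:
        // Initializing the shared_ptr up front means readers never observe a
        // partially constructed control block.
        MemTableLike() : cache_(std::make_shared<FragmentedTombstoneCache>()) {}

        std::shared_ptr<FragmentedTombstoneCache> ReadCache() const {
          return std::atomic_load_explicit(&cache_, std::memory_order_acquire);
        }

        void PublishCache(std::shared_ptr<FragmentedTombstoneCache> fresh) {
          std::atomic_store_explicit(&cache_, std::move(fresh),
                                     std::memory_order_release);
        }

       private:
        std::shared_ptr<FragmentedTombstoneCache> cache_;
      };
      ```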
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10680
      
      Test Plan: `USE_CLANG=1 COMPILE_WITH_TSAN=1 make -j24 whitebox_crash_test`
      
      Reviewed By: ajkr
      
      Differential Revision: D39528696
      
      Pulled By: cbi42
      
      fbshipit-source-id: ee740841044438e18ad2b8ea567444dd542dd8e2
    • Reset pessimistic transaction's read/commit timestamps during Initialize() (#10677) · 832fd644
      Yanqin Jin committed
      Summary:
      RocksDB allows reusing old `Transaction` objects when creating new ones. Therefore, we need to
      reset the transaction's read and commit timestamps back to the default value `kMaxTxnTimestamp`.
      Otherwise, `CommitAndTryCreateSnapshot()` may fail with "Status::InvalidArgument("Different commit ts specified")".
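
      A hedged usage sketch of the reuse path in question, assuming the public `TransactionDB::BeginTransaction()` overload that accepts an old `Transaction*`; status handling is elided.
      ```cpp
      #include "rocksdb/utilities/transaction_db.h"

      using namespace ROCKSDB_NAMESPACE;

      void ReuseTransactionObject(TransactionDB* txn_db) {
        WriteOptions write_opts;
        TransactionOptions txn_opts;
        Transaction* txn = txn_db->BeginTransaction(write_opts, txn_opts);
        // ... first transaction: writes, optional commit timestamp, then commit ...
        txn->Commit().PermitUncheckedError();
        // Reuse the same object; per this change, its read/commit timestamps are
        // reset back to kMaxTxnTimestamp when it is re-initialized here.
        txn = txn_db->BeginTransaction(write_opts, txn_opts, txn);
        // ... second transaction ...
        txn->Rollback().PermitUncheckedError();
        delete txn;
      }
      ```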
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10677
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D39513543
      
      Pulled By: riversand963
      
      fbshipit-source-id: bea01cac149bff3a23a2978fc0c3b198243a6291
    • Add comments describing {Put,Get}Entity, update/clarify comment for Get and iterator (#10676) · 87c8bb4b
      Levi Tamasi committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10676
      
      Reviewed By: riversand963
      
      Differential Revision: D39512081
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 55704478ceb8081003eceeb0c5a3875cb806587e
    • Bypass a MultiGet test when async_io is used (#10669) · bb9a6d4e
      anand76 committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10669
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D39492658
      
      Pulled By: anand1976
      
      fbshipit-source-id: abef79808e30762654680f7dd7e46487c631febc
  3. 14 September 2022, 3 commits
    • Change MultiGet multi-level optimization to default on (#10671) · 7b11d484
      anand76 committed
      Summary:
      Change the ```ReadOptions.optimize_multiget_for_io``` flag to default to on. It doesn't impact regular MultiGet users, as it's only applicable when ```ReadOptions.async_io``` is also set to true.
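
      A hedged sketch of opting in or out, assuming the standard vector-based `MultiGet()` overload; the flag only takes effect when `async_io` is also enabled, so callers wanting the previous behavior can simply set it back to false.
      ```cpp
      #include <string>
      #include <vector>

      #include "rocksdb/db.h"

      using namespace ROCKSDB_NAMESPACE;

      std::vector<std::string> BatchedRead(DB* db, const std::vector<Slice>& keys) {
        ReadOptions read_opts;
        read_opts.async_io = true;                  // prerequisite for the optimization
        read_opts.optimize_multiget_for_io = true;  // now the default per this change
        std::vector<std::string> values;
        std::vector<Status> statuses = db->MultiGet(read_opts, keys, &values);
        (void)statuses;  // check per-key statuses in real code
        return values;
      }
      ```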
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10671
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D39477439
      
      Pulled By: anand1976
      
      fbshipit-source-id: 47abcdbfa69f9bc60422ab68a238b232e085d4ba
    • Add wide-column support to iterators (#10670) · 06ab0a8b
      Levi Tamasi committed
      Summary:
      The patch extends the iterator API with a new `columns` method which
      can be used to retrieve all wide columns for the current key. Similarly to
      the `Get` and `GetEntity` APIs, the classic `value` API returns the value
      of the default (anonymous) column for wide-column entities, and `columns`
      returns an entity with a single default column for plain old key-values.
      (The goal here is to maintain the invariant that `value()` is the same as
      the value of the default column in `columns()`.) The patch also involves a
      smaller refactoring: historically, `value()` was implemented using a bunch
      of conditions, that is, the `Slice` to be returned was decided based on the
      direction of the iteration, whether a merge had been done etc. when the
      method was called; with the patch, the value to be exposed is stored in a
      member `Slice value_` when the iterator lands on a new key, and `value()`
      simply returns this `Slice`.
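
      A hedged sketch of the extended iterator surface described above; the helper function is hypothetical.
      ```cpp
      #include <memory>

      #include "rocksdb/db.h"
      #include "rocksdb/wide_columns.h"

      using namespace ROCKSDB_NAMESPACE;

      void DumpEntities(DB* db) {
        std::unique_ptr<Iterator> it(db->NewIterator(ReadOptions()));
        for (it->SeekToFirst(); it->Valid(); it->Next()) {
          Slice default_value = it->value();           // value of the default column
          const WideColumns& columns = it->columns();  // all columns for this key
          for (const WideColumn& column : columns) {
            // column.name() / column.value() are Slices into the entity
            (void)column;
          }
          (void)default_value;
        }
      }
      ```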
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10670
      
      Test Plan: Ran `make check` and a simple blackbox crash test.
      
      Reviewed By: riversand963
      
      Differential Revision: D39475551
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 29e7a6ed9ef340841aab36803b832b7c8f668b0b
    • Cache fragmented range tombstone list for mutable memtables (#10547) · f291eefb
      Changyu Bi committed
      Summary:
      Each read from the memtable used to read and fragment all the range tombstones into a `FragmentedRangeTombstoneList`. https://github.com/facebook/rocksdb/issues/10380 improved this inefficiency by caching a `FragmentedRangeTombstoneList` with each immutable memtable. This PR extends the caching to mutable memtables. The fragmented range tombstone list can be constructed in either the read path (this PR) or the write path (https://github.com/facebook/rocksdb/issues/10584). With both implementations, each `DeleteRange()` will invalidate the cache, and the difference is where the cache is re-constructed. `CoreLocalArray` is used to store the cache with each memtable so that multi-threaded reads can be efficient. More specifically, each core will have a shared_ptr to a shared_ptr pointing to the current cache. Each read thread will only update the reference count in its core-local shared_ptr, and this is only needed when reading from mutable memtables.
      
      The choice between the write path and the read path is not an easy one: both are improvements over no caching in the current implementation, but they favor different operations and could cause a regression in the other operation (read vs write). The write path caching in (https://github.com/facebook/rocksdb/issues/10584) leads to a cleaner implementation, but I chose the read path caching here to avoid a significant regression in write performance when there is a considerable number of range tombstones in a single memtable (the numbers from the benchmark below suggest >1000 with concurrent writers). Note that even though the fragmented range tombstone list is only constructed in `DeleteRange()` operations, it could block other writes from proceeding, and hence affects overall write performance.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10547
      
      Test Plan:
      - TestGet() in stress test is updated in https://github.com/facebook/rocksdb/issues/10553 to compare Get() result against expected state: `./db_stress_branch --readpercent=57 --prefixpercent=4 --writepercent=25 -delpercent=5 --iterpercent=5 --delrangepercent=4`
      - Perf benchmark: tested read and write performance where a memtable has 0, 1, 10, 100 and 1000 range tombstones.
      ```
      ./db_bench --benchmarks=fillrandom,readrandom --writes_per_range_tombstone=200 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=200000 --reads=100000 --disable_auto_compactions --max_num_range_tombstones=1000
      ```
      Write perf regressed since the cost of constructing fragmented range tombstone list is shifted from every read to a single write. 6cbe5d8e172dc5f1ef65c9d0a6eedbd9987b2c72 is included in the last column as a reference to see performance impact on multi-thread reads if `CoreLocalArray` is not used.
      
      micros/op averaged over 5 runs: first 4 columns are for fillrandom, last 4 columns are for readrandom.
      |   | fillrandom main | write path caching | read path caching | memtable V3 (https://github.com/facebook/rocksdb/issues/10308) | readrandom main | write path caching | read path caching | memtable V3 |
      | --- | --- | --- | --- | --- | --- | --- | --- | --- |
      | 0 | 6.35 | 6.15 | 5.82 | 6.12 | 2.24 | 2.26 | 2.03 | 2.07 |
      | 1 | 5.99 | 5.88 | 5.77 | 6.28 | 2.65 | 2.27 | 2.24 | 2.5 |
      | 10 | 6.15 | 6.02 | 5.92 | 5.95 | 5.15 | 2.61 | 2.31 | 2.53 |
      | 100 | 5.95 | 5.78 | 5.88 | 6.23 | 28.31 | 2.34 | 2.45 | 2.94 |
      | 100, 25 threads | 52.01 | 45.85 | 46.18 | 47.52 | 35.97 | 3.34 | 3.34 | 3.56 |
      | 1000 | 6.0 | 7.07 | 5.98 | 6.08 | 333.18 | 2.86 | 2.7 | 3.6 |
      | 1000, 25 threads | 52.6 | 148.86 | 79.06 | 45.52 | 473.49 | 3.66 | 3.48 | 4.38 |
      
        - Benchmark performance of `readwhilewriting` from https://github.com/facebook/rocksdb/issues/10552, 100 range tombstones are written: `./db_bench --benchmarks=readwhilewriting --writes_per_range_tombstone=500 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=100000 --reads=500000 --disable_auto_compactions --max_num_range_tombstones=10000 --finish_after_writes`
      
      readrandom micros/op:
      |  | main | write path caching | read path caching | memtable V3 |
      | --- | --- | --- | --- | --- |
      | single thread | 48.28 | 1.55 | 1.52 | 1.96 |
      | 25 threads | 64.3 | 2.55 | 2.67 | 2.64 |
      
      Reviewed By: ajkr
      
      Differential Revision: D38895410
      
      Pulled By: cbi42
      
      fbshipit-source-id: 930bfc309dd1b2f4e8e9042f5126785bba577559
  4. 13 September 2022, 4 commits
    • Async optimization in scan path (#10602) · 03fc4397
      Akanksha Mahajan committed
      Summary:
      Optimizations:
      1. In FilePrefetchBuffer, when data overlaps between two buffers, it copies the data from the first buffer into the third buffer and then from the second buffer into the third buffer, so that one continuous buffer can be returned. With this optimization, ReadAsync is called on the first buffer as soon as that buffer is empty, instead of being blocked until the copy out of the second buffer completes.
      2. For a fixed readahead_size, FilePrefetchBuffer issues two async read calls during a seek: one of length + readahead_size_/2 on the first buffer (if that buffer is empty) and one of readahead_size_/2 on the second buffer. (A usage sketch is included at the end of this summary.)
      
      - Add readahead_size to db_stress for stress testing these changes in https://github.com/facebook/rocksdb/pull/10632
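
      A hedged usage sketch for the scan path these optimizations target; the readahead value and function name are illustrative only.
      ```cpp
      #include <memory>

      #include "rocksdb/db.h"

      using namespace ROCKSDB_NAMESPACE;

      void ScanWithAsyncPrefetch(DB* db) {
        ReadOptions read_opts;
        read_opts.async_io = true;
        read_opts.readahead_size = 2 * 1024 * 1024;  // fixed readahead, split across two async reads
        std::unique_ptr<Iterator> it(db->NewIterator(read_opts));
        for (it->SeekToFirst(); it->Valid(); it->Next()) {
          // consume it->key() / it->value()
        }
      }
      ```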
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10602
      
      Test Plan:
      - CircleCI tests
      - stress_test completed successfully
      export CRASH_TEST_EXT_ARGS="--async_io=1"
      make crash_test -j32
      - db_bench showed no regression
         With this PR:
      ```
       ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main1 -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=50000000 -use_direct_reads=false -seek_nexts=327680 -duration=30 -ops_between_duration_checks=1 -async_io=1
      Set seed to 1661876074584472 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      Integrated BlobDB: blob cache disabled
      RocksDB:    version 7.7.0
      Date:       Tue Aug 30 09:14:34 2022
      CPU:        32 * Intel Xeon Processor (Skylake)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    50000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    25939.9 MB (estimated)
      FileSize:   13732.9 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main1]
      seekrandom   :  270878.018 micros/op 3 ops/sec 30.068 seconds 111 operations;  618.7 MB/s (111 of 111 found)
      
       ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main1 -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=50000000 -use_direct_reads=true -seek_nexts=327680 -duration=30 -ops_between_duration_checks=1 -async_io=1
      Set seed to 1661875332862922 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      Integrated BlobDB: blob cache disabled
      RocksDB:    version 7.7.0
      Date:       Tue Aug 30 09:02:12 2022
      CPU:        32 * Intel Xeon Processor (Skylake)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    50000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    25939.9 MB (estimated)
      FileSize:   13732.9 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main1]
      seekrandom   :  358352.488 micros/op 2 ops/sec 30.102 seconds 84 operations;  474.4 MB/s (84 of 84 found)
      ```
      
      Without PR in main:
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main1 -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=50000000 -use_direct_reads=false -seek_nexts=327680 -duration=30 -ops_between_duration_checks=1 -async_io=1
      Set seed to 1661876425983045 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      Integrated BlobDB: blob cache disabled
      RocksDB:    version 7.7.0
      Date:       Tue Aug 30 09:20:26 2022
      CPU:        32 * Intel Xeon Processor (Skylake)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    50000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    25939.9 MB (estimated)
      FileSize:   13732.9 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main1]
      seekrandom   :  280881.953 micros/op 3 ops/sec 30.054 seconds 107 operations;  605.2 MB/s (107 of 107 found)
      
       ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main1 -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=50000000 -use_direct_reads=false -seek_nexts=327680 -duration=30 -ops_between_duration_checks=1 -async_io=0
      Set seed to 1661876475267771 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      Integrated BlobDB: blob cache disabled
      RocksDB:    version 7.7.0
      Date:       Tue Aug 30 09:21:15 2022
      CPU:        32 * Intel Xeon Processor (Skylake)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    50000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    25939.9 MB (estimated)
      FileSize:   13732.9 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main1]
      seekrandom   :  363155.084 micros/op 2 ops/sec 30.142 seconds 83 operations;  468.1 MB/s (83 of 83 found)
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D39141328
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 560655922c1a437a8569c228abb31b8c0b413120
    • db_stress option to preserve all files until verification success (#10659) · 03c4ea26
      Andrew Kryczka committed
      Summary:
      In `db_stress`, DB and expected state files containing changes leading up to a verification failure are often deleted, which makes debugging such failures difficult. On the DB side, flushed WAL files and compacted SST files are marked obsolete and then deleted. Without those files, we cannot pinpoint where a key that failed verification changed unexpectedly. On the expected state side, files for verifying prefix-recoverability in the presence of unsynced data loss are deleted before verification. These include a baseline state file containing the expected state at the time of the last successful verification, and a trace file containing all operations since then. Without those files, we cannot know the sequence of DB operations expected to be recovered.
      
      This PR attempts to address this gap with a new `db_stress` flag: `preserve_unverified_changes`. Setting `preserve_unverified_changes=1` has two effects.
      
      First, prior to startup verification, `db_stress` hardlinks all DB and expected state files in "unverified/" subdirectories of `FLAGS_db` and `FLAGS_expected_values_dir`. The separate directories are needed because the pre-verification opening process deletes files written by the previous `db_stress` run as described above. These "unverified/" subdirectories are cleaned up following startup verification success.
      
      I considered other approaches for preserving DB files through startup verification, like using a read-only DB or preventing deletion of DB files externally, e.g., in the `Env` layer. However, I decided against it since such an approach would not work for expected state files, and I did not want to change the DB management logic. If there were a way to disable DB file deletions before regular DB open, I would have preferred to use that.
      
      Second, `db_stress` attempts to keep all DB and expected state files that were live at some point since the start of the `db_stress` run. This is a bit tricky and involves the following changes.
      
      - Open the DB with `disable_auto_compactions=1` and `avoid_flush_during_recovery=1`
      - DisableFileDeletions()
      - EnableAutoCompactions()
      
      For this part, too, I would have preferred to use a hypothetical API that disables DB file deletion before regular DB open.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10659
      
      Reviewed By: hx235
      
      Differential Revision: D39407454
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6e981025c7dce147649d2e770728471395a7fa53
    • Fix stress test failure for async_io (#10660) · bd2ad2f9
      Akanksha Mahajan committed
      Summary:
      Sanitize initial_auto_readahead_size if it's greater than max_auto_readahead_size when async_io is used.
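
      A minimal sketch of the sanitization described, not the actual FilePrefetchBuffer code; the helper name is hypothetical.
      ```cpp
      #include <algorithm>
      #include <cstddef>

      // Clamp the initial readahead so the auto-readahead growth never starts
      // above the configured maximum.
      size_t SanitizeInitialReadahead(size_t initial_auto_readahead_size,
                                      size_t max_auto_readahead_size) {
        return std::min(initial_auto_readahead_size, max_auto_readahead_size);
      }
      ```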
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10660
      
      Test Plan: Ran db_stress with initial_auto_readahead_size greater than max_auto_readahead_size.
      
      Reviewed By: anand1976
      
      Differential Revision: D39408095
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 07f933242f636cfbc7ccf042e0c8b959a8ec5f3a
    • Inject spurious wakeup and sleep before acquiring db mutex to expose race condition (#10291) · f79b3d19
      Hui Xiao committed
      Summary:
      **Context/Summary:**
      Previous experience with bugs and flaky tests taught us that there are features in RocksDB vulnerable to race conditions caused by acquiring the db mutex at a particular timing. This PR aggressively exposes those vulnerable features by injecting spurious wakeups and sleeps so that the db mutex is acquired at various timings, in order to expose such race conditions.
      
      **Testing:**
      - `COERCE_CONTEXT_SWITCH=1 make -j56 check / make -j56 db_stress` should reveal
          - flaky tests caused by db mutex related race condition
             - Reverted https://github.com/facebook/rocksdb/pull/9528
             - A/B testing on `COMPILE_WITH_TSAN=1 make -j56 listener_test` w/ and w/o `COERCE_CONTEXT_SWITCH=1` followed by `./listener_test --gtest_filter=EventListenerTest.MultiCF --gtest_repeat=10`
             - `COERCE_CONTEXT_SWITCH=1` can cause the expected test failure (i.e., expose the target TSAN data race error) within 10 runs, while the other could not.
             - This proves our injection can expose flaky tests caused by db mutex related race conditions faster.
          - Known or new race-condition-type internal bugs, by continuously running this PR
      - Performance
         - High ops-threads time: COERCE_CONTEXT_SWITCH=1 regressed it, running about 4 times slower (2:01.16 vs 0:22.10 elapsed). This PR will be run as a separate CI job so this regression won't affect any existing job.
      ```
      TEST_TMPDIR=$db /usr/bin/time ./db_stress \
      --ops_per_thread=100000 --expected_values_dir=$exp --clear_column_family_one_in=0 \
      --write_buffer_size=524288 --target_file_size_base=524288 --ingest_external_file_one_in=100 --compact_files_one_in=1000 --compact_range_one_in=1000
      ```
        - Start-up time: COERCE_CONTEXT_SWITCH=1 regressed start-up time by less than 25% (0:01.51 vs 0:01.29 elapsed)
      ```
      TEST_TMPDIR=$db ./db_stress -ops_per_thread=100000000 -expected_values_dir=$exp --clear_column_family_one_in=0 & sleep 120; pkill -9 db_stress
      
      TEST_TMPDIR=$db /usr/bin/time ./db_stress \
      --ops_per_thread=1 -reopen=0 --expected_values_dir=$exp --clear_column_family_one_in=0 --destroy_db_initially=0
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10291
      
      Reviewed By: ajkr
      
      Differential Revision: D39231182
      
      Pulled By: hx235
      
      fbshipit-source-id: 7ab6695430460e0826727fd8c66679b32b3e44b6
  5. 12 September 2022, 1 commit
    • Build and link libfolly with RocksDB (#10103) · be09943f
      anand76 committed
      Summary:
      The current integration with folly requires cherry-picking folly source files to include in RocksDB for external CI builds. It's not scalable as we depend on more features in folly, such as coroutines. This PR adds a dependency from RocksDB to the folly library when ```USE_FOLLY``` or ```USE_COROUTINES``` is set. We build folly using the build scripts in ```third-party/folly```, relying on it to download and build its dependencies. A new ```Makefile``` target, ```build_folly```, is provided to make building folly easier.
      
      A new option, ```USE_FOLLY_LITE``` is added to retain the old model of compiling selected folly sources with RocksDB. This might be useful for short-term development.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10103
      
      Reviewed By: pdillinger
      
      Differential Revision: D38426787
      
      Pulled By: anand1976
      
      fbshipit-source-id: 33bc84abd9fdc7e2567749f02aa1b2494eb62b2f
  6. 10 September 2022, 2 commits
  7. 09 September 2022, 4 commits
    • minor cleanups to db_crashtest.py (#10654) · 4100eb30
      Andrew Kryczka committed
      Summary:
      Expanded `all_params` to include all parameters crash test may set. Previously, `atomic_flush` was not included in `all_params` and thus was not visible to `finalize_and_sanitize()`. The consequence was manual crash test runs could provide unsafe combinations of parameters to `db_stress`. For example, running `db_crashtest.py` with `-atomic_flush=0` could cause `db_stress` to run with `-atomic_flush=0 -disable_wal=1`, which is known to produce inconsistencies across column families.
      
      While expanding `all_params`, I found we cannot have an entry in it for both `db_stress` and `db_crashtest.py`. So I renamed `enable_tiered_storage` to `test_tiered_storage` for `db_crashtest.py`, which appears more conventional anyways.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10654
      
      Reviewed By: hx235
      
      Differential Revision: D39369349
      
      Pulled By: ajkr
      
      fbshipit-source-id: 31d9010c760c868b20d5e9bd78ba75c8ff3ce348
    • Add PerfContext counters for CompressedSecondaryCache (#10650) · 0148c493
      gitbw95 committed
      Summary:
      Add PerfContext counters for CompressedSecondaryCache.
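
      A hedged usage sketch for reading the new counters via PerfContext; the exact counter field names are not listed here and the helper is illustrative.
      ```cpp
      #include <string>

      #include "rocksdb/db.h"
      #include "rocksdb/perf_context.h"
      #include "rocksdb/perf_level.h"

      using namespace ROCKSDB_NAMESPACE;

      std::string GetWithPerfContext(DB* db, const Slice& key) {
        SetPerfLevel(PerfLevel::kEnableTimeExceptForMutex);
        get_perf_context()->Reset();
        std::string value;
        db->Get(ReadOptions(), key, &value).PermitUncheckedError();
        // ToString() includes the per-operation counters populated above,
        // among them the new CompressedSecondaryCache ones.
        std::string report = get_perf_context()->ToString();
        SetPerfLevel(PerfLevel::kDisable);
        return report;
      }
      ```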
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10650
      
      Test Plan: Unit Tests.
      
      Reviewed By: anand1976
      
      Differential Revision: D39354712
      
      Pulled By: gitbw95
      
      fbshipit-source-id: 1b90d3df99d08ddecd351edfd48d1e3723fdbc15
    • Fix overlapping check by excluding timestamp (#10615) · 3d67d791
      Yanqin Jin committed
      Summary:
      With user-defined timestamps, the overlap check should exclude the
      timestamp part of the key. This has already been done for the file range
      checking in sstableKeyCompare(), but not yet when checking against
      concurrent compactions.
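
      A hedged, simplified sketch, not sstableKeyCompare() itself, assuming the comparator's public `CompareWithoutTimestamp()` helper; the function is illustrative.
      ```cpp
      #include "rocksdb/comparator.h"
      #include "rocksdb/slice.h"

      using namespace ROCKSDB_NAMESPACE;

      // Two closed ranges [a_start, a_end] and [b_start, b_end] overlap iff
      // neither is entirely before the other, comparing user keys without the
      // timestamp suffix.
      bool RangesOverlapIgnoringTimestamp(const Comparator* ucmp,
                                          const Slice& a_start, const Slice& a_end,
                                          const Slice& b_start, const Slice& b_end) {
        return ucmp->CompareWithoutTimestamp(a_start, b_end) <= 0 &&
               ucmp->CompareWithoutTimestamp(b_start, a_end) <= 0;
      }
      ```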
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10615
      
      Test Plan:
      (Will add more tests)
      
      make check
      (Repro seems easier with this commit sha: git checkout 78bbdef5)
      rm -rf /dev/shm/rocksdb/* &&
      mkdir /dev/shm/rocksdb/rocksdb_crashtest_expected &&
      ./db_stress
      --allow_data_in_errors=True --clear_column_family_one_in=0
      --continuous_verification_interval=0 --data_block_index_type=1
      --db=/dev/shm/rocksdb//rocksdb_crashtest_blackbox --delpercent=5
      --delrangepercent=0
      --expected_values_dir=/dev/shm/rocksdb//rocksdb_crashtest_expected
      --iterpercent=0 --max_background_compactions=20
      --max_bytes_for_level_base=10485760 --max_key=25000000
      --max_write_batch_group_size_bytes=1048576 --nooverwritepercent=1
      --ops_per_thread=1000000 --paranoid_file_checks=1 --partition_filters=0
      --prefix_size=8 --prefixpercent=5 --readpercent=30 --reopen=0
      --snapshot_hold_ops=100000 --subcompactions=1 --compaction_pri=3
      --target_file_size_base=65536 --target_file_size_multiplier=2
      --test_batches_snapshots=0 --test_cf_consistency=0 --use_multiget=1
      --user_timestamp_size=8 --value_size_mult=32 --verify_checksum=1
      --write_buffer_size=65536 --writepercent=60 -disable_wal=1
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D39146797
      
      Pulled By: riversand963
      
      fbshipit-source-id: 7fca800026ca6219220100b8b6cf84d907828163
    • Eliminate some allocations/copies around the blob cache (#10647) · fe56cb9a
      Levi Tamasi committed
      Summary:
      Historically, `BlobFileReader` has returned the blob(s) read from the file
      in the `PinnableSlice` provided by the client. This interface was
      preserved when caching was implemented for blobs, which meant that
      the blob data was copied multiple times when caching was in use: first,
      into the client-provided `PinnableSlice` (by `BlobFileReader::SaveValue`),
      and then, into the object stored in the cache (by `BlobSource::PutBlobIntoCache`).
      The patch eliminates these copies and the related allocations by changing
      `BlobFileReader` so it returns its results in the form of heap-allocated `BlobContents`
      objects that can be directly inserted into the cache. The allocations backing
      these `BlobContents` objects are made using the blob cache's allocator if the
      blobs are to be inserted into the cache (i.e. if a cache is configured and the
      `fill_cache` read option is set). Note: this PR focuses on the common case when
      blobs are compressed; some further small optimizations are possible for uncompressed
      blobs.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10647
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D39335185
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 464503d60a5520d654c8273ffb8efd5d1bcd7b36
  8. 08 September 2022, 2 commits
    • Always verify SST unique IDs on SST file open (#10532) · 6de7081c
      Peter Dillinger committed
      Summary:
      Although we've been tracking SST unique IDs in the DB manifest
      unconditionally, checking has been opt-in and with an extra pass at DB::Open
      time. This changes the behavior of `verify_sst_unique_id_in_manifest` to
      check unique ID against manifest every time an SST file is opened through
      table cache (normal DB operations), replacing the explicit pass over files
      at DB::Open time. This change also enables the option by default and
      removes the "EXPERIMENTAL" designation.
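
      A hedged sketch of where the option is set in the public API; it is now on by default, so the example mostly documents how to opt out.
      ```cpp
      #include <string>

      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      using namespace ROCKSDB_NAMESPACE;

      Status OpenWithUniqueIdCheck(const std::string& path, DB** db) {
        Options options;
        options.create_if_missing = true;
        // Default is now true; set to false to skip the per-file manifest check.
        options.verify_sst_unique_id_in_manifest = true;
        return DB::Open(options, path, db);
      }
      ```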
      
      One possible criticism is that the option no longer ensures the integrity
      of a DB at Open time. This is far from an all-or-nothing issue. Verifying
      the IDs of all SST files hardly ensures all the data in the DB is readable.
      (VerifyChecksum is supposed to do that.) Also, with
      max_open_files=-1 (default, extremely common), all SST files are
      opened at DB::Open time anyway.
      
      Implementation details:
      * `VerifySstUniqueIdInManifest()` functions are the extra/explicit pass
      that is now removed.
      * Unit tests that manipulate/corrupt table properties have to opt out of
      this check, because that corrupts the "actual" unique id. (And even for
      testing we don't currently have a mechanism to set "no unique id"
      in the in-memory file metadata for new files.)
      * A lot of other unit test churn relates to (a) default checking on, and
      (b) checking on SST open even without DB::Open (e.g. on flush)
      * Use `FileMetaData` for more `TableCache` operations (in place of
      `FileDescriptor`) so that we have access to the unique_id whenever
      we might need to open an SST file. **There is the possibility of
      performance impact because we can no longer use the more
      localized `fd` part of an `FdWithKeyRange` but instead follow the
      `file_metadata` pointer. However, this change (possible regression)
      is only done for `GetMemoryUsageByTableReaders`.**
      * Removed a completely unnecessary constructor overload of
      `TableReaderOptions`
      
      Possible follow-up:
      * Verification only happens when opening through table cache. Are there
      more places where this should happen?
      * Improve error message when there is a file size mismatch vs. manifest
      (FIXME added in the appropriate place).
      * I'm not sure there's a justification for `FileDescriptor` to be distinct from
      `FileMetaData`.
      * I'm skeptical that `FdWithKeyRange` really still makes sense for
      optimizing some data locality by duplicating some data in memory, but I
      could be wrong.
      * An unnecessary overload of NewTableReader was recently added, in
      the public API nonetheless (though unusable there). It should be cleaned
      up to put most things under `TableReaderOptions`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10532
      
      Test Plan:
      updated unit tests
      
      Performance test showing no significant difference (just noise I think):
      `./db_bench -benchmarks=readwhilewriting[-X10] -num=3000000 -disable_wal=1 -bloom_bits=8 -write_buffer_size=1000000 -target_file_size_base=1000000`
      Before: readwhilewriting [AVG 10 runs] : 68702 (± 6932) ops/sec
      After: readwhilewriting [AVG 10 runs] : 68239 (± 7198) ops/sec
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D38765551
      
      Pulled By: pdillinger
      
      fbshipit-source-id: a827a708155f12344ab2a5c16e7701c7636da4c2
    • Avoid recompressing cold block in CompressedSecondaryCache (#10527) · d490bfcd
      Bo Wang committed
      Summary:
      **Summary:**
      When a block is first retrieved via `Lookup` from the secondary cache, we just insert a dummy block in the primary cache (charging the actual size of the block) and don't erase the block from the secondary cache. A standalone handle is returned from `Lookup`. Only if the block is hit again, we erase it from the secondary cache and add it into the primary cache.

      When a block is first evicted from the primary cache to the secondary cache, we just insert a dummy block (size 0) in the secondary cache. When the block is evicted again, it is treated as a hot block and is inserted into the secondary cache.
      
      **Implementation Details**
      Add a new state of LRUHandle: The handle is never inserted into the LRUCache (both hash table and LRU list) and it doesn't experience the above three states. The entry can be freed when refs becomes 0.  (refs >= 1 && in_cache == false && IS_STANDALONE == true)
      
      The behaviors of  `LRUCacheShard::Lookup()` are updated if the secondary_cache is CompressedSecondaryCache:
      1. If a handle is found in primary cache:
        1.1. If the handle's value is not nullptr, it is returned immediately.
        1.2. If the handle's value is nullptr, this means the handle is a dummy one. For a dummy handle, if it was retrieved from secondary cache, it may still exist in secondary cache.
          - 1.2.1. If no valid handle can be `Lookup` from secondary cache, return nullptr.
          - 1.2.2. If the handle from secondary cache is valid, erase it from the secondary cache and add it into the primary cache.
      2. If a handle is not found in primary cache:
        2.1. If no valid handle can be `Lookup` from secondary cache, return nullptr.
        2.2.  If the handle from secondary cache is valid, insert a dummy block in the primary cache (charging the actual size of the block)  and return a standalone handle.
      
      The behaviors of `LRUCacheShard::Promote()` are updated as follows:
      1. If `e->sec_handle` has value, one of the following steps can happen:
        1.1. Insert a dummy handle and return a standalone handle to caller when `secondary_cache_` is `CompressedSecondaryCache` and e is a standalone handle.
        1.2. Insert the item into the primary cache and return the handle to caller.
        1.3. Exception handling.
      2. If `e->sec_handle` has no value, mark the item as not in cache and charge the cache as its only metadata that'll shortly be released.
      
      The behavior of  `CompressedSecondaryCache::Insert()` is updated:
      1. If a block is evicted from the primary cache for the first time, a dummy item is inserted.
      2. If a dummy item is found for a block, the block is inserted into the secondary cache.
      
      The behavior of  `CompressedSecondaryCache:::Lookup()` is updated:
      1. If a handle is not found or it is a dummy item, a nullptr is returned.
      2. If `erase_handle` is true, the handle is erased.
      
      The behaviors of  `LRUCacheShard::Release()` are adjusted for the standalone handles.
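
      A hedged configuration sketch for the tiered-cache setup this change targets, assuming the `CompressedSecondaryCacheOptions`/`NewCompressedSecondaryCache()` API in `rocksdb/cache.h`; capacities are illustrative only.
      ```cpp
      #include <memory>

      #include "rocksdb/cache.h"

      using namespace ROCKSDB_NAMESPACE;

      std::shared_ptr<Cache> MakeTieredBlockCache() {
        LRUCacheOptions primary_opts;
        primary_opts.capacity = 512 << 20;  // 512 MB primary (uncompressed) cache
        CompressedSecondaryCacheOptions secondary_opts;
        secondary_opts.capacity = 2ull << 30;  // 2 GB compressed secondary cache
        // Attach the compressed secondary cache to the primary LRU block cache.
        primary_opts.secondary_cache = NewCompressedSecondaryCache(secondary_opts);
        return NewLRUCache(primary_opts);
      }
      ```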
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10527
      
      Test Plan:
      1. Stress tests.
      2. Unit tests.
      3. CPU profiling for db_bench.
      
      Reviewed By: siying
      
      Differential Revision: D38747613
      
      Pulled By: gitbw95
      
      fbshipit-source-id: 74a1eba7e1957c9affb2bd2ae3e0194584fa6eca
  9. 07 September 2022, 3 commits
    • Support custom allocators for the blob cache (#10628) · c8543296
      Levi Tamasi committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10628
      
      Test Plan: `make check`
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D39228165
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 591fdff08db400b170b26f0165551f86d33c1dbf
    • Deflake blob caching tests (#10636) · 5a97e6b1
      Andrew Kryczka committed
      Summary:
      Example failure:
      
      ```
      db/blob/db_blob_basic_test.cc:226: Failure
      Expected equality of these values:
        i
          Which is: 1
        num_blobs
          Which is: 5
      ```
      
      I can't repro locally, but it looks like the 2KB cache is too small to guarantee no eviction happens between loading all the data into cache and reading from `kBlockCacheTier`. This 2KB setting appears to have come from a test where the cached entries are pinned, where it makes sense to have a small setting. However, such a small setting makes less sense when the blocks are evictable but must remain cached per the test's expectation. This PR increases the capacity setting to 2MB for those cases.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10636
      
      Reviewed By: cbi42
      
      Differential Revision: D39250976
      
      Pulled By: ajkr
      
      fbshipit-source-id: 769309f9a19cfac20b67b927805c8df5c1d2d1f5
    • Deflake DBErrorHandlingFSTest.*WALWriteError (#10642) · 1ffadbe9
      Andrew Kryczka committed
      Summary:
      Example flake: https://app.circleci.com/pipelines/github/facebook/rocksdb/17660/workflows/7a891875-f07b-4a67-b204-eaa7ca9f9aa2/jobs/467496
      
      The test could get stuck in out-of-space due to a callback executing `SetFilesystemActive(false /* active */)` after the test executed `SetFilesystemActive(true /* active */)`. This could happen because background info logging went through the SyncPoint callback "WritableFileWriter::Append:BeforePrepareWrite", probably unintentionally. The solution of this PR is to call `ClearAllCallBacks()` to wait for any such pending callbacks to drain before calling `SetFilesystemActive(true /* active */)`
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10642
      
      Reviewed By: cbi42
      
      Differential Revision: D39265381
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9a2f4916ab19726c8fb4b3a3b590b1b9ed93de1b
  10. 06 September 2022, 1 commit
  11. 05 September 2022, 2 commits
    • Disable RateLimiterTest.Rate with valgrind (#10637) · 36dec11b
      Andrew Kryczka committed
      Summary:
      Example valgrind flake: https://app.circleci.com/pipelines/github/facebook/rocksdb/18073/workflows/3794e569-45cb-4621-a2b4-df1dcdf5cb19/jobs/475569
      
      ```
      util/rate_limiter_test.cc:358
      Expected equality of these values:
        samples_at_minimum
          Which is: 9
        samples
          Which is: 10
      ```
      
      Some other runs of `RateLimiterTest.Rate` already skip this check due to its reliance on a minimum execution speed. We know valgrind slows execution a lot so can disable the check in that case.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10637
      
      Reviewed By: cbi42
      
      Differential Revision: D39251350
      
      Pulled By: ajkr
      
      fbshipit-source-id: 41ae1ea4cd91992ea57df902f9f7fd6d182a5932
    • Deflake DBBlockCacheTest1.WarmCacheWithBlocksDuringFlush (#10635) · fe5fbe32
      Andrew Kryczka committed
      Summary:
      Previously, automatic compaction could be triggered prior to the test invoking CompactRange(). It could lead to the following flaky failure:
      
      ```
      /root/project/db/db_block_cache_test.cc:753: Failure
      Expected equality of these values:
        1 + kNumBlocks
          Which is: 11
        options.statistics->getTickerCount(BLOCK_CACHE_INDEX_ADD)
          Which is: 10
      ```
      
      A sequence leading to this failure was:
      
      * Automatic compaction
        * files [1] [2] trivially moved
        * files [3] [4] [5] [6] trivially moved
      * CompactRange()
        * files [7] [8] [9] trivially moved
        * file [10] trivially moved
      
      In such a case, the index/filter block adds that the test expected did not happen since there were no new files.
      
      This PR just tweaks settings to ensure the `CompactRange()` produces one new file.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10635
      
      Reviewed By: cbi42
      
      Differential Revision: D39250869
      
      Pulled By: ajkr
      
      fbshipit-source-id: a3c94c49069e28c49c40b4b80dae0059739d19fd
  12. 03 September 2022, 1 commit
    • Skip swaths of range tombstone covered keys in merging iterator (2022 edition) (#10449) · 30bc495c
      Changyu Bi committed
      Summary:
      Delete range logic is moved from `DBIter` to `MergingIterator`, and `MergingIterator` will seek to the end of a range deletion if possible instead of scanning through each key and check with `RangeDelAggregator`.
      
      With the invariant that a key in level L (consider the memtable as the first level, and each immutable memtable and each L0 file as a separate level) has a larger sequence number than all keys in any level >L, a range tombstone `[start, end)` from level L covers all keys in its range in any level >L. This property motivates the following optimizations in the iterator:
      - in `Seek(target)`, if level L has a range tombstone `[start, end)` that covers `target.UserKey`, then for all levels > L, we can do Seek() on `end` instead of `target` to skip some range tombstone covered keys.
      - in `Next()/Prev()`, if the current key is covered by a range tombstone `[start, end)` from level L, we can do `Seek` to `end` for all levels > L. (A simplified sketch of this cascading seek appears at the end of this summary.)
      
      This PR implements the above optimizations in `MergingIterator`. As all range tombstone covered keys are now skipped in `MergingIterator`, the range tombstone logic is removed from `DBIter`. The idea in this PR is similar to https://github.com/facebook/rocksdb/issues/7317, but this PR leaves `InternalIterator` interface mostly unchanged. **Credit**: the cascading seek optimization and the sentinel key (discussed below) are inspired by [Pebble](https://github.com/cockroachdb/pebble/blob/master/merging_iter.go) and suggested by ajkr in https://github.com/facebook/rocksdb/issues/7317. The two optimizations are mostly implemented in `SeekImpl()/SeekForPrevImpl()` and `IsNextDeleted()/IsPrevDeleted()` in `merging_iterator.cc`. See comments for each method for more detail.
      
      One notable change is that the minHeap/maxHeap used by `MergingIterator` now contains range tombstone end keys in addition to point key iterators. This helps to reduce the number of key comparisons. For example, for a range tombstone `[start, end)`, a `start` and an `end` `HeapItem` are inserted into the heap. When a `HeapItem` for a range tombstone start key is popped from the minHeap, we know this range tombstone becomes "active" in the sense that, before the range tombstone's end key is popped from the minHeap, all the keys popped from this heap are covered by the range tombstone's internal key range `[start, end)`.
      
      Another major change, the *delete range sentinel key*, is made to `LevelIterator`. Before this PR, when all point keys in an SST file had been iterated through in `MergingIterator`, a level iterator would advance to the next SST file in its level. In the case when an SST file has a range tombstone that covers keys beyond the SST file's last point key, advancing to the next SST file would lose this range tombstone. Consequently, `MergingIterator` could return keys that should have been deleted by some range tombstone. We prevent this by pretending that the file boundaries in each SST file are sentinel keys. A `LevelIterator` now only advances the file iterator once the sentinel key is processed.
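
      A heavily simplified sketch of the cascading-seek idea, not MergingIterator's actual code: plain lexicographic comparison stands in for the user comparator, and per-level tombstones are passed in directly.
      ```cpp
      #include <string>
      #include <vector>

      struct RangeTombstone {
        std::string start_key;
        std::string end_key;  // covers [start_key, end_key)
      };

      // Returns the key each level should actually seek to, given the seek
      // target and the range tombstones of the levels above it (index 0 =
      // newest level).
      std::vector<std::string> CascadingSeekTargets(
          const std::string& target,
          const std::vector<std::vector<RangeTombstone>>& tombstones_per_level) {
        std::vector<std::string> per_level_targets;
        std::string current = target;
        for (const auto& level_tombstones : tombstones_per_level) {
          per_level_targets.push_back(current);  // this level seeks to `current`
          for (const auto& t : level_tombstones) {
            // A tombstone from this (newer) level that covers `current` lets
            // every older level skip ahead to its end key.
            if (t.start_key <= current && current < t.end_key) {
              current = t.end_key;
            }
          }
        }
        return per_level_targets;
      }
      ```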
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10449
      
      Test Plan:
      - Added many unit tests in db_range_del_test
      - Stress test: `./db_stress --readpercent=5 --prefixpercent=19 --writepercent=20 -delpercent=10 --iterpercent=44 --delrangepercent=2`
      - Additional iterator stress test is added to verify against iterators against expected state: https://github.com/facebook/rocksdb/issues/10538. This is based on ajkr's previous attempt https://github.com/facebook/rocksdb/pull/5506#issuecomment-506021913.
      
      ```
      python3 ./tools/db_crashtest.py blackbox --simple --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152 --compression_type=none --max_background_compactions=8 --value_size_mult=33 --max_key=5000000 --interval=10 --duration=7200 --delrangepercent=3 --delpercent=9 --iterpercent=25 --writepercent=60 --readpercent=3 --prefixpercent=0 --num_iterations=1000 --range_deletion_width=100 --verify_iterator_with_expected_state_one_in=1
      ```
      
      - Performance benchmark: I used a similar setup as in the blog [post](http://rocksdb.org/blog/2018/11/21/delete-range.html) that introduced DeleteRange, "a database with 5 million data keys, and 10000 range tombstones (ignoring those dropped during compaction) that were written in regular intervals after 4.5 million data keys were written".  As expected, the performance with this PR depends on the range tombstone width.
      ```
      # Setup:
      TEST_TMPDIR=/dev/shm ./db_bench_main --benchmarks=fillrandom --writes=4500000 --num=5000000
      TEST_TMPDIR=/dev/shm ./db_bench_main --benchmarks=overwrite --writes=500000 --num=5000000 --use_existing_db=true --writes_per_range_tombstone=50
      
      # Scan entire DB
      TEST_TMPDIR=/dev/shm ./db_bench_main --benchmarks=readseq[-X5] --use_existing_db=true --num=5000000 --disable_auto_compactions=true
      
      # Short range scan (10 Next())
      TEST_TMPDIR=/dev/shm/width-100/ ./db_bench_main --benchmarks=seekrandom[-X5] --use_existing_db=true --num=500000 --reads=100000 --seek_nexts=10 --disable_auto_compactions=true
      
      # Long range scan(1000 Next())
      TEST_TMPDIR=/dev/shm/width-100/ ./db_bench_main --benchmarks=seekrandom[-X5] --use_existing_db=true --num=500000 --reads=2500 --seek_nexts=1000 --disable_auto_compactions=true
      ```
      Avg over of 10 runs (some slower tests had fews runs):
      
      For the first column (tombstone), 0 means no range tombstone, 100-10000 means the width of the 10k range tombstones, and 1 means there is a single range tombstone in the entire DB (width is 1000). The 1-tombstone case is there to test for regression when there are very few range tombstones in the DB, since having no range tombstones at all is likely to take a different code path than having range tombstones.
      
      - Scan entire DB
      
      | tombstone width | Pre-PR ops/sec | Post-PR ops/sec | ±% |
      | ------------- | ------------- | ------------- |  ------------- |
      | 0 range tombstone    |2525600 (± 43564)    |2486917 (± 33698)    |-1.53%               |
      | 100   |1853835 (± 24736)    |2073884 (± 32176)    |+11.87%              |
      | 1000  |422415 (± 7466)      |1115801 (± 22781)    |+164.15%             |
      | 10000 |22384 (± 227)        |227919 (± 6647)      |+918.22%             |
      | 1 range tombstone      |2176540 (± 39050)    |2434954 (± 24563)    |+11.87%              |
      - Short range scan
      
      | tombstone width | Pre-PR ops/sec | Post-PR ops/sec | ±% |
      | ------------- | ------------- | ------------- |  ------------- |
      | 0  range tombstone   |35398 (± 533)        |35338 (± 569)        |-0.17%               |
      | 100   |28276 (± 664)        |31684 (± 331)        |+12.05%              |
      | 1000  |7637 (± 77)          |25422 (± 277)        |+232.88%             |
      | 10000 |1367                 |28667                |+1997.07%            |
      | 1 range tombstone      |32618 (± 581)        |32748 (± 506)        |+0.4%                |
      
      - Long range scan
      
      | tombstone width | Pre-PR ops/sec | Post-PR ops/sec | ±% |
      | ------------- | ------------- | ------------- |  ------------- |
      | 0 range tombstone     |2262 (± 33)          |2353 (± 20)          |+4.02%               |
      | 100   |1696 (± 26)          |1926 (± 18)          |+13.56%              |
      | 1000  |410 (± 6)            |1255 (± 29)          |+206.1%              |
      | 10000 |25                   |414                  |+1556.0%             |
      | 1 range tombstone   |1957 (± 30)          |2185 (± 44)          |+11.65%              |
      
      - Microbench does not show significant regression: https://gist.github.com/cbi42/59f280f85a59b678e7e5d8561e693b61
      
      Reviewed By: ajkr
      
      Differential Revision: D38450331
      
      Pulled By: cbi42
      
      fbshipit-source-id: b5ef12e8d8c289ed2e163ccdf277f5039b511fca
  13. 02 September 2022, 4 commits
  14. 01 September 2022, 4 commits