1. 26 Oct 2022: 7 commits
    • Run clang format against files under tools/ and db_stress_tool/ (#10868) · 48fe9217
      sdong committed
      Summary:
      Some lines of .h and .cc files are not properly formatted. Clean them up with clang-format.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10868
      
      Test Plan: Watch existing CI to pass
      
      Reviewed By: ajkr
      
      Differential Revision: D40683485
      
      fbshipit-source-id: 491fbb78b2cdcb948164f306829909ad816d5d0b
    • Run clang-format on utilities/transactions (#10871) · 95a1935c
      Yanqin Jin committed
      Summary:
      This PR is the result of running the following command
      ```
      find ./utilities/transactions/ -name '*.cc' -o -name '*.h' -o -name '*.c' -o -name '*.hpp' -o -name '*.cpp' | xargs clang-format -i
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10871
      
      Test Plan: make check
      
      Reviewed By: cbi42
      
      Differential Revision: D40686871
      
      Pulled By: riversand963
      
      fbshipit-source-id: 613738d667ec8f8e13cce4802e0e166d6be52211
    • Run clang-format on some files in db/db_impl directory (#10869) · 84563a27
      Yanqin Jin committed
      Summary:
      Run clang-format on some files in db/db_impl/ directory
      
      ```
      clang-format -i <file>
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10869
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D40685390
      
      Pulled By: riversand963
      
      fbshipit-source-id: 64449ccb21b0d61c5142eb2bcbff828acb45c154
    • Format files under table/ by clang-format (#10852) · 727bad78
      anand76 committed
      Summary:
      Run clang-format on files under the `table` directory.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10852
      
      Reviewed By: ajkr
      
      Differential Revision: D40650732
      
      Pulled By: anand1976
      
      fbshipit-source-id: 2023a958e37fd6274040c5181130284600c9e0ef
    • Improve FragmentTombstones() speed by lazily initializing `seq_set_` (#10848) · 7a959388
      Changyu Bi committed
      Summary:
      FragmentedRangeTombstoneList has a member variable `seq_set_`, a set containing the sequence numbers of all range tombstones. The set is constructed in `FragmentTombstones()` and is used only in `FragmentedRangeTombstoneList::ContainsRange()`, which is called only during compaction. This PR moves the initialization of `seq_set_` into `FragmentedRangeTombstoneList::ContainsRange()`. This should speed up `FragmentTombstones()` when the range tombstone list is used for read/scan requests. A microbenchmark shows a speed improvement of ~45%.
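      The lazy-initialization pattern described above can be sketched in a self-contained way; this is an illustration only (the class and member names such as `FragmentedTombstoneListSketch` are made up, not the actual RocksDB code):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <set>
      #include <utility>
      #include <vector>

      // Sketch: defer building the sequence-number set until the first call
      // on the compaction-only path, so read/scan construction stays cheap.
      class FragmentedTombstoneListSketch {
       public:
        explicit FragmentedTombstoneListSketch(std::vector<uint64_t> seqnos)
            : seqnos_(std::move(seqnos)) {}

        // Compaction-only query; pays the set-construction cost lazily.
        bool ContainsRange(uint64_t lower, uint64_t upper) {
          if (!initialized_) {
            seq_set_.insert(seqnos_.begin(), seqnos_.end());  // deferred work
            initialized_ = true;
          }
          auto it = seq_set_.lower_bound(lower);
          return it != seq_set_.end() && *it <= upper;
        }

        bool initialized() const { return initialized_; }

       private:
        std::vector<uint64_t> seqnos_;
        std::set<uint64_t> seq_set_;
        bool initialized_ = false;
      };

      int main() {
        FragmentedTombstoneListSketch list({5, 10, 20});
        assert(!list.initialized());        // read path never pays the cost
        assert(list.ContainsRange(8, 12));  // first compaction query initializes
        assert(list.initialized());
        assert(!list.ContainsRange(11, 19));
        return 0;
      }
      ```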
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10848
      
      Test Plan:
      - Existing tests and stress test: `python3 tools/db_crashtest.py whitebox --simple  --verify_iterator_with_expected_state_one_in=5`.
      - Microbench: update `range_del_aggregator_bench` to benchmark speed of `FragmentTombstones()`:
      ```
      ./range_del_aggregator_bench --num_range_tombstones=1000 --tombstone_start_upper_bound=50000000 --num_runs=10000 --tombstone_width_mean=200 --should_deletes_per_run=100 --use_compaction_range_del_aggregator=true
      
      Before this PR:
      =========================
      Fragment Tombstones:     270.286 us
      AddTombstones:           1.28933 us
      ShouldDelete (first):    0.525528 us
      ShouldDelete (rest):     0.0797519 us
      
      After this PR: the time to fragment tombstones is pushed to AddTombstones(), which only happens during compaction.
      =========================
      Fragment Tombstones:     149.879 us
      AddTombstones:           102.131 us
      ShouldDelete (first):    0.565871 us
      ShouldDelete (rest):     0.0729444 us
      ```
      - db_bench: this should improve speed for fragmenting range tombstones for mutable memtable:
      ```
      ./db_bench --benchmarks=readwhilewriting --writes_per_range_tombstone=100 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=500000 --reads=250000 --disable_auto_compactions --max_num_range_tombstones=100000 --finish_after_writes --write_buffer_size=1073741824 --threads=25
      
      Before this PR:
      readwhilewriting :      18.301 micros/op 1310445 ops/sec 4.769 seconds 6250000 operations;   28.1 MB/s (41001 of 250000 found)
      After this PR:
      readwhilewriting :      16.943 micros/op 1439376 ops/sec 4.342 seconds 6250000 operations;   23.8 MB/s (28977 of 250000 found)
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D40646227
      
      Pulled By: cbi42
      
      fbshipit-source-id: ea471667edb258f67d01cfd828588e80a89e4083
    • Fix FIFO causing overlapping seqnos in L0 files due to overlapped seqnos between ingested files and memtable's (#10777) · fc74abb4
      Hui Xiao committed
      
      Summary:
      **Context:**
      Same as https://github.com/facebook/rocksdb/pull/5958#issue-511150930 but applying the fix to the FIFO compaction case.
      Repro:
      ```
      COERCE_CONTEXT_SWITCH=1 make -j56 db_stress
      
      ./db_stress --acquire_snapshot_one_in=0 --adaptive_readahead=0 --allow_data_in_errors=True --async_io=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=18 --bottommost_compression_type=disable --bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=1 --checkpoint_one_in=0 --checksum_type=kCRC32c --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=0 --compact_range_one_in=1000 --compaction_pri=3 --open_files=-1 --compaction_style=2 --fifo_allow_compaction=1 --compaction_ttl=0 --compression_max_dict_buffer_bytes=8388607 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=zlib --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test0/rocksdb_crashtest_whitebox --db_write_buffer_size=8388608 --delpercent=4 --delrangepercent=1 --destroy_db_initially=1 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --fail_if_options_file_error=1 --file_checksum_impl=none --flush_one_in=1000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=0 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=15 --index_type=3 --ingest_external_file_one_in=100 --initial_auto_readahead_size=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --log2_keys_per_lock=10 --long_running_snapshots=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=16384 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=100000 --max_key_len=3 
--max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=4194304 --memtable_prefix_bloom_size_ratio=0.5 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --memtablerep=skip_list --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --num_levels=1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=0 --ops_per_thread=200000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=0 --periodic_compaction_seconds=0 --prefix_size=8 --prefixpercent=5 --prepopulate_block_cache=0 --progress_reports=0 --read_fault_one_in=0 --readahead_size=16384 --readpercent=45 --recycle_log_file_num=1 --reopen=20 --ribbon_starting_level=999 --snapshot_hold_ops=1000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --subcompactions=2 --sync=0 --sync_fault_injection=0 --target_file_size_base=524288 --target_file_size_multiplier=2 --test_batches_snapshots=0 --top_level_index_pinning=3 --unpartitioned_pinning=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=1 --use_merge=0 --use_multiget=1 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=0 --verify_db_one_in=1000 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=zstd --write_buffer_size=524288 --write_dbid_to_manifest=0 --writepercent=35
      
      put or merge error: Corruption: force_consistency_checks(DEBUG): VersionBuilder: L0 file https://github.com/facebook/rocksdb/issues/479 with seqno 23711 29070 vs. file https://github.com/facebook/rocksdb/issues/482 with seqno 27138 29049
      ```
      
      **Summary:**
      FIFO only does intra-L0 compaction in the following four cases. For the other cases, FIFO drops data instead of compacting it, which is irrelevant to the overlapping seqno issue we are solving.
      -  [FIFOCompactionPicker::PickSizeCompaction](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker_fifo.cc#L155) when `total size < compaction_options_fifo.max_table_files_size` and `compaction_options_fifo.allow_compaction == true`
         - For this path, we simply reuse the fix in `FindIntraL0Compaction` https://github.com/facebook/rocksdb/pull/5958/files#diff-c261f77d6dd2134333c4a955c311cf4a196a08d3c2bb6ce24fd6801407877c89R56
         - This path was not stress-tested at all. Therefore we covered `fifo.allow_compaction` in the stress test to surface the overlapping seqno issue we are fixing here.
      - [FIFOCompactionPicker::PickCompactionToWarm](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker_fifo.cc#L313) when `compaction_options_fifo.age_for_warm > 0`
        - For this path, we simply replicate the idea in https://github.com/facebook/rocksdb/pull/5958#issue-511150930 and skip files of largest seqno greater than `earliest_mem_seqno`
        - This path was not stress-tested at all. However, covering the `age_for_warm` option is worth a separate PR to deal with db_stress compatibility. Therefore we manually tested this path for this PR.
      - [FIFOCompactionPicker::CompactRange](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker_fifo.cc#L365) that ends up picking one of the above two compactions
      - [CompactionPicker::CompactFiles](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker.cc#L378)
          - Since `SanitizeCompactionInputFiles()` will be called [before](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker.h#L111-L113) `CompactionPicker::CompactFiles`, we simply replicate the idea in https://github.com/facebook/rocksdb/pull/5958#issue-511150930 in `SanitizeCompactionInputFiles()`. To simplify the implementation, we return `Status::Aborted()` on encountering a seqno-overlapped file when compacting to L0, instead of skipping the file and proceeding with the compaction.
      
      Some additional clean-up included in this PR:
      - Renamed `earliest_memtable_seqno` to `earliest_mem_seqno` for consistent naming
      - Added comment about `earliest_memtable_seqno` in related APIs
      - Made parameter `earliest_memtable_seqno` constant and required
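      The core idea replicated from #5958 can be sketched in a self-contained way; this is purely illustrative (the `L0File` type and `PickIntraL0` helper are made up, not the actual compaction-picker code):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Sketch: when picking L0 files for intra-L0 compaction, leave out any
      // file whose largest sequence number is not strictly below the earliest
      // seqno of the mutable memtable, so the compaction output's seqno range
      // cannot overlap the memtable's.
      struct L0File {
        int file_number;
        uint64_t largest_seqno;
      };

      std::vector<L0File> PickIntraL0(const std::vector<L0File>& level0,
                                      uint64_t earliest_mem_seqno) {
        std::vector<L0File> picked;
        for (const auto& f : level0) {
          if (f.largest_seqno >= earliest_mem_seqno) {
            continue;  // would overlap the memtable's seqno range; skip it
          }
          picked.push_back(f);
        }
        return picked;
      }

      int main() {
        // An ingested file (482) got a seqno at/above the memtable's earliest.
        std::vector<L0File> l0 = {{479, 23711}, {482, 29049}};
        auto picked = PickIntraL0(l0, /*earliest_mem_seqno=*/29000);
        assert(picked.size() == 1);
        assert(picked[0].file_number == 479);
        return 0;
      }
      ```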
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10777
      
      Test Plan:
      - make check
      - New unit test `TEST_P(DBCompactionTestFIFOCheckConsistencyWithParam, FlushAfterIntraL0CompactionWithIngestedFile)` corresponding to the above four cases, which will fail accordingly without the fix
      - Regular CI stress run on this PR + stress test with aggressive value https://github.com/facebook/rocksdb/pull/10761  and on FIFO compaction only
      
      Reviewed By: ajkr
      
      Differential Revision: D40090485
      
      Pulled By: hx235
      
      fbshipit-source-id: 52624186952ee7109117788741aeeac86b624a4f
    • Run format check for *.h and *.cc files under java/ (#10851) · 2a551976
      sdong committed
      Summary:
      Run a format check for *.h and *.cc files to clean up the formatting.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10851
      
      Test Plan: Watch CI tests to pass
      
      Reviewed By: ajkr
      
      Differential Revision: D40649723
      
      fbshipit-source-id: 62d32cead0b3b8e6540e86d25451bd72642109eb
  2. 25 Oct 2022: 11 commits
  3. 24 Oct 2022: 2 commits
  4. 23 Oct 2022: 1 commit
  5. 22 Oct 2022: 7 commits
    • Allow penultimate level output for the last level only compaction (#10822) · f726d29a
      Jay Zhuang committed
      Summary:
      Allow last-level-only compaction to output its result to the penultimate level if the penultimate level is empty. This also blocks other compactions from outputting to the penultimate level.
      (It includes the PR https://github.com/facebook/rocksdb/issues/10829.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10822
      
      Reviewed By: siying
      
      Differential Revision: D40389180
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 4e5dcdce307795b5e07b5dd1fa29dd75bb093bad
    • Use kXXH3 as default checksum (CPU efficiency) (#10778) · 27c9705a
      Peter Dillinger committed
      Summary:
      Since this has been supported for about a year, I think it's time to make it the default. This should improve CPU efficiency slightly on most hardware.
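      For anyone who wants to keep the old behavior, the checksum type remains configurable per table factory. A minimal options fragment, assuming the standard `rocksdb/table.h` API (not part of this commit message):

      ```cpp
      #include <rocksdb/options.h>
      #include <rocksdb/table.h>

      rocksdb::Options MakeOptions() {
        rocksdb::BlockBasedTableOptions table_options;
        // Opt out of the new kXXH3 default and keep the previous checksum.
        table_options.checksum = rocksdb::kCRC32c;
        rocksdb::Options options;
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_options));
        return options;
      }
      ```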
      
      A current DB performance comparison using buck+clang build:
      ```
      TEST_TMPDIR=/dev/shm ./db_bench -checksum_type={1,4} -benchmarks=fillseq[-X1000] -num=3000000 -disable_wal
      ```
      kXXH3 (+0.2% DB write throughput):
      `fillseq [AVG    1000 runs] : 822149 (± 1004) ops/sec;   91.0 (± 0.1) MB/sec`
      kCRC32c:
      `fillseq [AVG    1000 runs] : 820484 (± 1203) ops/sec;   90.8 (± 0.1) MB/sec`
      
      Micro benchmark comparison:
      ```
      ./db_bench --benchmarks=xxh3[-X20],crc32c[-X20]
      ```
      Machine 1, buck+clang build:
      `xxh3 [AVG    20 runs] : 3358616 (± 19091) ops/sec; 13119.6 (± 74.6) MB/sec`
      `crc32c [AVG    20 runs] : 2578725 (± 7742) ops/sec; 10073.1 (± 30.2) MB/sec`
      
      Machine 2, make+gcc build, DEBUG_LEVEL=0 PORTABLE=0:
      `xxh3 [AVG    20 runs] : 6182084 (± 137223) ops/sec; 24148.8 (± 536.0) MB/sec`
      `crc32c [AVG    20 runs] : 5032465 (± 42454) ops/sec; 19658.1 (± 165.8) MB/sec`
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10778
      
      Test Plan: make check, unit tests updated
      
      Reviewed By: ajkr
      
      Differential Revision: D40112510
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e59a8d50a60346137732f8668ba7cfac93be2b37
    • Make UserComparatorWrapper not Customizable (#10837) · 5d17297b
      sdong committed
      Summary:
      Right now UserComparatorWrapper is a Customizable object even though it does not need to be, which introduces some initialization overhead for the object. In some benchmarks, it shows up in CPU profiling. Make it non-configurable by moving the functions UserComparatorWrapper needs into an interface and implementing that interface.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10837
      
      Test Plan: Make sure existing tests pass
      
      Reviewed By: pdillinger
      
      Differential Revision: D40528511
      
      fbshipit-source-id: 70eaac89ecd55401a26e8ed32abbc413a9617c62
    • Refactor block cache tracing APIs (#10811) · 0e7b27bf
      akankshamahajan committed
      Summary:
      Refactor the classes, APIs and data structures for block cache tracing to allow a user provided trace writer to be used. Currently, only a TraceWriter is supported, with a default built-in implementation of FileTraceWriter. The TraceWriter, however, takes a flat trace record and is thus only suitable for file tracing. This PR introduces an abstract BlockCacheTraceWriter class that takes a structured BlockCacheTraceRecord. The BlockCacheTraceWriter implementation can then format and log the record in whatever way it sees fit. The default BlockCacheTraceWriterImpl does file tracing using a user provided TraceWriter.
      
      `DB::StartBlockTrace` will internally redirect to the changed `BlockCacheTrace::StartBlockCacheTrace`.
      A new `DB::StartBlockTrace` overload is also added that directly takes a `BlockCacheTraceWriter` pointer.
      
      This same philosophy can be applied to KV and IO tracing as well.
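      The shape of the abstraction described above can be sketched in a self-contained way; the types here (`BlockCacheTraceRecordSketch`, `BlockCacheTraceWriterSketch`, `InMemoryTraceWriter`) are illustrative stand-ins, not the actual RocksDB headers:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <memory>
      #include <string>
      #include <vector>

      // A structured record, as opposed to a flat byte stream: the writer can
      // format and log it however it sees fit.
      struct BlockCacheTraceRecordSketch {
        std::string block_key;
        uint64_t access_timestamp;
        bool is_hit;
      };

      // Abstract writer interface; the default implementation would do file
      // tracing, but any user-provided sink works.
      class BlockCacheTraceWriterSketch {
       public:
        virtual ~BlockCacheTraceWriterSketch() = default;
        virtual bool WriteBlockAccess(const BlockCacheTraceRecordSketch& record) = 0;
      };

      // An in-memory implementation, standing in for the default file-based one.
      class InMemoryTraceWriter : public BlockCacheTraceWriterSketch {
       public:
        bool WriteBlockAccess(const BlockCacheTraceRecordSketch& record) override {
          records_.push_back(record);
          return true;
        }
        const std::vector<BlockCacheTraceRecordSketch>& records() const {
          return records_;
        }
       private:
        std::vector<BlockCacheTraceRecordSketch> records_;
      };

      int main() {
        std::unique_ptr<BlockCacheTraceWriterSketch> writer(new InMemoryTraceWriter());
        assert(writer->WriteBlockAccess({"block#1", 12345, /*is_hit=*/false}));
        auto* mem = static_cast<InMemoryTraceWriter*>(writer.get());
        assert(mem->records().size() == 1);
        assert(mem->records()[0].block_key == "block#1");
        return 0;
      }
      ```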
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10811
      
      Test Plan:
      - Existing unit tests
      - The old `DB::StartBlockTrace` API checked with the db_bench tool. Create a database:
      ```
      ./db_bench --benchmarks="fillseq" \
      --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 \
      --cache_index_and_filter_blocks --cache_size=1048576 \
      --disable_auto_compactions=1 --disable_wal=1 --compression_type=none \
      --min_level_to_compress=-1 --compression_ratio=1 --num=10000000
      ```
      
      To trace block cache accesses when running readrandom benchmark:
      ```
      ./db_bench --benchmarks="readrandom" --use_existing_db --duration=60 \
      --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 \
      --cache_index_and_filter_blocks --cache_size=1048576 \
      --disable_auto_compactions=1 --disable_wal=1 --compression_type=none \
      --min_level_to_compress=-1 --compression_ratio=1 --num=10000000 \
      --threads=16 \
      -block_cache_trace_file="/tmp/binary_trace_test_example" \
      -block_cache_trace_max_trace_file_size_in_bytes=1073741824 \
      -block_cache_trace_sampling_frequency=1
      
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D40435289
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: fa2755f4788185e19f4605e731641cfd21ab3282
    • Fix HyperClockCache Rollback bug in #10801 (#10843) · b6e33dbc
      Peter Dillinger committed
      Summary:
      In https://github.com/facebook/rocksdb/issues/10801 in ClockHandleTable::Evict, we saved a reference to the hash value (`const UniqueId64x2& hashed_key`) instead of saving the hash value itself before marking the handle as empty and thus free for use by other threads. This could lead to Rollback seeing the wrong hash value for updating the `displacements` after an entry is removed.
      
      The fix is (like in other places) to copy the hash value before the entry is released. (We could Rollback while we own the entry, but that creates more dependencies between atomic updates, because in that case, based on the code, the Rollback writes would have to happen before or after the entry is released by marking it empty. By doing the relaxed Rollback after marking empty, there's more opportunity for re-ordering / ILP.)
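      The bug class and its fix can be sketched in a self-contained way; this is not the actual clock_cache.cc code (the `HandleSketch` type is made up), just an illustration of copy-before-release versus holding a reference into a slot another thread may reuse:

      ```cpp
      #include <array>
      #include <atomic>
      #include <cassert>
      #include <cstdint>

      using UniqueId64x2 = std::array<uint64_t, 2>;

      struct HandleSketch {
        UniqueId64x2 hashed_key{};
        std::atomic<bool> occupied{true};
      };

      // Fixed version: copy the hash value out BEFORE marking the handle free,
      // so the later (relaxed) Rollback of `displacements` sees the right hash
      // even if another thread reuses the slot immediately.
      UniqueId64x2 EvictAndGetHash(HandleSketch& h) {
        UniqueId64x2 copy = h.hashed_key;  // copy before releasing the slot
        h.occupied.store(false, std::memory_order_release);  // free for reuse
        return copy;  // safe to use for Rollback after release
      }

      int main() {
        HandleSketch h;
        h.hashed_key = {0xdeadbeef, 42};
        UniqueId64x2 saved = EvictAndGetHash(h);
        // Simulate another thread reusing the slot right after release.
        h.hashed_key = {1, 2};
        assert(saved[0] == 0xdeadbeef && saved[1] == 42);  // copy unaffected
        assert(!h.occupied.load());
        return 0;
      }
      ```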
      
      Intended follow-up: refactoring for better code sharing in clock_cache.cc
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10843
      
      Test Plan: watch for clean crash test, TSAN
      
      Reviewed By: siying
      
      Differential Revision: D40579680
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 258e43b3b80bc980a161d5c675ccc6708ecb8025
    • Ignore max_compaction_bytes for compaction input that are within output key-range (#10835) · 333abe9c
      Changyu Bi committed
      Summary:
      When picking compaction input files, we sometimes stop picking a file that is fully included in the output key range due to hitting max_compaction_bytes. Including these input files can potentially reduce write amplification (WA) at the expense of larger compactions. Larger compactions should be fine, as files from the input level are usually 10X smaller than files from the output level. This PR adds a mutable CF option `ignore_max_compaction_bytes_for_input` that is enabled by default. We can remove this option once we are sure it is safe.
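      The picking heuristic described above can be sketched in a self-contained way; the `FileSketch` type and `PickInputs` helper are illustrative, not the actual compaction-picker code:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <vector>

      struct FileSketch {
        std::string smallest, largest;
        uint64_t size;
      };

      // Sketch: keep expanding the input set past max_compaction_bytes as long
      // as the next candidate's key range is fully contained in the output
      // key range (when the option is enabled).
      std::vector<FileSketch> PickInputs(const std::vector<FileSketch>& candidates,
                                         uint64_t max_compaction_bytes,
                                         const std::string& out_smallest,
                                         const std::string& out_largest,
                                         bool ignore_max_for_contained) {
        std::vector<FileSketch> picked;
        uint64_t total = 0;
        for (const auto& f : candidates) {
          bool contained = f.smallest >= out_smallest && f.largest <= out_largest;
          if (total + f.size > max_compaction_bytes &&
              !(ignore_max_for_contained && contained)) {
            break;  // over budget and not fully contained: stop expanding
          }
          picked.push_back(f);
          total += f.size;
        }
        return picked;
      }

      int main() {
        std::vector<FileSketch> files = {{"a", "c", 60}, {"d", "f", 60}};
        // Without the option, the second file is dropped once the budget hits.
        assert(PickInputs(files, 100, "a", "z", false).size() == 1);
        // With the option, it is still picked: ["d","f"] is inside ["a","z"].
        assert(PickInputs(files, 100, "a", "z", true).size() == 2);
        return 0;
      }
      ```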
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10835
      
      Test Plan:
      - CI, a unit test on max_compaction_bytes fails before turning this flag off.
      - Benchmark does not show much difference in WA: `./db_bench --benchmarks=fillrandom,waitforcompaction,stats,levelstats -max_background_jobs=12 -num=2000000000 -target_file_size_base=33554432 --write_buffer_size=33554432`
      ```
      main:
      ** Compaction Stats [default] **
      Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0      3/0   91.59 MB   0.8     70.9     0.0     70.9     200.8    129.9       0.0   1.5     25.2     71.2   2886.55           2463.45      9725    0.297   1093M   254K       0.0       0.0
        L1      9/0   248.03 MB   1.0    392.0   129.8    262.2     391.7    129.5       0.0   3.0     69.0     68.9   5821.71           5536.90       804    7.241   6029M  5814K       0.0       0.0
        L2     87/0    2.50 GB   1.0    537.0   128.5    408.5     533.8    125.2       0.7   4.2     69.5     69.1   7912.24           7323.70      4417    1.791   8299M    36M       0.0       0.0
        L3    836/0   24.99 GB   1.0    616.9   118.3    498.7     594.5     95.8       5.2   5.0     66.9     64.5   9442.38           8490.28      4204    2.246   9749M   306M       0.0       0.0
        L4   2355/0   62.95 GB   0.3     67.3    37.1     30.2      54.2     24.0      38.9   1.5     72.2     58.2    954.37            821.18       917    1.041   1076M   173M       0.0       0.0
       Sum   3290/0   90.77 GB   0.0   1684.2   413.7   1270.5    1775.0    504.5      44.9  13.7     63.8     67.3  27017.25          24635.52     20067    1.346     26G   522M       0.0       0.0
      
      Cumulative compaction: 1774.96 GB write, 154.29 MB/s write, 1684.19 GB read, 146.40 MB/s read, 27017.3 seconds
      
      This PR:
      ** Compaction Stats [default] **
      Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0      3/0   45.71 MB   0.8     72.9     0.0     72.9     202.8    129.9       0.0   1.6     25.4     70.7   2938.16           2510.36      9741    0.302   1124M   265K       0.0       0.0
        L1      8/0   234.54 MB   0.9    384.5   129.8    254.7     384.2    129.6       0.0   3.0     69.0     68.9   5708.08           5424.43       791    7.216   5913M  5753K       0.0       0.0
        L2     84/0    2.47 GB   1.0    543.1   128.6    414.5     539.9    125.4       0.7   4.2     69.6     69.2   7989.31           7403.13      4418    1.808   8393M    36M       0.0       0.0
        L3    839/0   24.96 GB   1.0    615.6   118.4    497.2     593.2     96.0       5.1   5.0     66.6     64.1   9471.23           8489.31      4193    2.259   9726M   306M       0.0       0.0
        L4   2360/0   63.04 GB   0.3     67.6    37.3     30.3      54.4     24.1      38.9   1.5     71.5     57.6    967.30            827.99       907    1.066   1080M   173M       0.0       0.0
       Sum   3294/0   90.75 GB   0.0   1683.8   414.2   1269.6    1774.5    504.9      44.8  13.7     63.7     67.1  27074.08          24655.22     20050    1.350     26G   522M       0.0       0.0
      
      Cumulative compaction: 1774.52 GB write, 157.09 MB/s write, 1683.77 GB read, 149.06 MB/s read, 27074.1 seconds
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D40518319
      
      Pulled By: cbi42
      
      fbshipit-source-id: f4ea614bc0ebefe007ffaf05bb9aec9a8ca25b60
    • Separate the handling of value types in SaveValue (#10840) · 8dd4bf6c
      Levi Tamasi committed
      Summary:
      Currently, the code in `SaveValue` that handles `kTypeValue` and
      `kTypeBlobIndex` (and more recently, `kTypeWideColumnEntity`) is
      mostly shared. This made sense originally; however, by now the
      handling of these three value types has diverged significantly. The
      patch makes the logic cleaner and also eliminates quite a bit of branching
      by giving each value type its own `case` and removing a fall-through.
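      The shape of the refactoring (separate `case` per value type instead of a shared fall-through path) can be sketched generically; the enum and handler bodies below are stand-ins, not the actual `SaveValue` code:

      ```cpp
      #include <cassert>
      #include <string>

      enum class ValueType { kTypeValue, kTypeBlobIndex, kTypeWideColumnEntity };

      // Each value type gets its own case; no shared fall-through, so the
      // per-type logic can diverge freely without extra branching inside a
      // shared block.
      std::string HandleValue(ValueType type, const std::string& v) {
        switch (type) {
          case ValueType::kTypeValue:
            return "plain:" + v;  // plain-value path
          case ValueType::kTypeBlobIndex:
            return "blob:" + v;   // blob-index resolution path
          case ValueType::kTypeWideColumnEntity:
            return "wide:" + v;   // wide-column entity path
        }
        return "";
      }

      int main() {
        assert(HandleValue(ValueType::kTypeValue, "x") == "plain:x");
        assert(HandleValue(ValueType::kTypeBlobIndex, "x") == "blob:x");
        assert(HandleValue(ValueType::kTypeWideColumnEntity, "x") == "wide:x");
        return 0;
      }
      ```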
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10840
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D40568420
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 2e614606afd1c3d9c76d9b5f1efa0959fc174103
  6. 21 Oct 2022: 4 commits
  7. 19 Oct 2022: 2 commits
    • Refactor ShardedCache for more sharing, static polymorphism (#10801) · 7555243b
      Peter Dillinger committed
      Summary:
      The motivations for this change include
      * Free up space in ClockHandle so that we can add data for secondary cache handling while still keeping within single cache line (64 byte) size.
        * This change frees up space by eliminating the need for the `hash` field by making the fixed-size key itself a hash, using a 128-bit bijective (lossless) hash.
      * Generally more customizability of ShardedCache (such as hashing) without worrying about virtual call overheads
        * ShardedCache now uses static polymorphism (template) instead of dynamic polymorphism (virtual overrides) for the CacheShard. No obvious performance benefit is seen from the change (as mostly expected; most calls to virtual functions in CacheShard could already be optimized to static calls), but offers more flexibility without incurring the runtime cost of adhering to a common interface (without type parameters or static callbacks).
        * You'll also notice less `reinterpret_cast`ing and other boilerplate in the Cache implementations, as this can go in ShardedCache.
      
      More detail:
      * Don't have LRUCacheShard maintain `std::shared_ptr<SecondaryCache>` copies (extra refcount) when LRUCache can be in charge of keeping a `shared_ptr`.
      * Renamed `capacity_mutex_` to `config_mutex_` to better represent the scope of what it guards.
      * Some preparation for 64-bit hash and indexing in LRUCache, but didn't include the full change because of slight performance regression.
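      The static-polymorphism idea (a template over the shard type instead of virtual overrides) can be sketched in a self-contained way; this is entirely illustrative, not the actual ShardedCache interface:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <functional>
      #include <map>
      #include <string>
      #include <vector>

      // The sharded cache is parameterized on its shard type, so shard calls
      // resolve at compile time: no vtable, and no common base class required.
      template <class ShardT>
      class ShardedCacheSketch {
       public:
        explicit ShardedCacheSketch(size_t num_shards) : shards_(num_shards) {}
        void Insert(const std::string& key, int value) {
          Shard(key).Insert(key, value);  // static call, no virtual dispatch
        }
        bool Lookup(const std::string& key, int* value) {
          return Shard(key).Lookup(key, value);
        }
       private:
        ShardT& Shard(const std::string& key) {
          return shards_[std::hash<std::string>{}(key) % shards_.size()];
        }
        std::vector<ShardT> shards_;
      };

      // One concrete shard; it only needs to provide the expected methods.
      class MapShard {
       public:
        void Insert(const std::string& key, int value) { map_[key] = value; }
        bool Lookup(const std::string& key, int* value) {
          auto it = map_.find(key);
          if (it == map_.end()) return false;
          *value = it->second;
          return true;
        }
       private:
        std::map<std::string, int> map_;
      };

      int main() {
        ShardedCacheSketch<MapShard> cache(4);
        cache.Insert("k1", 7);
        int v = 0;
        assert(cache.Lookup("k1", &v) && v == 7);
        assert(!cache.Lookup("missing", &v));
        return 0;
      }
      ```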
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10801
      
      Test Plan:
      Unit test updates were non-trivial because of major changes to the ClockCacheShard interface in handling of key vs. hash.
      
      Performance:
      Create with `TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=30000000 -disable_wal=1 -bloom_bits=16`
      
      Test with
      ```
      TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=readrandom[-X1000] -readonly -num=30000000 -bloom_bits=16 -cache_index_and_filter_blocks=1 -cache_size=610000000 -duration 20 -threads=16
      ```
      
      Before: `readrandom [AVG 150 runs] : 321147 (± 253) ops/sec`
      After: `readrandom [AVG 150 runs] : 321530 (± 326) ops/sec`
      
      So possibly ~0.1% improvement.
      
      And with `-cache_type=hyper_clock_cache`:
      Before: `readrandom [AVG 30 runs] : 614126 (± 7978) ops/sec`
      After: `readrandom [AVG 30 runs] : 645349 (± 8087) ops/sec`
      
      So roughly 5% improvement!
      
      Reviewed By: anand1976
      
      Differential Revision: D40252236
      
      Pulled By: pdillinger
      
      fbshipit-source-id: ff8fc70ef569585edc95bcbaaa0386f61355ae5b
    • Enable a multi-level db to smoothly migrate to FIFO via DB::Open (#10348) · e267909e
      Yueh-Hsuan Chiang committed
      Summary:
      FIFO compaction can theoretically open a DB with any compaction style.
      However, the current code only allows FIFO compaction to open a DB with
      a single level.
      
      This PR relaxes the limitation of FIFO compaction and allows it to open a
      DB with multiple levels.  Below is the read / write / compaction behavior:
      
      * The read behavior is untouched, and it works like a regular rocksdb instance.
      * The write behavior is untouched as well.  When a FIFO compacted DB
      is opened with multiple levels, all new files will still be in level 0, and no files
      will be moved to a different level.
      * Compaction logic is extended.  It will first identify the bottom-most non-empty level.
      Then, it will delete the oldest file in that level.
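      The extended compaction logic above (find the bottom-most non-empty level, delete its oldest file) can be sketched in a self-contained way; levels are modeled as vectors of file ids, oldest first, which is purely illustrative:

      ```cpp
      #include <cassert>
      #include <vector>

      // Returns the id of the file chosen for deletion, or -1 if the DB has
      // no files at all.
      int DeleteOldestInBottommostLevel(std::vector<std::vector<int>>& levels) {
        for (int level = static_cast<int>(levels.size()) - 1; level >= 0; --level) {
          if (!levels[level].empty()) {
            int oldest = levels[level].front();  // oldest file in that level
            levels[level].erase(levels[level].begin());
            return oldest;
          }
        }
        return -1;
      }

      int main() {
        // L0 has files {7, 8}; L1 is empty; L2 (bottom-most non-empty) has {1, 2}.
        std::vector<std::vector<int>> levels = {{7, 8}, {}, {1, 2}};
        assert(DeleteOldestInBottommostLevel(levels) == 1);  // oldest in L2
        assert(DeleteOldestInBottommostLevel(levels) == 2);  // L2 drained next
        assert(DeleteOldestInBottommostLevel(levels) == 7);  // then back to L0
        return 0;
      }
      ```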
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10348
      
      Test Plan:
      Added a new test to verify the migration from level to FIFO where the db has multiple levels.
      Extended existing test cases in db_test and db_basic_test to also verify
      all entries of a key after reopening the DB with FIFO compaction.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D40233744
      
      fbshipit-source-id: 6cc011d6c3467e6bfb9b6a4054b87619e69815e1
  8. 18 Oct 2022: 3 commits
    • Print stack traces on frozen tests in CI (#10828) · e466173d
      Peter Dillinger committed
      Summary:
      Instead of existing calls to ps from gnu_parallel, call a new wrapper that does ps, looks for unit test like processes, and uses pstack or gdb to print thread stack traces. Also, using `ps -wwf` instead of `ps -wf` ensures output is not cut off.
      
      For security, CircleCI runs with security restrictions on ptrace (/proc/sys/kernel/yama/ptrace_scope = 1), and this change adds a work-around to `InstallStackTraceHandler()` (only used by testing tools) to allow any process from the same user to debug it. (I've also touched >100 files to ensure all the unit tests call this function.)
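      The Yama work-around described above amounts to the process opting in to being traced by any same-user process. A minimal sketch of that kind of call (this mirrors the idea, not necessarily the exact RocksDB code in `InstallStackTraceHandler()`):

      ```cpp
      #include <cassert>
      #ifdef __linux__
      #include <sys/prctl.h>
      #ifndef PR_SET_PTRACER
      #define PR_SET_PTRACER 0x59616d61
      #endif
      #ifndef PR_SET_PTRACER_ANY
      #define PR_SET_PTRACER_ANY ((unsigned long)-1)
      #endif
      #endif

      // Returns 0 on success, -1 if Yama rejects the call or is not loaded
      // (which is harmless for this purpose).
      int AllowSameUserDebug() {
      #ifdef __linux__
        return prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0);
      #else
        return -1;  // no-op on non-Linux platforms
      #endif
      }

      int main() {
        int ret = AllowSameUserDebug();
        assert(ret == 0 || ret == -1);  // either outcome is acceptable
        return 0;
      }
      ```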
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10828
      
      Test Plan: local manual + temporary infinite loop in a unit test to observe in CircleCI
      
      Reviewed By: hx235
      
      Differential Revision: D40447634
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 718a4c4a5b54fa0f9af2d01a446162b45e5e84e1
    • Improve / refactor anonymous mmap capabilities (#10810) · 8367f0d2
      Peter Dillinger committed
      Summary:
      The motivation for this change is a planned feature (related to HyperClockCache) that will depend on a large array that can essentially grow automatically, up to some bound, without the pointer address changing and with guaranteed zero-initialization of the data. Anonymous mmaps provide such functionality, and this change provides an internal API for that.
      
      The other existing use of anonymous mmap in RocksDB is for allocating in huge pages. That code and other related Arena code used some awkward non-RAII and pre-C++11 idioms, so I cleaned up much of that as well, with RAII, move semantics, constexpr, etc.
      
      More specifics:
      * Minimize conditional compilation
      * Add Windows support for anonymous mmaps
      * Use std::deque instead of std::vector for a more efficient bag
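      An RAII, movable anonymous-mmap holder in the spirit described above (zero-initialized memory at a fixed address once mapped) can be sketched as follows; this is illustrative only, and the actual RocksDB class, including its Windows support, differs:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <sys/mman.h>
      #include <utility>

      class AnonMmap {
       public:
        explicit AnonMmap(size_t length) : length_(length) {
          addr_ = mmap(nullptr, length_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (addr_ == MAP_FAILED) addr_ = nullptr;
        }
        // Move-only: ownership of the mapping transfers, no double-unmap.
        AnonMmap(AnonMmap&& other) noexcept
            : addr_(std::exchange(other.addr_, nullptr)),
              length_(std::exchange(other.length_, 0)) {}
        AnonMmap(const AnonMmap&) = delete;
        AnonMmap& operator=(const AnonMmap&) = delete;
        ~AnonMmap() {
          if (addr_ != nullptr) munmap(addr_, length_);
        }
        void* addr() const { return addr_; }
        size_t length() const { return length_; }

       private:
        void* addr_ = nullptr;
        size_t length_ = 0;
      };

      int main() {
        AnonMmap m(1 << 20);  // 1 MiB, zero-filled by the kernel
        assert(m.addr() != nullptr);
        unsigned char* p = static_cast<unsigned char*>(m.addr());
        assert(p[0] == 0 && p[(1 << 20) - 1] == 0);  // guaranteed zeroed
        AnonMmap moved = std::move(m);  // ownership transfers on move
        assert(m.addr() == nullptr && moved.addr() != nullptr);
        return 0;
      }
      ```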
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10810
      
      Test Plan: unit test added for new functionality
      
      Reviewed By: riversand963
      
      Differential Revision: D40347204
      
      Pulled By: pdillinger
      
      fbshipit-source-id: ca83fcc47e50fabf7595069380edd2954f4f879c
    • Do not adjust test_batches_snapshots to avoid mixing runs (#10830) · 11c0d131
      Levi Tamasi committed
      Summary:
      This is a small follow-up to https://github.com/facebook/rocksdb/pull/10821. The goal of that PR was to hold `test_batches_snapshots` fixed across all `db_stress` invocations; however, that patch didn't address the case when `test_batches_snapshots` is unset due to a conflicting `enable_compaction_filter` or `prefix_size` setting. This PR updates the logic so the other parameter is sanitized instead in the case of such conflicts.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10830
      
      Reviewed By: riversand963
      
      Differential Revision: D40444548
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 0331265704904b729262adec37139292fcbb7805
  9. 17 Oct 2022: 2 commits
  10. 15 Oct 2022: 1 commit