1. Aug 23, 2023: 1 commit
    • Do not drop unsynced data during reopen in stress test (#11731) · 5e0584bd
      Committed by Changyu Bi
      Summary:
      Currently the stress test does not support restoring the expected state (to a specific sequence number) when there is unsynced data loss during the reopen phase. This causes a few internal stress test failures with errors like inconsistent value. This PR disables dropping unsynced data during reopen to avoid failures due to this issue. We can re-enable it later once we decide to support unsynced data loss during DB reopen in the stress test.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11731
      
      Test Plan:
      * Running this test a few times can fail for inconsistent value before this change
      ```
      ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=8 --block_size=16384 --bloom_bits=20.57166126835524 --bottommost_compression_type=disable --bytes_per_sync=262144 --cache_index_and_filter_blocks=1 --cache_size=8388608 --cache_type=auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=0 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_style=1 --compaction_ttl=100 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=zstd --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=0 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --enable_thread_tracking=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=big --flush_one_in=1000000 --format_version=3 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=6 --index_type=3 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=1000000 --log2_keys_per_lock=10 --long_running_snapshots=1 --manual_wal_flush_one_in=1000000 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=25000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_max_range_deletions=100 --memtable_prefix_bloom_size_ratio=0 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=5 --open_write_fault_one_in=0 --ops_per_thread=200000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=10 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=1 --preserve_internal_time_seconds=0 --progress_reports=0 --read_fault_one_in=1000 --readahead_size=524288 --readpercent=50 --recycle_log_file_num=0 --reopen=20 --ribbon_starting_level=0 --secondary_cache_fault_one_in=32 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --subcompactions=3 --sync=0 --sync_fault_injection=1 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 
--top_level_index_pinning=2 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=1 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=1 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=zstd --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=35
      ```
      
      Reviewed By: hx235
      
      Differential Revision: D48537494
      
      Pulled By: cbi42
      
      fbshipit-source-id: ddae21b9bb6ee8d67229121f58513e95f7ef6d8d
  2. Aug 22, 2023: 5 commits
    • Try to use a db's OPTIONS file for some ldb commands (#11721) · 2a9f3b6c
      Committed by Yu Zhang
      Summary:
      For some ldb commands that don't need to open the DB, it's still useful to use the DB's existing OPTIONS file if it's available.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11721
      
      Reviewed By: pdillinger
      
      Differential Revision: D48485540
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: 2d2db837523044066f1a2c4b59a5c03f6cd35e6b
    • Update HISTORY.md and version.h for 8.6 (#11728) · 4b535207
      Committed by anand76
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11728
      
      Reviewed By: jaykorean, jowlyzhang
      
      Differential Revision: D48527100
      
      Pulled By: anand1976
      
      fbshipit-source-id: c48baa44e538fb6bfd3fe7f19046746d3540763f
    • Replace existing waitforcompaction with new WaitForCompact API in db_bench_tool (#11727) · 4fa2c017
      Committed by Jay Huh
      Summary:
      As the new API to wait for compaction is available (https://github.com/facebook/rocksdb/issues/11436), we can now replace the existing logic of waiting in db_bench_tool with the new API.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11727
      
      Test Plan:
      ```
      ./db_bench --benchmarks="fillrandom,compactall,waitforcompaction,readrandom"
      ```
      **Before change**
      ```
      Set seed to 1692635571470041 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      Integrated BlobDB: blob cache disabled
      RocksDB:    version 8.6.0
      Date:       Mon Aug 21 09:33:40 2023
      CPU:        80 * Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
      CPUCache:   28160 KB
      Keys:       16 bytes each (+ 0 bytes user-defined timestamp)
      Values:     100 bytes each (50 bytes after compression)
      Entries:    1000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    110.6 MB (estimated)
      FileSize:   62.9 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      WARNING: Optimization is disabled: benchmarks unnecessarily slow
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      Integrated BlobDB: blob cache disabled
      DB path: [/tmp/rocksdbtest-226125/dbbench]
      fillrandom   :      51.826 micros/op 19295 ops/sec 51.826 seconds 1000000 operations;    2.1 MB/s
      waitforcompaction(/tmp/rocksdbtest-226125/dbbench): started
      waitforcompaction(/tmp/rocksdbtest-226125/dbbench): finished
      waitforcompaction(/tmp/rocksdbtest-226125/dbbench): started
      waitforcompaction(/tmp/rocksdbtest-226125/dbbench): finished
      DB path: [/tmp/rocksdbtest-226125/dbbench]
      readrandom   :      39.042 micros/op 25613 ops/sec 39.042 seconds 1000000 operations;    1.8 MB/s (632886 of 1000000 found)
      ```
      **After change**
      ```
      Set seed to 1692636574431745 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      Integrated BlobDB: blob cache disabled
      RocksDB:    version 8.6.0
      Date:       Mon Aug 21 09:49:34 2023
      CPU:        80 * Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
      CPUCache:   28160 KB
      Keys:       16 bytes each (+ 0 bytes user-defined timestamp)
      Values:     100 bytes each (50 bytes after compression)
      Entries:    1000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    110.6 MB (estimated)
      FileSize:   62.9 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      WARNING: Optimization is disabled: benchmarks unnecessarily slow
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      Integrated BlobDB: blob cache disabled
      DB path: [/tmp/rocksdbtest-226125/dbbench]
      fillrandom   :      51.271 micros/op 19504 ops/sec 51.271 seconds 1000000 operations;    2.2 MB/s
      waitforcompaction(/tmp/rocksdbtest-226125/dbbench): started
      waitforcompaction(/tmp/rocksdbtest-226125/dbbench): finished with status (OK)
      DB path: [/tmp/rocksdbtest-226125/dbbench]
      readrandom   :      39.264 micros/op 25468 ops/sec 39.264 seconds 1000000 operations;    1.8 MB/s (632921 of 1000000 found)
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D48524667
      
      Pulled By: jaykorean
      
      fbshipit-source-id: 1052a15b2ed79a35165ec4d9998d0454b2552ef4
    • Add unit test for default temperature (#11722) · 03a74411
      Committed by Yu Zhang
      Summary:
      This piggybacks on the existing last-level file temperature statistics test to verify that the default temperature takes effect.
      
      While adding this unit test, I found that the approach of swapping in the default temperature in `VersionBuilder::LoadTableHandlers` misses the L0 files created by flush and only works for existing SST files and SST files created by compaction. So this PR moves that logic to `TableCache::GetTableReader`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11722
      
      Test Plan:
      ```
      ./db_test2 --gtest_filter="*LastLevelStatistics*"
      make all check
      ```
      
      Reviewed By: pdillinger
      
      Differential Revision: D48489171
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: ac29f7d484916f3218729594c5bb35c4f2979ac2
    • Circleci macos sunset (#11633) · a9770b18
      Committed by Levi Tamasi
      Summary:
      [draft] this PR was created in order to test CI changes
      Closes: https://github.com/facebook/rocksdb/pull/11543
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11633
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D48525552
      
      Pulled By: cbi42
      
      fbshipit-source-id: 758d57f248304213228af459789459cc2f0bf419
  3. Aug 19, 2023: 6 commits
    • Improve PrefetchTest.Basic with explicit flush and file num variable (#11720) · f53018c0
      Committed by Hui Xiao
      Summary:
      **Context/Summary:** As titled; should be harmless. It's a guessed fix to https://github.com/facebook/rocksdb/issues/11717, as no repro has been obtained on my end yet.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11720
      
      Test Plan: existing tests
      
      Reviewed By: cbi42
      
      Differential Revision: D48475661
      
      Pulled By: hx235
      
      fbshipit-source-id: 7c7390319f094c540e703fe2e78a8d601b7a894b
    • Implement trimming of readahead size when upper bound is specified (#11684) · f65a0379
      Committed by akankshamahajan
      Summary:
      Implement trimming of readahead_size under a new option, ReadOptions.auto_readahead_size. It trims the readahead_size during prefetching up to the iterate_upper_bound offset, and only when ReadOptions.iterate_upper_bound is set, thereby reducing the prefetching of data beyond the upper bound.
      It's enabled both for implicit auto readahead and for an explicitly specified ReadOptions.readahead_size, and for both sync and async IO.
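      As a minimal usage sketch (assuming RocksDB 8.6+ headers and an already-open `DB* db`; the key bound below is illustrative), trimming only kicks in when both options are set:
      ```
      #include <memory>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"
      #include "rocksdb/slice.h"

      void ScanWithTrimmedReadahead(rocksdb::DB* db) {
        rocksdb::ReadOptions ro;
        rocksdb::Slice upper("key999");   // illustrative upper bound
        ro.iterate_upper_bound = &upper;  // required for trimming to apply
        ro.auto_readahead_size = true;    // new option from this PR
        std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
        for (it->SeekToFirst(); it->Valid(); it->Next()) {
          // Prefetching should not read past the upper bound's offset.
        }
      }
      ```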
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11684
      
      Test Plan: Added new unit test
      
      Reviewed By: anand1976
      
      Differential Revision: D48479723
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 2b1703579caf779105e836b580866ffd7db076fc
    • Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666) · c2aad555
      Committed by Changyu Bi
      Summary:
      Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
      * User can set CompressionOptions::checksum to true to enable this feature.
      * We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
      * Uses `ZSTD_compress2()` for compression since it supports frame parameters like the checksum flag. The compression level is also set in the compression context as a flag.
      * Error handling during decompression to propagate error message from ZSTD.
      * Updated microbench to test read performance impact.
      
      About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
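      As a hedged configuration sketch (option names per this PR; actual behavior depends on the linked ZSTD version):
      ```
      #include "rocksdb/options.h"

      rocksdb::Options MakeZstdChecksumOptions() {
        rocksdb::Options options;
        options.compression = rocksdb::kZSTD;
        // Embed a ZSTD frame checksum so corruption is detected at
        // decompression time instead of silently producing bad data.
        options.compression_opts.checksum = true;
        return options;
      }
      ```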
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
      
      Test Plan:
      * Existing unit tests for zstd compression
      * Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
      * Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
      * Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
      * Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
      * Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation adds ~2.4% time to decompression. Compression is slower, so checksumming should be even less noticeable there.
      * Microbench:
      ```
      TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
      
      Min out of 100 runs:
      Main:
      10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
      
      After this PR, checksum=false
      10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
      
      After this PR, checksum=true
      10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
      ```
      * db_bench:
      ```
      Write perf
      TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
      
      [FillSeq checksum=0]
      fillseq [AVG    10 runs] : 281635 (± 31711) ops/sec;   31.2 (± 3.5) MB/sec
      fillseq [MEDIAN 10 runs] : 294027 ops/sec;   32.5 MB/sec
      
      [FillSeq checksum=1]
      fillseq [AVG    10 runs] : 286961 (± 34700) ops/sec;   31.7 (± 3.8) MB/sec
      fillseq [MEDIAN 10 runs] : 283278 ops/sec;   31.3 MB/sec
      
      Read perf
      TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
      
      [Readrandom checksum=1]
      readrandom [AVG    20 runs] : 360928 (± 3579) ops/sec;    4.0 (± 0.0) MB/sec
      readrandom [MEDIAN 20 runs] : 362468 ops/sec;    4.0 MB/sec
      
      [Readrandom checksum=0]
      readrandom [AVG    20 runs] : 380365 (± 2384) ops/sec;    4.2 (± 0.0) MB/sec
      readrandom [MEDIAN 20 runs] : 379800 ops/sec;    4.2 MB/sec
      
      Compression
      TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
      
      checksum=1
      compress [AVG    20 runs] : 54074 (± 634) ops/sec;  211.2 (± 2.5) MB/sec
      compress [MEDIAN 20 runs] : 54396 ops/sec;  212.5 MB/sec
      
      checksum=0
      compress [AVG    20 runs] : 54598 (± 393) ops/sec;  213.3 (± 1.5) MB/sec
      compress [MEDIAN 20 runs] : 54592 ops/sec;  213.3 MB/sec
      
      Decompression:
      TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
      
      checksum = 0
      uncompress [AVG    20 runs] : 167499 (± 962) ops/sec;  654.3 (± 3.8) MB/sec
      uncompress [MEDIAN 20 runs] : 167210 ops/sec;  653.2 MB/sec
      checksum = 1
      uncompress [AVG    20 runs] : 167980 (± 924) ops/sec;  656.2 (± 3.6) MB/sec
      uncompress [MEDIAN 20 runs] : 168465 ops/sec;  658.1 MB/sec
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D48019378
      
      Pulled By: cbi42
      
      fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
    • Timeout in microsecond option in WaitForCompactOptions (#11711) · 0fa0c97d
      Committed by Jay Huh
      Summary:
      While it's rare, we may run into a scenario where `WaitForCompact()` waits for background jobs indefinitely. For example, a not-enough-space error will add the job back to the queue, while `WaitForCompact()` waits for _all jobs_, including those in the queue, to be completed.
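      A minimal sketch of the new option (field name per this PR; the 60-second value is illustrative):
      ```
      #include <chrono>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      rocksdb::Status WaitWithTimeout(rocksdb::DB* db) {
        rocksdb::WaitForCompactOptions wfc_opts;
        // Bound the wait so a job stuck in the queue (e.g. after a
        // not-enough-space error) cannot block the caller forever.
        wfc_opts.timeout = std::chrono::microseconds(60 * 1000 * 1000);
        return db->WaitForCompact(wfc_opts);  // expect a TimedOut status on expiry
      }
      ```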
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11711
      
      Test Plan:
      `DBCompactionWaitForCompactTest::WaitForCompactToTimeout` added
      `timeout` option added to the variables for all of the existing DBCompactionWaitForCompactTests
      
      Reviewed By: pdillinger, jowlyzhang
      
      Differential Revision: D48416390
      
      Pulled By: jaykorean
      
      fbshipit-source-id: 7b6a12f705ab6c6dfaf8ad736a484ca654a86106
    • Implement an allow-cache-hits admission policy for the compressed secondary cache (#11713) · a1743e85
      Committed by anand76
      Summary:
      This PR implements a new admission policy for the compressed secondary cache, which includes the functionality of the existing policy, and also admits items evicted from the primary block cache with the hit bit set. Effectively, the new policy works as follows -
      1. When an item is demoted from the primary cache without a hit, a placeholder is inserted in the compressed cache. A second demotion will insert the full entry.
      2. When an item is promoted from the compressed cache to the primary cache for the first time, a placeholder is inserted in the primary. The second promotion inserts the full entry, while erasing it from the compressed cache.
      3. If an item is demoted from the primary cache with the hit bit set, it is immediately inserted in the compressed secondary cache.
      The `TieredVolatileCacheOptions` has been updated with a new option, `adm_policy`, which allows the policy to be selected.
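      A hedged selection sketch; the enum type and value spellings below (`TieredAdmissionPolicy::kAdmPolicyAllowCacheHits`) are assumptions based on this description and may differ in your release:
      ```
      #include "rocksdb/cache.h"

      // Assumed spellings -- verify against your rocksdb/cache.h.
      void UseAllowCacheHitsPolicy(rocksdb::TieredVolatileCacheOptions& opts) {
        opts.adm_policy =
            rocksdb::TieredAdmissionPolicy::kAdmPolicyAllowCacheHits;
      }
      ```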
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11713
      
      Reviewed By: pdillinger
      
      Differential Revision: D48444512
      
      Pulled By: anand1976
      
      fbshipit-source-id: b4cbf8c169a88097dff08e36e8bc4b3088de1492
    • Explicitly instantiate MaybeReadBlockAndLoadToCache as well (#11714) · a67ef998
      Committed by Han Zhu
      Summary:
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11714
      
      Fixes T161017540.
      
      The staging build starts failing with an undefined symbol error:
      ```
      ld.lld: error: undefined symbol: std::enable_if<rocksdb::ParsedFullFilterBlock::kCacheEntryRole == (rocksdb::CacheEntryRole)13 || true, rocksdb::Status>::type rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::ParsedFullFilterBlock>(rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, bool, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::BlockContents*, bool) const
      ```
      This is the `MaybeReadBlockAndLoadToCache` function where `TBlocklike = ParsedFullFilterBlock`. The trigger was an FDO profile update D48261413.
      
      `MaybeReadBlockAndLoadToCache` is used in the same translation unit `block_based_table_reader.cc`, and also in another file `partitioned_filter_block.cc`. The latter was the file that couldn't find the symbol. It seems that after the FDO profile update, `MaybeReadBlockAndLoadToCache` may have gotten inlined into its caller in `block_based_table_reader.cc`. And with no knowledge of other usages, the symbol got stripped.
      
      Explicitly instantiate the template similar to how `RetrieveBlock` was handled.
      
      Reviewed By: pdillinger, akankshamahajan15
      
      Differential Revision: D48400574
      
      fbshipit-source-id: d4a80999bfb6ce4afa80678444139fcd8ae84aa4
  4. Aug 18, 2023: 2 commits
    • Add a per column family default temperature option for accounting (#11708) · 1e77e35d
      Committed by Yu Zhang
      Summary:
      Add a column family option `default_temperature` that will be used for file reading accounting purpose, such as io statistics, for files that don't have an explicitly set temperature.
      
      This option is not mutable; changing its value requires a DB restart. This is to avoid confusion: had the option been mutable, users might expect it to take effect on all files immediately, while in reality it would only become effective for SST files opened in the future.
      
      This `default_temperature` also only affects accounting during one DB session. It won't be recorded in the manifest as the file's temperature and can be different across DB sessions.
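      A minimal configuration sketch (the temperature value is illustrative):
      ```
      #include "rocksdb/advanced_options.h"
      #include "rocksdb/options.h"

      rocksdb::ColumnFamilyOptions MakeCfOptions() {
        rocksdb::ColumnFamilyOptions cf_opts;
        // Files without an explicitly set temperature are accounted
        // (e.g. in IO statistics) as kWarm, for this DB session only.
        cf_opts.default_temperature = rocksdb::Temperature::kWarm;
        return cf_opts;
      }
      ```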
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11708
      
      Test Plan:
      ```
      make all check
      ```
      
      Reviewed By: pdillinger
      
      Differential Revision: D48375763
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: eb756696c14a694c6e2a93d2bb6f040563194981
    • Clean up some FastRange calls (#11707) · 966be1cc
      Committed by Peter Dillinger
      Summary:
      * JemallocNodumpAllocator was passing a size_t to FastRange32, which could cause compilation errors or warnings (seen with clang)
      * Fixed the order of arguments to match what would be used with the modulo operator (%), for clarity.
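      For context, a generic illustration of the fastrange technique (not RocksDB's exact header): the hash maps uniformly into [0, range) with a multiply and a shift, and the argument order mirrors `hash % range`:
      ```
      #include <cstdint>

      // Lemire's fastrange: cheaper than `hash % range` and roughly as
      // uniform when `hash` is well distributed over 32 bits.
      inline uint32_t FastRange32(uint32_t hash, uint32_t range) {
        return static_cast<uint32_t>((uint64_t{hash} * uint64_t{range}) >> 32);
      }
      ```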
      
      Fixes https://github.com/facebook/rocksdb/issues/11006
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11707
      
      Test Plan: no functional change, existing tests
      
      Reviewed By: ajkr
      
      Differential Revision: D48435149
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e6e8b107ded4eceda37db20df59985c846a2546b
  5. Aug 17, 2023: 2 commits
    • Delay bottommost level single file compactions (#11701) · d1ff4014
      Committed by Changyu Bi
      Summary:
      For leveled compaction, RocksDB has a special kind of compaction with reason "kBottommostFiles" that compacts bottommost level files to clear data held by snapshots (more detail in https://github.com/facebook/rocksdb/issues/3009). Such compactions can happen soon after a relevant snapshot is released. For some use cases, a bottommost file may contain only a small number of keys that can be cleared, so compacting such a file has high write amplification. In addition, these bottommost files may be compacted in compactions with reasons other than "kBottommostFiles" if we wait for some time (so that enough data is ingested to trigger such a compaction). This PR introduces an option `bottommost_file_compaction_delay` to specify the delay for these bottommost level single file compactions; a configuration sketch follows the change description below.
      
      * The main change is in `VersionStorageInfo::ComputeBottommostFilesMarkedForCompaction()`, where we only add a file to `bottommost_files_marked_for_compaction_` if the oldest snapshot is larger than the file's non-zero largest_seqno **and** the file is old enough. Note that if a file is not old enough but its largest_seqno is less than the oldest snapshot, we exclude it from the calculation of `bottommost_files_mark_threshold_`. This makes the change simpler, but such a file's eligibility for compaction will only be checked the next time `ComputeBottommostFilesMarkedForCompaction()` is called. This happens when a new Version is created (compaction, flush, SetOptions()...), a new enough snapshot is released (`VersionStorageInfo::UpdateOldestSnapshot()`), or when a compaction is picked and the compaction score has to be re-calculated.
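      A minimal configuration sketch (the one-hour delay is illustrative):
      ```
      #include "rocksdb/options.h"

      rocksdb::Options MakeDelayedBottommostOptions() {
        rocksdb::Options options;
        // A bottommost file becomes eligible for the single-file
        // "kBottommostFiles" compaction only after this many seconds.
        options.bottommost_file_compaction_delay = 60 * 60;
        return options;
      }
      ```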
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11701
      
      Test Plan:
      * Add two unit tests to test when bottommost_file_compaction_delay > 0.
      * Ran crash test with the new option.
      
      Reviewed By: jaykorean, ajkr
      
      Differential Revision: D48331564
      
      Pulled By: cbi42
      
      fbshipit-source-id: c584f3dc5f6354fce3ed65f4c6366dc450b15ba8
    • clarify TODO for whitebox disable_wal=1 in db_crashtest.py (#11665) · 0b6ee88d
      Committed by Andrew Kryczka
      Summary:
      See https://github.com/facebook/rocksdb/issues/11613
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11665
      
      Reviewed By: hx235
      
      Differential Revision: D48010507
      
      Pulled By: ajkr
      
      fbshipit-source-id: 65c6d87d2c6ffc9d25f1d17106eae467ec528082
  6. Aug 16, 2023: 2 commits
    • Wide Column Ingestion in CrashTest (#11697) · b63018fb
      Committed by Jay Huh
      Summary:
      `PutEntity` is now supported in SST file writer (https://github.com/facebook/rocksdb/issues/11688). This PR enables ingestion of wide column data in the stress/crash tests.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11697
      
      Test Plan:
      ```
      python3 tools/db_crashtest.py blackbox --simple --duration=300 --ingest_external_file_one_in=2 --use_put_entity_one_in=2 --max_key=1048576 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 --interval=10 -value_size_mult=33 -column_families=1 -reopen=0 --key_len_percent_dist="1,30,69"
      ```
      
      Reviewed By: ltamasi
      
      Differential Revision: D48370719
      
      Pulled By: jaykorean
      
      fbshipit-source-id: 5855d3112b37b2fb300d05e6df110d899855d77d
    • Expose the root comparator for built-in With64Ts comparators (#11704) · 407efb02
      Committed by Yu Zhang
      Summary:
      As titled. User-defined timestamp feature users sometimes directly call the user comparator to do validation on their side too. Having access to the root comparator can help keep their code consistent whether UDT is enabled or disabled.
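      A hedged usage sketch (the accessor name `GetRootComparator()` follows this PR's title and description and is an assumption):
      ```
      #include "rocksdb/comparator.h"
      #include "rocksdb/slice.h"

      // Compare plain user keys the same way whether or not the column
      // family runs with the u64 timestamp-aware comparator.
      bool UserKeyLess(const rocksdb::Slice& a, const rocksdb::Slice& b) {
        const rocksdb::Comparator* with_ts =
            rocksdb::BytewiseComparatorWithU64Ts();
        const rocksdb::Comparator* root = with_ts->GetRootComparator();
        return root->Compare(a, b) < 0;  // behaves like BytewiseComparator()
      }
      ```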
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11704
      
      Reviewed By: ltamasi
      
      Differential Revision: D48355090
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: 26bc73543bfb379ef548d1361803d6f8c308cef6
  7. Aug 15, 2023: 2 commits
    • Add documentation to some formatting util functions (#11674) · 6a3da563
      Committed by Yu Zhang
      Summary:
      As titled, mostly adding documentation. Also updated one usage of these util functions in the external file ingestion job, based on code inspection.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11674
      
      Test Plan:
      ```
      make check
      ```
      
      Note that no unit test was added or updated to check the change in the external file ingestion flow works. This is because user-defined timestamp doesn't support bulk loading yet. There could be other missing pieces that are needed to make this flow functional and testable. That work is separately tracked and unit tests will be added then.
      
      Reviewed By: cbi42
      
      Differential Revision: D48271338
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: c05c3440f1c08632dd0de51b563a30b44b4eb8b5
    • In TestIterateAgainstExpected(), verify iterator moves in expected direction (#11698) · a09c141d
      Committed by Andrew Kryczka
      Summary:
      It's a bit repetitive in order to give reasonably informative error messages.
      
      I also removed total_order_seek in cases where it's not needed, just to make sure a case that shouldn't matter really doesn't.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11698
      
      Test Plan:
      run it -
      
      ```
      $ DEBUG_LEVEL=0 TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --max_key=100000 --duration=86400 --interval=10 --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152 --compression_type=none --blob_compression_type=none --writepercent=50 -iterpercent=45 -readpercent=0 -prefixpercent=0 --prefix_size=0 --verify_iterator_with_expected_state_one_in=10 --test_batches_snapshots=0 -enable_compaction_filter=0
      ```
      
      Reviewed By: cbi42
      
      Differential Revision: D48285036
      
      Pulled By: ajkr
      
      fbshipit-source-id: 51b147bd7c8011740629ae2fd8114d3d48ce7137
  8. Aug 12, 2023: 6 commits
    • Fix for unchecked status in CancelAllBackgroundWork (#11699) · 793a786f
      Committed by Jay Huh
      Summary:
      PR https://github.com/facebook/rocksdb/issues/11497 introduced this. The status from `CancelPeriodicTaskScheduler()` is unchecked, causing test failures like https://app.circleci.com/pipelines/github/facebook/rocksdb/30743/workflows/24443a9b-6fc3-41e6-86c1-992d766eb1ec/jobs/642419
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11699
      
      Test Plan: Existing tests
      
      Reviewed By: cbi42
      
      Differential Revision: D48287188
      
      Pulled By: jaykorean
      
      fbshipit-source-id: b6bcf6e3c3c47f126c34c24a3dfed2649635cc8c
    • Placeholder for AutoHyperClockCache, more (#11692) · ef6f0255
      Committed by Peter Dillinger
      Summary:
      * The plan is for AutoHyperClockCache to be selected when HyperClockCacheOptions::estimated_entry_charge == 0, and in that case to use a new configuration option min_avg_entry_charge for determining an extreme case maximum size for the hash table. For the placeholder, a hack is in place in HyperClockCacheOptions::MakeSharedCache() to make the unit tests happy despite the new options not really making sense with the current implementation.
      * Mostly updating and refactoring tests to test both the current HCC (internal name FixedHyperClockCache) and a placeholder for the new version (internal name AutoHyperClockCache).
      * Simplify some existing tests not to depend directly on cache type.
      * Type-parameterize the shard-level unit tests, which unfortunately requires more syntax like `this->` in places for disambiguation.
      * Added means of choosing auto_hyper_clock_cache in cache_bench, db_bench, and db_stress, including adding it to the crash test.
      * Add another templated class BaseHyperClockCache to reduce future copy-paste
      * Added ReportProblems support to cache_bench
      * Added a DEBUG-level diagnostic to ReportProblems for the variance in load factor throughout the table, which will become more of a concern with linear hashing to be used in the Auto implementation. Example with current Fixed HCC:
      ```
      2023/08/10-13:41:41.602450 6ac36 [DEBUG] [che/clock_cache.cc:1507] Slot occupancy stats: Overall 49% (129008/262144), Min/Max/Window = 39%/60%/500, MaxRun{Pos/Neg} = 18/17
      ```
      
      In other words, with overall occupancy of 49%, the lowest across any 500 contiguous cells is 39% and highest 60%. Longest run of occupied is 18 and longest run of unoccupied is 17. This seems consistent with random samples from a uniform distribution.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11692
      
      Test Plan: Shouldn't be any meaningful changes yet to production code or to what is tested, but there is temporary redundancy in testing until the new implementation is plugged in.
      
      Reviewed By: jowlyzhang
      
      Differential Revision: D48247413
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 11541f996d97af403c2e43c92fb67ff22dd0b5da
    • Remove comment about locking in TestIterateAgainstExpected (#11695) · 38ecfabe
      Committed by Hui Xiao
      Summary:
      **Context/Summary**
      After https://github.com/facebook/rocksdb/pull/11058, we no longer lock the key range to iterate in TestIterateAgainstExpected, except when working with the timestamp feature.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11695
      
      Test Plan: no code change
      
      Reviewed By: ajkr
      
      Differential Revision: D48276668
      
      Pulled By: hx235
      
      fbshipit-source-id: dc92a3708b2281dc737c0877fb755548bf03a9fc
    • Close DB option in WaitForCompact() (#11497) · 52816ff6
      Committed by Jay Huh
      Summary:
      Context:
      
      As mentioned in https://github.com/facebook/rocksdb/issues/11436, this introduces a `close_db` option in `WaitForCompactOptions` to close the DB after waiting for compactions to finish. It must be set to true to close the DB once compactions finish (see the sketch after the list below).
      1. `bool close_db = false` added to `WaitForCompactOptions`
      2. Introduced `CancelPeriodicTaskSchedulers()` and moved unregistering the PeriodicTaskSchedulers to it. `CancelAllBackgroundWork()` calls it now.
      3. When close_db option is on, unpersisted data (data in memtable when WAL is disabled) will be flushed in `WaitForCompact()` if flush option is not on (and `mutable_db_options_.avoid_flush_during_shutdown` is not true). The unpersisted data flush in `CancelAllBackgroundWork()` will be skipped because `shutting_down_` flag will be set true before calling `Close()`.
      4. Atomic boolean `reject_new_background_jobs_` is introduced to prevent new background jobs from being added during the short period of time after waiting is done and before `shutting_down_` is set by `Close()`.
      5. `WaitForCompact()` now waits for recovery in progress to complete as well. (flush operations from WAL -> L0 files)
      6. Added `close_db_` cases to all existing `WaitForCompactTests`
      7. Added a scenario to `DBBasicTest::DBClose`
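      A minimal sketch of the resulting close path (assuming an already-open `DB* db`):
      ```
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      rocksdb::Status WaitAndClose(rocksdb::DB* db) {
        rocksdb::WaitForCompactOptions wfc_opts;
        wfc_opts.close_db = true;  // option added by this PR
        // Waits for queued jobs and in-progress recovery, flushes
        // unpersisted data if needed, then closes the DB.
        return db->WaitForCompact(wfc_opts);
      }
      ```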
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11497
      
      Test Plan:
      - Existing DBCompactionTests
      - `WaitForCompactWithOptionToFlushAndCloseDB` added
      - Added a scenario to `DBBasicTest::DBClose`
      
      Reviewed By: pdillinger, jowlyzhang
      
      Differential Revision: D46337560
      
      Pulled By: jaykorean
      
      fbshipit-source-id: 0f8c7ee09394847f2af5ea4bdd331b47bcdef0b0
    • Add UDT support in API DB::GetApproximateMemTableStats (#11689) · 7cdbce45
      Committed by Yu Zhang
      Summary:
      This API should consider the case when user-defined timestamp is enabled. Also added some documentation to some related API to clarify the usage in the case when user-defined timestamp is enabled.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11689
      
      Test Plan:
      Unit test added
      ```
      make check
      ./db_with_timestamp_basic_test --gtest_filter=*GetApproximateSizes*
      ```
      
      Reviewed By: ltamasi
      
      Differential Revision: D48208568
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: c5baa4a2923441f8ea3a3672c98223a43a3428dc
    • fix CXX not initialized early enough in Makefile on openbsd + platform version 10.14 on macos (#11675) · 17b33c8b
      Committed by nikoPLP
      
      Summary:
      fixes https://github.com/facebook/rocksdb/issues/11220
      fixes https://github.com/facebook/rocksdb/issues/11594
      
      CXX is not initialized early enough in the Makefile.
      On OpenBSD its value is `g++` at first, and this results in several `command not found` errors, notably during the tests for HAVE_POWER8 and HAS_ALTIVEC, which results in the build problem mentioned in https://github.com/facebook/rocksdb/issues/11594
      
      Reordering the Makefile fixes the issue by placing the creation of make_config.mk and its import before any use of `$(CXX)`.
      
      Also fixes the platform version for macOS: it must be 10.14 now that RocksDB is using the C++17 standard.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11675
      
      Reviewed By: cbi42
      
      Differential Revision: D48101615
      
      Pulled By: ajkr
      
      fbshipit-source-id: 1f1b4d4604480b31675140b92c6fe97dc55b8c75
  9. Aug 11, 2023: 3 commits
    • PutEntity Support in SST File Writer (#11688) · 66643b81
      Committed by Jay Huh
      Summary:
      RocksDB provides APIs that enable creating SST files offline and then bulk loading them into the LSM tree quickly using metadata operations. Namely, clients can use the `SstFileWriter` class for the offline data preparation and then the IngestExternalFile family of APIs to perform the bulk loading. However, `SstFileWriter` currently does not support creating files with wide-column data in them. This PR adds `PutEntity` API implementation to `SstFileWriter` to support creating files with wide-column data.
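      A minimal sketch of the new API (the file path and column names are illustrative):
      ```
      #include "rocksdb/env.h"
      #include "rocksdb/options.h"
      #include "rocksdb/sst_file_writer.h"
      #include "rocksdb/wide_columns.h"

      rocksdb::Status WriteWideColumnSst(const rocksdb::Options& options) {
        rocksdb::SstFileWriter writer(rocksdb::EnvOptions(), options);
        rocksdb::Status s = writer.Open("/tmp/wide_columns.sst");
        if (!s.ok()) return s;
        rocksdb::WideColumns columns{{"attr1", "val1"}, {"attr2", "val2"}};
        s = writer.PutEntity("key1", columns);  // added by this PR
        if (!s.ok()) return s;
        return writer.Finish();  // file is now ready for IngestExternalFile
      }
      ```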
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11688
      
      Test Plan: - `BasicWideColumn` test added in external_sst_file_test
      
      Reviewed By: ltamasi
      
      Differential Revision: D48243779
      
      Pulled By: jaykorean
      
      fbshipit-source-id: 1697e5bd67121a648c03946f867416a94be0cadf
    • Add consistent ways to access the builtin UDT comparators (#11690) · 36f48d16
      Committed by Yu Zhang
      Summary:
      Expose the functions that create these UDT-aware comparators so that users can create all the RocksDB builtin comparators in consistent ways.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11690
      
      Reviewed By: ltamasi
      
      Differential Revision: D48212021
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: a17a9a11e36e4267551e193f1b22647414acf467
    • Adjust db_stress handling of TryAgain from optimistic txn (#11691) · a85eccc6
      Committed by Peter Dillinger
      Summary:
      We're still getting some rare cases of 5x TryAgains in a row. Here I'm boosting the failure threshold to 10 in a row and adding more info to the output, to help us manually verify whether there's anything suspicious about the sequence of TryAgains, such as whether Rollback failed to reset to new sequence numbers.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11691
      
      Test Plan: By lowering the threshold to 2 and adjusting some other db_crashtest parameters, I was able to hit my new code and saw a fresh sequence number on the subsequent TryAgain.
      
      Reviewed By: cbi42
      
      Differential Revision: D48236153
      
      Pulled By: pdillinger
      
      fbshipit-source-id: c0530e969ddcf8de7348e5cf7daf5d6d5dec24f4
  10. Aug 10, 2023: 2 commits
  11. Aug 9, 2023: 3 commits
    • Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) · 9a034801
      Committed by Hui Xiao
      Summary:
      **Context/Summary:**
      - Similar to https://github.com/facebook/rocksdb/pull/11288 but for user reads such as `Get(), MultiGet(), DBIterator::XXX(), Verify(File)Checksum()`.
         - For this, I refactored some user-facing `MultiGet` calls in `TransactionBase` and various types of `DB` so that they do not call a user-facing `Get()` but `GetImpl()`, to pass the `ReadOptions::io_activity` check (see PR conversation)
         - New user read stat breakdowns are guarded by `kExceptDetailedTimers` since measurement shows they cause a 4-5% regression relative to upstream/main.
      
      - Misc
         - More refactoring: with https://github.com/facebook/rocksdb/pull/11288, we completed passing `ReadOptions/IOOptions` down to the FS level. So we can now replace the previously [added](https://github.com/facebook/rocksdb/pull/9424) `rate_limiter_priority` parameter in `RandomAccessFileReader`'s `Read/MultiRead/Prefetch()` with `IOOptions::rate_limiter_priority`
         - Also, `ReadAsync()` call time is measured in `SST_READ_MICRO` now
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11444
      
      Test Plan:
      - CI fake db crash/stress test
      - Microbenchmarking
      
      **Build** `make clean && ROCKSDB_NO_FBCODE=1 DEBUG_LEVEL=0 make -jN db_basic_bench`
      - google benchmark version: https://github.com/google/benchmark/commit/604f6fd3f4b34a84ec4eb4db81d842fa4db829cd
      - db_basic_bench_base: upstream
      - db_basic_bench_pr: db_basic_bench_base + this PR
      - asyncread_db_basic_bench_base: upstream + [db basic bench patch for IteratorNext](https://github.com/facebook/rocksdb/compare/main...hx235:rocksdb:micro_bench_async_read)
      - asyncread_db_basic_bench_pr: asyncread_db_basic_bench_base + this PR
      
      **Test**
      
      Get
      ```
      TEST_TMPDIR=/dev/shm ./db_basic_bench_{null_stat|base|pr} --benchmark_filter=DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:1/negative_query:0/enable_filter:0/mmap:1/threads:1 --benchmark_repetitions=1000
      ```
      
      Result
      ```
      Coming soon
      ```
      
      AsyncRead
      ```
      TEST_TMPDIR=/dev/shm ./asyncread_db_basic_bench_{base|pr} --benchmark_filter=IteratorNext/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:1/async_io:1/include_detailed_timers:0 --benchmark_repetitions=1000 > syncread_db_basic_bench_{base|pr}.out
      ```
      
      Result
      ```
      Base:
      1956,1956,1968,1977,1979,1986,1988,1988,1988,1990,1991,1991,1993,1993,1993,1993,1994,1996,1997,1997,1997,1998,1999,2001,2001,2002,2004,2007,2007,2008,
      
      PR (2.3% regression, due to measuring `SST_READ_MICRO` that wasn't measured before):
      1993,2014,2016,2022,2024,2027,2027,2028,2028,2030,2031,2031,2032,2032,2038,2039,2042,2044,2044,2047,2047,2047,2048,2049,2050,2052,2052,2052,2053,2053,
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D45918925
      
      Pulled By: hx235
      
      fbshipit-source-id: 58a54560d9ebeb3a59b6d807639692614dad058a
    • Log user_defined_timestamps_persisted flag in event logger (#11683) · 9c2ebcc2
      Committed by Yu Zhang
      Summary:
      As titled. Also removed an undefined and unused member function in ColumnFamilyData.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11683
      
      Reviewed By: ajkr
      
      Differential Revision: D48156290
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: cc99aaafe69db6611af3854cb2b2ebc5044941f7
    • Fix a potential memory leak on row_cache insertion failure (#11682) · e214964f
      Committed by Peter Dillinger
      Summary:
      Although the built-in Cache implementations never return failure on Insert without keeping a reference (Handle), a custom implementation could. The code for inserting into row_cache does not keep a reference, but it also does not clean up appropriately on a non-OK status. This is a fix.
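      A simplified illustration of the bug class (hypothetical `FakeCache`, not RocksDB's actual `Cache::Insert` signature): when an insert fails without taking a reference, the caller must free the value itself:
      ```
      #include <memory>

      struct FakeCache {  // hypothetical stand-in for a custom Cache
        bool Insert(int* value) { return false; }  // rejects the entry
      };

      void InsertRow(FakeCache& cache) {
        auto row = std::make_unique<int>(42);
        if (cache.Insert(row.get())) {
          row.release();  // the cache now owns the row
        }
        // Otherwise unique_ptr frees the row here -- skipping this
        // cleanup on a non-OK insert is the leak this PR fixes.
      }
      ```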
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11682
      
      Test Plan: unit test added that previously fails under ASAN
      
      Reviewed By: ajkr
      
      Differential Revision: D48153831
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 86eb7387915c5b38b6ff5dd8deb4e1e223b7d020
  12. Aug 8, 2023: 4 commits
    • Prepare tests for new HCC naming (#11676) · 99daea34
      Committed by Peter Dillinger
      Summary:
      I'm anticipating using the public name HyperClockCache for both the current version with a fixed-size table and the upcoming version with an automatically growing table. However, for simplicity of testing them as substantially distinct implementations, I want to give them distinct internal names, like FixedHyperClockCache and AutoHyperClockCache.
      
      This change anticipates that by renaming to FixedHyperClockCache and assuming for now that all the unit tests that run on HCC will run and behave similarly for the automatic HCC. Obviously updates will need to be made, but I'm trying to avoid uninteresting find & replace updates in what will be a large and engineering-heavy PR for AutoHCC
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11676
      
      Test Plan: no behavior change intended, except logging will now use the name FixedHyperClockCache
      
      Reviewed By: ajkr
      
      Differential Revision: D48103165
      
      Pulled By: pdillinger
      
      fbshipit-source-id: a33f1901488fea102164c2318e2f2b156aaba736
    • exclude uninitialized files when estimating compression ratio (#11664) · 6d1effaf
      Committed by tabokie
      Summary:
      Exclude files with uninitialized table properties when estimating compression ratio.
      
      Cherry-picking downstream PR: https://github.com/tikv/rocksdb/pull/335
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11664
      
      Reviewed By: cbi42
      
      Differential Revision: D48002518
      
      Pulled By: ajkr
      
      fbshipit-source-id: 931fac8a06b4ed7b7b605cf79903302f1b8babfd
    • compute compaction score once for a batch of range file deletes (#10744) · d2b0652b
      Committed by Xinye Tao
      Summary:
      Only re-calculate the compaction score once for a batch of deletions. This fixes a performance regression introduced by https://github.com/facebook/rocksdb/pull/8434.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10744
      
      Test Plan:
      In one of our production cluster that recently upgraded to RocksDB 6.29, it takes more than 10 minutes to delete files in 30,000 ranges. The RocksDB instance contains approximately 80,000 files. After this patch, the duration reduces to 100+ ms, which is on par with RocksDB 6.4.
      
      Cherry-picking downstream PR: https://github.com/tikv/rocksdb/pull/316
      
      Signed-off-by: tabokie <xy.tao@outlook.com>
      
      Reviewed By: cbi42
      
      Differential Revision: D48002581
      
      Pulled By: ajkr
      
      fbshipit-source-id: 7245607ee3ad79c53b648a6396c9159f166b9437
    • More minor HCC refactoring + typed mmap (#11670) · cdb11f5c
      Committed by Peter Dillinger
      Summary:
      More code leading up to dynamic HCC.
      * Small enhancements to cache_bench
      * Extra assertion in Unref
      * Improve a CAS loop in ChargeUsageMaybeEvictStrict
      * Put load factor constants in appropriate class
      * Move `standalone` field to HyperClockTable::HandleImpl because it can be encoded differently in the upcoming dynamic HCC.
      * Add a typed version of MemMapping to simplify some future code.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11670
      
      Test Plan: existing tests, unit test added for TypedMemMapping
      
      Reviewed By: jowlyzhang
      
      Differential Revision: D48056464
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 186b7d3105c5d6d2eb6a592369bc10a97ee14a15
  13. Aug 7, 2023: 1 commit
  14. Aug 5, 2023: 1 commit
    • Avoid shifting component too large error in FileTtlBooster (#11673) · eca48bc1
      Committed by Changyu Bi
      Summary:
      When `num_levels` > 65, we may be shifting by more than 63 bits in FileTtlBooster. This can give errors like: `runtime error: shift exponent 98 is too large for 64-bit type 'uint64_t' (aka 'unsigned long')`. This PR makes a quick fix for the issue by taking a min in the shifting component. The issue should be rare since it requires a user to configure a large `num_levels`. I'll follow up with a more complex fix if needed.
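      An illustrative clamp (not the actual FileTtlBooster formula): limiting the exponent keeps the shift defined for 64-bit types even with very large `num_levels`:
      ```
      #include <algorithm>
      #include <cstdint>

      // Shifting a uint64_t by >= 64 bits is undefined behavior, so cap
      // the exponent at 63 (assumes level < num_levels).
      uint64_t SafeBoostDivisor(int num_levels, int level) {
        uint64_t exponent =
            std::min<uint64_t>(static_cast<uint64_t>(num_levels - level - 1), 63);
        return uint64_t{1} << exponent;
      }
      ```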
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11673
      
      Test Plan: * Add a unit test that produces the above error before this PR. It needs to be compiled with ubsan: `COMPILE_WITH_UBSAN=1 OPT="-fsanitize-blacklist=.circleci/ubsan_suppression_list.txt" ROCKSDB_DISABLE_ALIGNED_NEW=1 USE_CLANG=1 make V=1 -j32 compaction_picker_test`
      
      Reviewed By: hx235
      
      Differential Revision: D48074386
      
      Pulled By: cbi42
      
      fbshipit-source-id: 25e59df7e93f20e0793cffb941de70ac815d9392