1. Dec 07, 2022 (2 commits)
  2. Dec 06, 2022 (1 commit)
  3. Dec 05, 2022 (1 commit)
    • Fix table cache leak in MultiGet with async_io (#10997) · 8ffabdc2
      Committed by anand76
      Summary:
      When MultiGet with the async_io option encounters an IO error in TableCache::GetTableReader, it may result in leakage of table cache handles due to queued coroutines being abandoned. This PR fixes it by ensuring any queued coroutines are run before aborting the MultiGet.
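      The fix described above can be sketched generically: on an IO error, drain any queued work before aborting, so that each queued item releases its resources. This is a hedged, self-contained model; `TableHandle`, `RunBatch`, and the queue of closures are illustrative stand-ins, not RocksDB's actual coroutine machinery.
      ```
      #include <cassert>
      #include <functional>
      #include <queue>

      // Illustrative stand-in for a table cache handle that must be released.
      struct TableHandle {
        bool* released;
        void Release() { *released = true; }
      };

      // Models the bug: aborting on error *without* running queued lookups
      // leaks their handles. The fix is to drain the queue before aborting.
      bool RunBatch(std::queue<std::function<void()>>& queued, bool io_error) {
        if (io_error) {
          // Conceptually the fix from the PR: complete queued coroutines so
          // each one releases its cache handle before the MultiGet aborts.
          while (!queued.empty()) {
            queued.front()();  // completing the lookup releases its handle
            queued.pop();
          }
          return false;  // abort with error, but nothing leaked
        }
        while (!queued.empty()) {
          queued.front()();
          queued.pop();
        }
        return true;
      }

      int main() {
        bool released = false;
        TableHandle h{&released};
        std::queue<std::function<void()>> q;
        q.push([&h] { h.Release(); });
        assert(!RunBatch(q, /*io_error=*/true));  // aborted...
        assert(released);                         // ...but handle was released
        return 0;
      }
      ```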
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10997
      
      Test Plan:
      1. New unit test in db_basic_test
      2. asan_crash
      
      Reviewed By: pdillinger
      
      Differential Revision: D41587244
      
      Pulled By: anand1976
      
      fbshipit-source-id: 900920cd3fba47cb0fc744a62facc5ffe2eccb64
  4. Dec 02, 2022 (1 commit)
  5. Dec 01, 2022 (2 commits)
  6. Nov 30, 2022 (2 commits)
    • Fix missing WAL in new manifest by rolling over the WAL deletion record from prev manifest (#10892) · 2f76ab15
      Committed by Hui Xiao
      Summary:
      **Context**
      `Options::track_and_verify_wals_in_manifest = true` verifies that each WAL tracked in the manifest is indeed present in the WAL folder. If one is not, a "Missing WAL with log number" corruption error is thrown.
      
      `DB::SyncWAL()` called at a specific point in time (i.e., at `TEST_SYNC_POINT("FindObsoleteFiles::PostMutexUnlock")`) can record in a new manifest the WAL addition of a WAL file that already had a WAL deletion recorded in the previous manifest.
      The WAL deletion record is not rolled over to the new manifest, so the new manifest creates the illusion that this WAL was never deleted and should be present at DB re/open.
      - Such a WAL deletion record can be caused by flushing the memtable associated with that WAL; the actual WAL deletion can happen in `PurgeObsoleteFiles()`.

      As a consequence, upon `DB::Reopen()`, this WAL file can be deleted while the manifest still has its WAL addition record, which causes a false "Missing WAL with log number" corruption alarm to be thrown.
      
      **Summary**
      This PR fixes the false alarm by rolling the WAL deletion record over from the previous manifest, i.e., writing that deletion record into the new manifest as well.
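      The rollover idea can be modeled in a few lines (all names here are hypothetical stand-ins, not RocksDB's manifest code): a WAL looks "missing" exactly when its addition record survives a manifest rollover but its deletion record does not.
      ```
      #include <cassert>
      #include <cstdint>
      #include <set>

      // Simplified model of a manifest's WAL records.
      struct ManifestModel {
        std::set<uint64_t> wal_additions;
        std::set<uint64_t> wal_deletions;

        // A WAL is expected on disk iff it was added and never deleted.
        bool ExpectsWalOnDisk(uint64_t log_number) const {
          return wal_additions.count(log_number) > 0 &&
                 wal_deletions.count(log_number) == 0;
        }
      };

      // Rolling over to a new manifest. The fix: copy deletion records too.
      ManifestModel RollOver(const ManifestModel& prev, bool copy_deletions) {
        ManifestModel next;
        next.wal_additions = prev.wal_additions;
        if (copy_deletions) {
          next.wal_deletions = prev.wal_deletions;  // the fix
        }
        return next;
      }

      int main() {
        ManifestModel prev;
        prev.wal_additions.insert(7);
        prev.wal_deletions.insert(7);  // WAL 7 was flushed and deleted

        // Without the fix: new manifest still expects WAL 7 on disk,
        // producing the "Missing WAL" false alarm at reopen.
        assert(RollOver(prev, false).ExpectsWalOnDisk(7));
        // With the fix: deletion record rolled over, no false alarm.
        assert(!RollOver(prev, true).ExpectsWalOnDisk(7));
        return 0;
      }
      ```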
      
      **Test**
      - Make check
      - Added new unit test `TEST_F(DBWALTest, FixSyncWalOnObseletedWalWithNewManifestCausingMissingWAL)` that failed before the fix and passed after it
      - [Ongoing] CI stress test with aggressive values as in https://github.com/facebook/rocksdb/pull/10761 (which is how this false alarm was first surfaced), to confirm the false alarm disappears
      - [Ongoing] Regular CI stress test to confirm the fix doesn't harm anything else
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10892
      
      Reviewed By: ajkr
      
      Differential Revision: D40778965
      
      Pulled By: hx235
      
      fbshipit-source-id: a512364bfdeb0b1a55c171890e60d856c528f37f
    • Revert PR 10777 "Fix FIFO causing overlapping seqnos in L0 files due to overla…" (#10999) · f1574a20
      Committed by Hui Xiao
      Summary:
      **Context/Summary:**
      
      This reverts commit fc74abb4 and related HISTORY record.
      
      The issue with PR 10777, or more generally with any approach using earliest_mem_seqno such as https://github.com/facebook/rocksdb/pull/5958#issue-511150930, is that the earliest seqno of each CF's memtable is not persisted and always starts at 0 upon Recover(). Later, when creating a new memtable in a certain CF, we use the last seqno of the whole DB (not of that CF from the previous DB session) for this CF. This leads to false-positive overlapping seqnos, and PR 10777 would throw an error like https://github.com/facebook/rocksdb/blob/main/db/compaction/compaction_picker.cc#L1002-L1004
      
      Luckily, a more elegant and complete solution to the overlapping-seqno problem these PRs aim to solve does not have the above issue; see https://github.com/facebook/rocksdb/pull/10922. It is already being pursued and is under review, so we can simply revert this PR and focus on landing PR 10922.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10999
      
      Test Plan: make check
      
      Reviewed By: anand1976
      
      Differential Revision: D41572604
      
      Pulled By: hx235
      
      fbshipit-source-id: 9d9bdf594abd235e2137045cef513ca0b14e0a3a
  7. Nov 29, 2022 (3 commits)
    • Remove copying of range tombstones keys in iterator (#10878) · 6cdb7af9
      Committed by Changyu Bi
      Summary:
      In MergingIterator, if a range tombstone's start or end key is added to minHeap/maxHeap, the key is copied. This PR removes the copying of range tombstone keys by adding InternalKey comparator that compares `Slice` for internal key and `ParsedInternalKey` directly.
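      The copy-elimination idea can be sketched as follows (a simplified model, not RocksDB's `InternalKeyComparator`): instead of re-encoding a parsed key into a temporary buffer before comparing, compare the encoded slice against the parsed pieces directly. Internal keys are modeled as user_key + 8-byte trailer, compared by user key only for brevity.
      ```
      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <string_view>

      // Hypothetical stand-in for ParsedInternalKey.
      struct ParsedKey {
        std::string_view user_key;
        uint64_t trailer;
      };

      // Before: materialize the parsed key, then compare (allocates/copies).
      int CompareWithCopy(std::string_view encoded, const ParsedKey& parsed) {
        std::string tmp(parsed.user_key);  // the copy being eliminated
        tmp.append(reinterpret_cast<const char*>(&parsed.trailer), 8);
        std::string_view encoded_user = encoded.substr(0, encoded.size() - 8);
        std::string_view tmp_user(tmp.data(), tmp.size() - 8);
        return encoded_user.compare(tmp_user);
      }

      // After: compare the slice against the parsed pieces directly, no copy.
      int CompareNoCopy(std::string_view encoded, const ParsedKey& parsed) {
        std::string_view encoded_user = encoded.substr(0, encoded.size() - 8);
        return encoded_user.compare(parsed.user_key);
      }

      int main() {
        std::string encoded = "apple";
        encoded.append(8, '\0');  // fake 8-byte trailer
        ParsedKey parsed{"banana", 0};
        assert(CompareWithCopy(encoded, parsed) < 0);
        assert(CompareNoCopy(encoded, parsed) < 0);  // same result, no copy
        return 0;
      }
      ```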
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10878
      
      Test Plan:
      - existing UT
      - ran all flavors of stress test through sandcastle
      - benchmarks: I did not see an improvement when compiling with DEBUG_LEVEL=0, and the results were noisy. With `OPTIMIZE_LEVEL="-O3" USE_LTO=1` I do see an improvement.
      ```
      # Favorable set up: half of the writes are DeleteRange.
      TEST_TMPDIR=/tmp/rocksdb-rangedel-test-all-tombstone ./db_bench --benchmarks=fillseq,levelstats --writes_per_range_tombstone=1 --max_num_range_tombstones=1000000 --range_tombstone_width=2 --num=1000000 --max_bytes_for_level_base=4194304 --disable_auto_compactions --write_buffer_size=33554432 --key_size=50
      
      # benchmark command
      TEST_TMPDIR=/tmp/rocksdb-rangedel-test-all-tombstone ./db_bench --benchmarks=readseq[-W1][-X5],levelstats --use_existing_db=true --cache_size=3221225472  --disable_auto_compactions=true --avoid_flush_during_recovery=true --seek_nexts=100 --reads=1000000 --num=1000000 --threads=25
      
      # main
      readseq [AVG    5 runs] : 26017977 (± 371077) ops/sec; 3721.9 (± 53.1) MB/sec
      readseq [MEDIAN 5 runs] : 26096905 ops/sec; 3733.2 MB/sec
      
      # this PR
      readseq [AVG    5 runs] : 27481724 (± 568758) ops/sec; 3931.3 (± 81.4) MB/sec
      readseq [MEDIAN 5 runs] : 27323957 ops/sec; 3908.7 MB/sec
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D40711170
      
      Pulled By: cbi42
      
      fbshipit-source-id: 708cb584e2bd085a9ce0d2ef6a420489f721717f
    • Trigger FIFO file deletion in non L0 only if exceeding max_table_files_size (#10955) · d8c043f7
      Committed by Hui Xiao
      Summary:
      **Context**
      
      https://github.com/facebook/rocksdb/pull/10348 allows multi-level FIFO but accidentally changed the file deletion logic in `FIFOCompactionPicker::PickSizeCompaction`. With [this](https://github.com/facebook/rocksdb/pull/10348/files#diff-d8fb3d50749aa69b378de447e3d9cf2f48abe0281437f010b5d61365a7b813fdR156) and [this](https://github.com/facebook/rocksdb/pull/10348/files#diff-d8fb3d50749aa69b378de447e3d9cf2f48abe0281437f010b5d61365a7b813fdR235) together, it deletes one file in non-L0 even when `total_size <= mutable_cf_options.compaction_options_fifo.max_table_files_size`, which is incorrect.
      
      As a consequence, FIFO exercises more file deletion in our crash testing, which cannot correctly verify keys in files deleted by compaction. This results in errors such as `error : inconsistent values for key 000000000000239F000000000000012B000000000000028B: expected state has the key, Get() returns NotFound.
      Verification failed :(` or `Expected state has key 00000000000023A90000000000000003787878, iterator is at key 00000000000023A9000000000000004178
      Column family: default, op_logs: S 00000000000023A90000000000000003787878`
      
      **Summary**:
      - Delete a file in non-L0 only if `total_size > mutable_cf_options.compaction_options_fifo.max_table_files_size`
      - Add some helpful log to LOG file
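      The corrected predicate can be sketched as follows (illustrative name, not the actual compaction-picker code): deletion in non-L0 should trigger only when the total size actually exceeds the configured budget.
      ```
      #include <cassert>
      #include <cstdint>

      // Sketch of the corrected FIFO deletion predicate for non-L0 levels.
      // The bug was deleting a file even when total_size was within the limit.
      bool ShouldDeleteNonL0File(uint64_t total_size,
                                 uint64_t max_table_files_size) {
        return total_size > max_table_files_size;  // the corrected condition
      }

      int main() {
        // Within budget: nothing should be deleted.
        assert(!ShouldDeleteNonL0File(/*total_size=*/100, /*max=*/200));
        assert(!ShouldDeleteNonL0File(200, 200));
        // Over budget: deletion is triggered.
        assert(ShouldDeleteNonL0File(201, 200));
        return 0;
      }
      ```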
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10955
      
      Test Plan:
      - Errors reproduced by
      ```
      ./db_stress --preserve_unverified_changes=1 --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=10 --bottommost_compression_type=none --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_style=2 --compaction_ttl=0 --compression_max_dict_buffer_bytes=8589934591 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=xpress --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=1048576 --delpercent=0 --delrangepercent=0 --destroy_db_initially=1 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=xxh64 --flush_one_in=1000000 --format_version=4 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=10 --index_type=2 --ingest_external_file_one_in=1000000 --initial_auto_readahead_size=16384 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=False --log2_keys_per_lock=10 --long_running_snapshots=0 
--manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=25000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=0 --num_file_reads_for_auto_readahead=2 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=40000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=7 --prefixpercent=5 --prepopulate_block_cache=0 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_fault_one_in=1000 --readahead_size=0 --readpercent=65 --recycle_log_file_num=1 --reopen=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=0 --subcompactions=2 --sync=0 --sync_fault_injection=0 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=1 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=1 --use_direct_reads=1 --use_full_merge_v1=1 --use_merge=0 --use_multiget=0 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_iterator_with_expected_state_one_in=0 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=20
      ```
      are gone after this fix
      - CI
      
      Reviewed By: ajkr
      
      Differential Revision: D41319441
      
      Pulled By: hx235
      
      fbshipit-source-id: 6939753767007f7449ea7055b1420aabd03d7709
    • Add Apache Spark as a user (#10993) · ed23fd75
      Committed by relife22
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10993
      
      Reviewed By: ajkr
      
      Differential Revision: D41543962
      
      Pulled By: cbi42
      
      fbshipit-source-id: a895d7863543bd64734c5c9faa7b55b0732b3d60
  8. Nov 24, 2022 (2 commits)
    • Prevent iterating over range tombstones beyond `iterate_upper_bound` (#10966) · 534fb06d
      Committed by Changyu Bi
      Summary:
      Currently, `iterate_upper_bound` is not checked for range tombstone keys in MergingIterator. This may impact performance when there is a large number of range tombstones right after `iterate_upper_bound`. This PR fixes this issue by checking `iterate_upper_bound` in MergingIterator for range tombstone keys.
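      The idea can be sketched as a simple bound check (hypothetical helper, not the actual MergingIterator code): a range tombstone key at or past the exclusive upper bound can never affect the scan, so it should not be pushed onto the heap at all.
      ```
      #include <cassert>
      #include <optional>
      #include <string_view>

      // Returns true if `key` can still matter for a forward scan bounded by
      // an exclusive `upper_bound` (no bound means everything matters).
      bool WithinUpperBound(std::string_view key,
                            std::optional<std::string_view> upper_bound) {
        return !upper_bound || key < *upper_bound;
      }

      int main() {
        assert(WithinUpperBound("b", std::nullopt));  // no bound set
        assert(WithinUpperBound("b", std::optional<std::string_view>("c")));
        // At or past the bound: skip, don't pay heap costs for this key.
        assert(!WithinUpperBound("c", std::optional<std::string_view>("c")));
        assert(!WithinUpperBound("d", std::optional<std::string_view>("c")));
        return 0;
      }
      ```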
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10966
      
      Test Plan:
      - added unit test
      - stress test: `python3 tools/db_crashtest.py whitebox --simple --verify_iterator_with_expected_state_one_in=5 --delrangepercent=5 --prefixpercent=18 --writepercent=48 --readpercent=15 --duration=36000 --range_deletion_width=100`
      - ran different stress tests over sandcastle
      - Falcon team ran some test traffic and saw reduced CPU usage on processing range tombstones.
      
      Reviewed By: ajkr
      
      Differential Revision: D41414172
      
      Pulled By: cbi42
      
      fbshipit-source-id: 9b2c29eb3abb99327c6a649bdc412e70d863f981
    • Support tiering when file endpoints overlap (#10961) · 54c2542d
      Committed by Andrew Kryczka
      Summary:
      Enabled output to penultimate level when file endpoints overlap. This is probably only possible when range tombstones span files. Otherwise the overlapping files would all be included in the penultimate level inputs thanks to our atomic compaction unit logic.
      
      Also, corrected `penultimate_output_range_type_`, which is a minor fix as it appears only used for logging.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10961
      
      Test Plan: updated unit test
      
      Reviewed By: cbi42
      
      Differential Revision: D41370615
      
      Pulled By: ajkr
      
      fbshipit-source-id: 7e75ec369a3b41b8382b336446c81825a4c4f572
  9. Nov 23, 2022 (4 commits)
    • Make best-efforts recovery verify SST unique ID before Version construction (#10962) · 3d0d6b81
      Committed by Yanqin Jin
      Summary:
      This PR adds the check for SST unique IDs to best-efforts recovery (when `Options::best_efforts_recovery` is true).

      With best_efforts_recovery set to true, RocksDB will recover to the latest point in the
      MANIFEST such that all valid SST files included up to that point pass the unique ID checks as well.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10962
      
      Test Plan: make check
      
      Reviewed By: pdillinger
      
      Differential Revision: D41378241
      
      Pulled By: riversand963
      
      fbshipit-source-id: a036064e2c17dec13d080a24ef2a9f85d607b16c
    • fix compile warnings (#10976) · d8e792e4
      Committed by jsteemann
      Summary:
      Fixes lots of compile warnings related to missing override specifiers, e.g.
      ```
      ./3rdParty/rocksdb/trace_replay/block_cache_tracer.h:130:10: warning: ‘virtual rocksdb::Status rocksdb::BlockCacheTraceWriterImpl::WriteBlockAccess(const rocksdb::BlockCacheTraceRecord&, const rocksdb::Slice&, const rocksdb::Slice&, const rocksdb::Slice&)’ can be marked override [-Wsuggest-override]
        130 |   Status WriteBlockAccess(const BlockCacheTraceRecord& record,
            |          ^~~~~~~~~~~~~~~~
      ./3rdParty/rocksdb/trace_replay/block_cache_tracer.h:136:10: warning: ‘virtual rocksdb::Status rocksdb::BlockCacheTraceWriterImpl::WriteHeader()’ can be marked override [-Wsuggest-override]
        136 |   Status WriteHeader();
            |          ^~~~~~~~~~~
      ```
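      A minimal illustration of this class of warning and its fix: adding `override` both silences `-Wsuggest-override` and lets the compiler verify that the signature really overrides a base-class virtual. The classes below are simplified stand-ins, not the actual tracer types.
      ```
      #include <cassert>
      #include <string>

      struct Writer {
        virtual ~Writer() = default;
        virtual std::string WriteHeader() { return "base"; }
      };

      struct TraceWriter : public Writer {
        // `override` added; with -Wsuggest-override, omitting it would warn.
        std::string WriteHeader() override { return "derived"; }
      };

      int main() {
        TraceWriter w;
        Writer& base = w;
        // Virtual dispatch is unchanged by adding `override`.
        assert(base.WriteHeader() == "derived");
        return 0;
      }
      ```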
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10976
      
      Reviewed By: riversand963
      
      Differential Revision: D41478588
      
      Pulled By: ajkr
      
      fbshipit-source-id: d30b0457241999e38b16aacf6dabe3e691f7c46f
    • improve copying of Env in Options (#10666) · ae115eff
      Committed by Alan Paxton
      Summary:
      Closes https://github.com/facebook/rocksdb/issues/9909
      
      - Constructing an Options from a DBOptions should use the Env from the DBOptions
      - DBOptions should be constructed with the default Env as the env_, rather than null. Why ever not?
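      The two fixes can be modeled with simplified stand-ins (these are not RocksDB's real `Env`/`Options` classes): default the env to a singleton instead of null, and preserve it when constructing `Options` from a `DBOptions`.
      ```
      #include <cassert>

      struct Env {
        static Env* Default() {
          static Env default_env;
          return &default_env;
        }
      };

      struct DBOptions {
        // Fix 2: default to Env::Default() instead of nullptr.
        Env* env = Env::Default();
      };

      struct Options : public DBOptions {
        Options() = default;
        // Fix 1: constructing Options from a DBOptions keeps that env.
        explicit Options(const DBOptions& db_options) : DBOptions(db_options) {}
      };

      int main() {
        DBOptions db_opts;
        assert(db_opts.env == Env::Default());  // never null by default

        Env custom_env;
        db_opts.env = &custom_env;
        Options opts(db_opts);
        assert(opts.env == &custom_env);  // env carried over, not reset
        return 0;
      }
      ```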
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10666
      
      Reviewed By: riversand963
      
      Differential Revision: D40515418
      
      Pulled By: ajkr
      
      fbshipit-source-id: 4122ba3f537660720262694c21ab4bfb13b6f8de
    • Deflake DBTest2.TraceAndReplay by relaxing latency checks (#10979) · db9cbddc
      Committed by Andrew Kryczka
      Summary:
      Since the latency measurement uses real time it is possible for the operation to complete in zero microseconds and then fail these checks. We saw this with the operation that invokes Get() on an invalid CF. This PR relaxes the assertions to allow for operations completing in zero microseconds.
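      The deflaking idea in a self-contained sketch (illustrative helper name, not the test's actual code): a latency measured in whole microseconds can legitimately round down to zero for a very fast operation, so a `latency > 0` check is flaky while `latency >= 0` is not.
      ```
      #include <cassert>
      #include <chrono>
      #include <cstdint>

      // Measure an operation's wall-clock duration in whole microseconds.
      int64_t MeasureMicros(void (*op)()) {
        auto start = std::chrono::steady_clock::now();
        op();
        auto end = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::microseconds>(end -
                                                                     start)
            .count();
      }

      int main() {
        // An essentially instantaneous operation may round down to 0 us.
        int64_t latency = MeasureMicros(+[]() {});
        // Relaxed check, as in the PR: allow zero, still reject negatives.
        assert(latency >= 0);
        return 0;
      }
      ```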
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10979
      
      Reviewed By: riversand963
      
      Differential Revision: D41478300
      
      Pulled By: ajkr
      
      fbshipit-source-id: 50ef096bd8f0162b31adb46f54ae6ddc337d0a5e
  10. Nov 22, 2022 (6 commits)
    • Post 7.9.0 release branch cut updates (#10974) · f4cfcfe8
      Committed by anand76
      Summary:
      Update HISTORY.md, version.h, and check_format_compatible.sh
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10974
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D41455289
      
      Pulled By: anand1976
      
      fbshipit-source-id: 99888ebcb9109e5ced80584a66b20123f8783c0b
    • Set correct temperature for range tombstone only file in penultimate level (#10972) · 6c5ec920
      Committed by Changyu Bi
      Summary:
      Before this PR, if a range-tombstone-only file was generated in the penultimate level, it was marked with `last_level_temperature`. This PR fixes this issue.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10972
      
      Test Plan: added unit test for this scenario.
      
      Reviewed By: ajkr
      
      Differential Revision: D41449215
      
      Pulled By: cbi42
      
      fbshipit-source-id: 1e06b5ae3bc0183db2991a45965a9807a7e8be0c
    • Update HISTORY.md for 7.9.0 (#10973) · 3ff6da6b
      Committed by anand76
      Summary:
      Update HISTORY.md for 7.9.0 release.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10973
      
      Reviewed By: pdillinger
      
      Differential Revision: D41453720
      
      Pulled By: anand1976
      
      fbshipit-source-id: 47a23d4b6539ec6a9a09c9e69c026f7c8b10afa7
    • Add a SecondaryCache::InsertSaved() API, use in CacheDumper impl (#10945) · e079d562
      Committed by Peter Dillinger
      Summary:
      This simplifies some ugly code in cache_dump_load_impl.cc by adding an API to SecondaryCache that can directly consume persisted data.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10945
      
      Test Plan: existing tests for CacheDumper, added basic unit test
      
      Reviewed By: anand1976
      
      Differential Revision: D41231497
      
      Pulled By: pdillinger
      
      fbshipit-source-id: b8ec993ef7d3e7efd68aae8602fd3f858da58068
    • Fix CompactionIterator flag for penultimate level output (#10967) · 097f9f44
      Committed by Andrew Kryczka
      Summary:
      We were not resetting the flag in non-debug mode, so once it became true it could stay true for future keys where it should be false. This PR adds the reset logic.
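      The bug class can be sketched generically (illustrative names, not the actual `CompactionIterator` code): a per-key flag computed inside a loop must be reset at the start of each iteration, or once set it sticks for every later key.
      ```
      #include <cassert>
      #include <vector>

      // Computes a per-key flag for each key. `reset_each_iteration=false`
      // models the bug; `true` models the fix.
      std::vector<bool> ClassifyKeys(const std::vector<bool>& should_flag,
                                     bool reset_each_iteration) {
        std::vector<bool> results;
        bool flag = false;
        for (bool k : should_flag) {
          if (reset_each_iteration) {
            flag = false;  // the fix: reset before evaluating each key
          }
          if (k) {
            flag = true;
          }
          results.push_back(flag);
        }
        return results;
      }

      int main() {
        std::vector<bool> keys = {true, false, false};
        // Buggy behavior: the flag sticks at true after the first key.
        assert((ClassifyKeys(keys, false) ==
                std::vector<bool>{true, true, true}));
        // Fixed behavior: only the first key is flagged.
        assert((ClassifyKeys(keys, true) ==
                std::vector<bool>{true, false, false}));
        return 0;
      }
      ```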
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10967
      
      Test Plan:
      - built `db_bench` with DEBUG_LEVEL=0
      - ran benchmark: `TEST_TMPDIR=/dev/shm/prefix ./db_bench -benchmarks=fillrandom -compaction_style=1 -preserve_internal_time_seconds=100 -preclude_last_level_data_seconds=10 -write_buffer_size=1048576 -target_file_size_base=1048576 -subcompactions=8 -duration=120`
      - compared "output_to_penultimate_level: X bytes + last: Y bytes" lines in LOG output
        - Before this fix, Y was always zero
        - After this fix, Y gradually increased throughout the benchmark
      
      Reviewed By: riversand963
      
      Differential Revision: D41417726
      
      Pulled By: ajkr
      
      fbshipit-source-id: ace1e9a289e751a5b0c2fbaa8addd4eda5525329
    • Observe and warn about misconfigured HyperClockCache (#10965) · 3182beef
      Committed by Peter Dillinger
      Summary:
      Background. One of the core risks of choosing HyperClockCache is ending up with degraded performance if estimated_entry_charge is very significantly wrong. Too low leads to an under-utilized hash table, which wastes a bit of (tracked) memory and likely increases access times due to a larger working set size (more TLB misses). Too high leads to a fully populated hash table (at some limit, with reasonable lookup performance) and not being able to cache as many objects as the memory limit would allow. In either case, performance degradation is graceful/continuous but can be quite significant. For example, cutting block size in half without updating estimated_entry_charge could lead to a large portion of configured block cache memory (up to roughly 1/3) going unused.
      
      Fix. This change adds a mechanism through which the DB periodically probes the block cache(s) for "problems" to report, and adds diagnostics to the HyperClockCache for bad estimated_entry_charge. The periodic probing is currently done with DumpStats / stats_dump_period_sec, and diagnostics reported to info_log (normally LOG file).
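      The diagnostic idea in a hedged sketch (the thresholds and messages here are made up for illustration, not RocksDB's actual values): periodically compare hash-table occupancy to capacity and warn when estimated_entry_charge appears far off in either direction.
      ```
      #include <cassert>
      #include <string>

      // occupancy_ratio = used slots / table capacity, in [0, 1].
      std::string DiagnoseOccupancy(double occupancy_ratio) {
        if (occupancy_ratio > 0.95) {
          // Table nearly full: estimated_entry_charge is likely too high,
          // so fewer entries fit than the memory budget would allow.
          return "warn: table nearly full; estimated_entry_charge may be "
                 "too high";
        }
        if (occupancy_ratio < 0.05) {
          // Table badly under-used: estimated_entry_charge is likely too low.
          return "warn: table under-utilized; estimated_entry_charge may be "
                 "too low";
        }
        return "ok";
      }

      int main() {
        assert(DiagnoseOccupancy(0.50) == "ok");
        assert(DiagnoseOccupancy(0.99).substr(0, 4) == "warn");
        assert(DiagnoseOccupancy(0.01).substr(0, 4) == "warn");
        return 0;
      }
      ```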
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10965
      
      Test Plan:
      unit test included. Doesn't cover all the implemented subtleties of reporting, but ensures basics of when to report or not.
      
      Also manual testing with db_bench. Create db with
      ```
      ./db_bench --benchmarks=fillrandom,flush --num=3000000 --disable_wal=1
      ```
      Use and check LOG file for HyperClockCache for various block sizes (used as estimated_entry_charge)
      ```
      ./db_bench --use_existing_db --benchmarks=readrandom --num=3000000 --duration=20 --stats_dump_period_sec=8 --cache_type=hyper_clock_cache -block_size=XXXX
      ```
      Saw warnings / errors, or not, as expected.
      
      Reviewed By: anand1976
      
      Differential Revision: D41406932
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 4ca56162b73017e4b9cec2cad74466f49c27a0a7
  11. Nov 18, 2022 (2 commits)
  12. Nov 17, 2022 (2 commits)
  13. Nov 16, 2022 (2 commits)
    • Re-arrange cache.h to prepare for refactoring (#10942) · b55e7035
      Committed by Peter Dillinger
      Summary:
      No material changes to code or comments, just re-arranging things to prepare for a big refactoring, making it easier to see what changed. Some specifics:
      * This groups things together in Cache in anticipation of secondary cache features being marked production-ready (vs. experimental).
      * CacheEntryRole will be needed in definition of class Cache, so that has been moved above it.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10942
      
      Test Plan: existing tests
      
      Reviewed By: anand1976
      
      Differential Revision: D41205509
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 3f2559ab1651c758918dc97056951fa2b5eb0348
    • Support using GetMergeOperands for verification with wide columns (#10952) · b644baa1
      Committed by Levi Tamasi
      Summary:
      With the recent changes, `GetMergeOperands` is now supported for wide-column entities as well, so we can use it for verification purposes in the non-batched stress tests.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10952
      
      Test Plan: Ran a simple non-batched ops blackbox crash test.
      
      Reviewed By: riversand963
      
      Differential Revision: D41292114
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 70b4c756a4a1fecb445c16c7096aad805a51203c
  14. Nov 15, 2022 (3 commits)
    • Fix db_stress failure in async_io in FilePrefetchBuffer (#10949) · 1562524e
      Committed by Akanksha Mahajan
      Summary:
      Fix db_stress failure in async_io in FilePrefetchBuffer.
      
      From the logs, the assertion was triggered when:
      - `prev_offset_ == offset` but somehow `prev_len_ != 0` and `explicit_prefetch_submitted_ == true`. That scenario arises when we send an async request to the prefetch buffer during a seek, but on a second seek the data is found in the cache. `prev_offset_` and `prev_len_` get updated, but we were not setting `explicit_prefetch_submitted_ = false`, because of which the buffers were getting out of sync.
      It's possible a read by another thread loaded the block into the cache in the meantime.
      
      Particular assertion example:
      ```
      prev_offset: 0, prev_len_: 8097 , offset: 0, length: 8097, actual_length: 8097 , actual_offset: 0 ,
      curr_: 0, bufs_[curr_].offset_: 4096 ,bufs_[curr_].CurrentSize(): 48541 , async_len_to_read: 278528, bufs_[curr_].async_in_progress_: false
      second: 1, bufs_[second].offset_: 282624 ,bufs_[second].CurrentSize(): 0, async_len_to_read: 262144 ,bufs_[second].async_in_progress_: true ,
      explicit_prefetch_submitted_: true , copy_to_third_buffer: false
      ```
      As we can see, `curr_` was expected to read 278528 bytes but read 48541, so the buffers are out of sync.
      Also, `explicit_prefetch_submitted_` is set to true while `prev_len_` is not 0.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10949
      
      Test Plan:
      - Ran db_bench to make sure there is no performance regression
      - Ran db_stress, which fails without this fix
      - Ran build-linux-mini-crashtest 7-8 times locally + on CircleCI
      
      Reviewed By: anand1976
      
      Differential Revision: D41257786
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 1d100f94f8c06bbbe4cc76ca27f1bbc820c2494f
    • Fix broken dependency: update zlib from 1.2.12 to 1.2.13 (#10833) · 0993c922
      Committed by xiaochenfan
      Summary:
      zlib (https://zlib.net/) has released v1.2.13.

      1.2.12 is no longer available for download, so the RocksDB Makefile breaks because it can no longer fetch the source .tar.gz.

      https://nvd.nist.gov/vuln/detail/CVE-2022-37434

      This PR updates the version number and the checksum of the new .tar.gz file (1.2.13).
      
      Fixes https://github.com/facebook/rocksdb/issues/10876
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10833
      
      Reviewed By: hx235
      
      Differential Revision: D40575954
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3e560e453ddf58d045214fc4e64f83bef91f22e5
    • Update unit test to avoid timeout (#10950) · 85154375
      Committed by akankshamahajan
      Summary:
      Update unit test to avoid timeout
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10950
      
      Reviewed By: hx235
      
      Differential Revision: D41258892
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: cbfe94da63e9e54544a307845deb79ba42458301
  15. Nov 14, 2022 (1 commit)
  16. Nov 12, 2022 (3 commits)
    • Don't attempt to use SecondaryCache on block_cache_compressed (#10944) · f321e8fc
      Committed by Peter Dillinger
      Summary:
      Compressed block cache depends on reading the block compression marker beyond the payload block size. Only the payload bytes were being saved and loaded from SecondaryCache -> boom!
      
      This removes some unnecessary code attempting to combine these two competing features. Note that BlockContents was previously used for block-based filter in block cache, but that support has been removed.
      
      Also marking block_cache_compressed as deprecated in this commit as we expect it to be replaced with SecondaryCache.
      
      This problem was discovered during refactoring, but I didn't want to combine the bug fix with that refactoring.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10944
      
      Test Plan: test added that fails on base revision (at least with ASAN)
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D41205578
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1b29d36c7a6552355ac6511fcdc67038ef4af29f
    • Support Merge for wide-column entities in the compaction logic (#10946) · 5e894705
      Committed by Levi Tamasi
      Summary:
      The patch extends the compaction logic to handle `Merge`s in conjunction with wide-column entities. As usual, the merge operation is applied to the anonymous default column, and any other columns are unaffected.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10946
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D41233722
      
      Pulled By: ltamasi
      
      fbshipit-source-id: dfd9b1362222f01bafcecb139eb48480eb279fed
    • Fix async_io regression in scans (#10939) · d1aca4a5
      Committed by akankshamahajan
      Summary:
      Fix an async_io regression in scans due to an incorrect check that caused valid data in the buffer to be cleared during seek.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10939
      
      Test Plan:
      - stress tests: `export CRASH_TEST_EXT_ARGS="--async_io=1" && make crash_test -j32`
      - Ran the db_bench command which caught the regression:
      ./db_bench --db=/rocksdb_async_io_testing/prefix_scan --disable_wal=1 --use_existing_db=true --benchmarks="seekrandom" -key_size=32 -value_size=512 -num=50000000 -use_direct_reads=false -seek_nexts=963 -duration=30 -ops_between_duration_checks=1 --async_io=true --compaction_readahead_size=4194304 --log_readahead_size=0 --blob_compaction_readahead_size=0 --initial_auto_readahead_size=65536 --num_file_reads_for_auto_readahead=0 --max_auto_readahead_size=524288
      
      seekrandom   :    3777.415 micros/op 264 ops/sec 30.000 seconds 7942 operations;  132.3 MB/s (7942 of 7942 found)
      
      Reviewed By: anand1976
      
      Differential Revision: D41173899
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 2d75b06457d65b1851c92382565d9c3fac329dfe
  17. Nov 11, 2022 (2 commits)
    • Support Merge with wide-column entities in iterator (#10941) · dbc4101b
      Committed by Levi Tamasi
      Summary:
      The patch adds `Merge` support for wide-column entities in `DBIter`. As before, the `Merge` operation is applied to the default column of the entity; any other columns are unchanged. As a small cleanup, the PR also changes the signature of `DBIter::Merge` to simply return a boolean instead of the `Merge` operation's `Status` since the actual `Status` is already stored in a member variable.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10941
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D41195471
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 362cf555897296e252c3de5ddfbd569ef34f85ef
    • Refactor MergeHelper::MergeUntil a bit (#10943) · 9460d4b7
      Committed by Levi Tamasi
      Summary:
      The patch untangles some nested ifs in `MergeHelper::MergeUntil`. This will come in handy when extending the compaction logic to support `Merge` for wide-column entities, and also enables us to eliminate some repeated branching on value type and to decrease the scope of some variables.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10943
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D41201946
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 890bd3d4e31cdccadca614489a94686d76485ba9
  18. Nov 10, 2022 (1 commit)
    • Revisit the interface of MergeHelper::TimedFullMerge(WithEntity) (#10932) · 2ea10952
      Committed by Levi Tamasi
      Summary:
      The patch refines/reworks `MergeHelper::TimedFullMerge(WithEntity)`
      a bit in two ways. First, it eliminates the recently introduced `TimedFullMerge`
      overload, which makes the responsibilities clearer by making sure the query
      result (`value` for `Get`, `columns` for `GetEntity`) is set uniformly in
      `SaveValue` and `GetContext`. Second, it changes the interface of
      `TimedFullMergeWithEntity` so it exposes its result in a serialized form; this
      is a more decoupled design which will come in handy when adding support
      for `Merge` with wide-column entities to `DBIter`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10932
      
      Test Plan: `make check`
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D41129399
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 69d8da358c77d4fc7e8c40f4dafc2c129a710677