1. 02 Sep, 2023 1 commit
  2. 01 Sep, 2023 1 commit
    • J
      Minor refactor on LDB command for wide column support and release note (#11777) · 47be3fff
      Jay Huh committed
      Summary:
      As mentioned in https://github.com/facebook/rocksdb/issues/11754 , refactor to clean up some nearly identical logic. This PR changes the debugging string format of the Scan command as follows.
      
      ```
      ❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ scan --hex
      ```
      
      Before
      ```
      0x6669727374 : :0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172
      0x7365636F6E64 : 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572
      0x7468697264 : 0x62617A
      ```
      After
      ```
      0x6669727374 ==> :0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172
      0x7365636F6E64 ==> 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572
      0x7468697264 ==> 0x62617A
      ```
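
      The format change above can be illustrated with a minimal sketch. `ToHex` and `FormatScanLine` are hypothetical helpers written for this note, not functions from the `ldb` source; they only reproduce the shape of the output for a plain key-value pair.

      ```cpp
      #include <cassert>
      #include <string>

      // Hypothetical helper: upper-case hex encoding with a "0x" prefix,
      // matching the look of the --hex output shown above.
      std::string ToHex(const std::string& bytes) {
        static const char kDigits[] = "0123456789ABCDEF";
        std::string out = "0x";
        for (unsigned char c : bytes) {
          out.push_back(kDigits[c >> 4]);
          out.push_back(kDigits[c & 0x0F]);
        }
        return out;
      }

      // Hypothetical helper: one scan line for a plain key-value pair, using
      // the " ==> " separator that replaces " : ".
      std::string FormatScanLine(const std::string& key, const std::string& value,
                                 bool hex) {
        return (hex ? ToHex(key) : key) + " ==> " + (hex ? ToHex(value) : value);
      }
      ```

      With these helpers, `FormatScanLine("third", "baz", true)` yields `0x7468697264 ==> 0x62617A`, the same shape as the third line of the "After" output.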
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11777
      
      Test Plan:
      ```
      ❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ dump
      first ==> :hello attr_name1:foo attr_name2:bar
      second ==> attr_one:two attr_three:four
      third ==> baz
      Keys in range: 3
      
      ❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ scan
      first ==> :hello attr_name1:foo attr_name2:bar
      second ==> attr_one:two attr_three:four
      third ==> baz
      
      ❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ dump --hex
      0x6669727374 ==> :0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172
      0x7365636F6E64 ==> 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572
      0x7468697264 ==> 0x62617A
      Keys in range: 3
      
      ❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ scan --hex
      0x6669727374 ==> :0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172
      0x7365636F6E64 ==> 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572
      0x7468697264 ==> 0x62617A
      ```
      
      Reviewed By: jowlyzhang
      
      Differential Revision: D48876755
      
      Pulled By: jaykorean
      
      fbshipit-source-id: b1c608a810fe038999ac528b690d398abf5f21d7
  3. 31 Aug, 2023 6 commits
    • P
      Log host name (#11776) · 83eb7b8c
      Peter Dillinger committed
      Summary:
      ... in info_log. Becoming more important with disaggregated storage.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11776
      
      Test Plan: manual
      
      Reviewed By: jaykorean
      
      Differential Revision: D48849471
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 9a8fd8b2564a4f133526ecd7c1414cb667e4ba54
    • H
      Change compaction_readahead_size default value to 2MB (#11762) · 05daa123
      Hui Xiao committed
      Summary:
      **Context/Summary:**
      After https://github.com/facebook/rocksdb/pull/11631, we rely on `compaction_readahead_size` to decide how much to read ahead for compaction reads in the non-direct IO case. https://github.com/facebook/rocksdb/pull/11658 therefore also sanitized a `compaction_readahead_size` of 0 to 2MB under non-direct IO, which is consistent with the existing sanitization under direct IO.
      
      However, this makes it impossible to disable compaction readahead, and adds one more scenario to the inconsistent effects between `Options.compaction_readahead_size=0` during DB open and `SetDBOptions("compaction_readahead_size", "0")`.
      - `SetDBOptions("compaction_readahead_size", "0")` disables compaction readahead because its logic never goes through the sanitization above, while `Options.compaction_readahead_size=0` does go through it.
      
      Therefore we decided to do this PR.
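
      The inconsistency described above can be modeled with a small sketch. Both functions are hypothetical stand-ins for the two code paths as the summary describes them before this PR; they are not RocksDB functions.

      ```cpp
      #include <cassert>
      #include <cstddef>

      constexpr std::size_t kTwoMB = 2 * 1024 * 1024;

      // Hypothetical model of the DB-open path before this PR:
      // a value of 0 is sanitized to 2MB.
      std::size_t SanitizedAtOpen(std::size_t compaction_readahead_size) {
        return compaction_readahead_size == 0 ? kTwoMB : compaction_readahead_size;
      }

      // Hypothetical model of the SetDBOptions() path: the value is applied
      // as given, so 0 really means "no compaction readahead".
      std::size_t AppliedViaSetDBOptions(std::size_t compaction_readahead_size) {
        return compaction_readahead_size;
      }
      ```

      In this model the same input, 0, ends up as 2MB on one path and as 0 on the other, which is the mismatch the summary points out.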
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11762
      
      Test Plan: Modified existing UTs to cover this PR
      
      Reviewed By: ajkr
      
      Differential Revision: D48759560
      
      Pulled By: hx235
      
      fbshipit-source-id: b3f85e58bda362a6fa1dc26bd8a87aa0e171af79
    • Y
      Add UDT support in SstFileDumper (#11757) · fc58c7c6
      Yu Zhang committed
      Summary:
      For an SST file that uses user-defined timestamp aware comparators, the sst_dump tool doesn't handle a lower or upper bound well if one is set. This PR adds support for that. While working on this, `MaybeAddTimestampsToRange` was moved to the udt_util.h file so it can be shared.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11757
      
      Test Plan:
      make all check
      for changes in db_impl.cc and db_impl_compaction_flush.cc
      
      for changes in sst_file_dumper.cc, I manually tested that this change handles specifying bounds for UDT use cases. It should probably have a unit test file eventually.
      
      Reviewed By: ltamasi
      
      Differential Revision: D48668048
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: 1560465f40e44668d6d82a7439fe9012be0e74a8
    • J
      Wide Column support in ldb (#11754) · ea9a5b29
      Jay Huh committed
      Summary:
      wide_columns can now be pretty-printed in the following commands
      - `./ldb dump_wal`
      - `./ldb dump`
      - `./ldb idump`
      - `./ldb dump_live_files`
      - `./ldb scan`
      - `./sst_dump --command=scan`
      
      There are opportunities to refactor and reduce some nearly identical code. This PR is the initial change to add wide column support to the `ldb` and `sst_dump` tools. More PRs will follow for the refactor.
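
      As a rough illustration of the pretty-printed form used by these commands, here is a minimal sketch. `Column` and `DumpColumnsSketch` are hypothetical names for this note and do not reproduce the actual helper in the RocksDB source; the sketch assumes each column is printed as `name:value`, separated by single spaces, with the default column having an empty name.

      ```cpp
      #include <cassert>
      #include <string>
      #include <utility>
      #include <vector>

      // Hypothetical representation of one wide column: (name, value).
      using Column = std::pair<std::string, std::string>;

      // Hypothetical formatter: joins "name:value" pairs with single spaces.
      // A column with an empty name therefore prints as ":value".
      std::string DumpColumnsSketch(const std::vector<Column>& columns) {
        std::string out;
        bool first = true;
        for (const Column& column : columns) {
          if (!first) {
            out += ' ';
          }
          first = false;
          out += column.first;
          out += ':';
          out += column.second;
        }
        return out;
      }
      ```

      Fed the columns of the `first` key from the test plan, this sketch produces `:hello attr_name1:foo attr_name2:bar`.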
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11754
      
      Test Plan:
      **New Tests added**
      - `WideColumnsHelperTest::DumpWideColumns`
      - `WideColumnsHelperTest::DumpSliceAsWideColumns`
      
      **Changes added to existing tests**
      - `ExternalSSTFileTest::BasicMixed` added to cover mixed case (This test should have been added in https://github.com/facebook/rocksdb/issues/11688). This test does not verify the ldb or sst_dump output. This test was used to create test SST files having some rows with wide columns and some without and the generated SST files were used to manually test sst_dump_tool.
      - `createSST()` in `sst_dump_test` now takes `wide_column_one_in` to add wide column value in SST
      
      **dump_wal**
      ```
      ./ldb dump_wal --walfile=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/000004.log --print_value --header
      ```
      ```
      Sequence,Count,ByteSize,Physical Offset,Key(s) : value
      1,1,59,0,PUT_ENTITY(0) : 0x:0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172
      2,1,34,42,PUT_ENTITY(0) : 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572
      3,1,17,7d,PUT(0) : 0x7468697264 : 0x62617A
      ```
      
      **idump**
      ```
      ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ idump
      ```
      ```
      'first' seq:1, type:22 => :hello attr_name1:foo attr_name2:bar
      'second' seq:2, type:22 => attr_one:two attr_three:four
      'third' seq:3, type:1 => baz
      Internal keys in range: 3
      ```
      
      **SST Dump from dump_live_files**
      ```
      ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ compact
      ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ dump_live_files
      ```
      ```
      ...
      ==============================
      SST Files
      ==============================
      /tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/000013.sst level:1
      ------------------------------
      Process /tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/000013.sst
      Sst file format: block-based
      'first' seq:0, type:22 => :hello attr_name1:foo attr_name2:bar
      'second' seq:0, type:22 => attr_one:two attr_three:four
      'third' seq:0, type:1 => baz
      ...
      ```
      
      **dump**
      ```
      ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ dump
      ```
      ```
      first ==> :hello attr_name1:foo attr_name2:bar
      second ==> attr_one:two attr_three:four
      third ==> baz
      Keys in range: 3
      ```
      
      **scan**
      ```
      ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ scan
      ```
      ```
      first : :hello attr_name1:foo attr_name2:bar
      second : attr_one:two attr_three:four
      third : baz
      ```
      
      **sst_dump**
      ```
      ./sst_dump --file=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/000013.sst --command=scan
      ```
      ```
      options.env is 0x7ff54b296000
      Process /tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/000013.sst
      Sst file format: block-based
      from [] to []
      'first' seq:0, type:22 => :hello attr_name1:foo attr_name2:bar
      'second' seq:0, type:22 => attr_one:two attr_three:four
      'third' seq:0, type:1 => baz
      ```
      
      Reviewed By: ltamasi
      
      Differential Revision: D48837999
      
      Pulled By: jaykorean
      
      fbshipit-source-id: b0280f0589d2b9716bb9b50530ffcabb397d140f
    • H
      Revert "Clarify comment about compaction_readahead_size's sanitizatio… (#11773) · c073c2ed
      Hui Xiao committed
      Summary:
      …n change (https://github.com/facebook/rocksdb/issues/11755)"
      
      This reverts commit 45131659.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11773
      
      Reviewed By: ajkr
      
      Differential Revision: D48832320
      
      Pulled By: hx235
      
      fbshipit-source-id: 96cef26a885134360766a83505f6717598eac6a9
    • Y
      Increase full_history_ts_low when flush happens during recovery (#11774) · 4234a6a3
      Yu Zhang committed
      Summary:
      This PR adds a missing piece for the UDT in memtable only feature, which is to automatically increase `full_history_ts_low` when flush happens during recovery.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11774
      
      Test Plan:
      Added unit test
      make all check
      
      Reviewed By: ltamasi
      
      Differential Revision: D48799109
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: fd681ed66d9d40904ca2c919b2618eb692686035
  4. 30 Aug, 2023 4 commits
    • J
      remove a sub-condition that is always true (#11746) · c1e6ffc4
      jsteemann committed
      Summary:
      The value of `done` is always false here, so the sub-condition `!done` is always true and the check can be removed.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11746
      
      Reviewed By: anand1976
      
      Differential Revision: D48656845
      
      Pulled By: ajkr
      
      fbshipit-source-id: 523ba3d07b3af7880c8c8ccb20442fd7c0f49417
    • A
      Add SystemClock::TimedWait() function (#11753) · e373685d
      Andrew Kryczka committed
      Summary:
      Having a synthetic implementation of `TimedWait()` in `SystemClock` will allow us to add `SyncPoint`s while the mutex is released, which was previously impossible since the lock was released and reacquired all within `pthread_cond_timedwait()`. Additionally, integrating `TimedWait()` with `MockSystemClock` allows us to clean up some workarounds in the test code. In this PR I only cleaned up the `GenericRateLimiter` test workaround.
      
      This is related to the intended follow-up mentioned in https://github.com/facebook/rocksdb/issues/7101's description. There are a couple differences:
      
      (1) This PR does not include removing the particular workaround that initially motivated it. Actually, the `Timer` class uses `InstrumentedCondVar`, so the interface introduced here is inadequate to remove that workaround. On the bright side, the interface introduced in this PR can be changed as needed since it can neither be used nor extended externally, due to using forward-declared `port::CondVar*` in the interface.
      (2) This PR only makes the change in `SystemClock`, not `Env`. Older revisions of this PR included `Env::TimedWait()` and `SpecialEnv::TimedWait()`; however, since they were unused, it probably makes sense to defer adding them until they are needed.
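
      To make the idea of a clock-owned timed wait concrete, here is a minimal sketch. `ClockSketch` and `MockClockSketch` are hypothetical types for this note; they use `std::condition_variable` rather than the forward-declared `port::CondVar*` mentioned above, and their signatures are assumptions, not the interface added by this PR.

      ```cpp
      #include <cassert>
      #include <condition_variable>
      #include <cstdint>
      #include <mutex>

      // Hypothetical clock interface (not RocksDB's SystemClock).
      class ClockSketch {
       public:
        virtual ~ClockSketch() = default;
        virtual std::uint64_t NowMicros() = 0;
        // Waits on `cv` until notified or until `deadline_micros` in this
        // clock's time. Returns true if the deadline was reached.
        virtual bool TimedWait(std::condition_variable& cv,
                               std::unique_lock<std::mutex>& lock,
                               std::uint64_t deadline_micros) = 0;
      };

      // Mock clock: a timed wait never blocks. It releases the mutex, which is
      // where a test hook could run, then reacquires it and jumps the fake
      // time forward to the deadline.
      class MockClockSketch : public ClockSketch {
       public:
        std::uint64_t NowMicros() override { return now_micros_; }

        bool TimedWait(std::condition_variable& /*cv*/,
                       std::unique_lock<std::mutex>& lock,
                       std::uint64_t deadline_micros) override {
          lock.unlock();
          // A test hook could observe the released mutex at this point.
          lock.lock();
          if (deadline_micros > now_micros_) {
            now_micros_ = deadline_micros;
          }
          return true;  // the mock always reports that the deadline was reached
        }

       private:
        std::uint64_t now_micros_ = 0;
      };
      ```

      Because the wait is routed through the clock object, a test can swap in the mock and get deterministic timing without sleeping. A real clock would implement the same method with `cv.wait_until(...)`; that part is omitted here.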
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11753
      
      Reviewed By: pdillinger
      
      Differential Revision: D48654995
      
      Pulled By: ajkr
      
      fbshipit-source-id: 15e19f2454b64d4ec7f50e328691c66ca9911122
    • J
      avoid find() -> insert() sequence (#11743) · 0b8b17a9
      jsteemann committed
      Summary:
      When a key is recorded for locking in a pessimistic transaction, the key is first looked up in a map and then inserted into the map if it was not already there.
      This can be simplified to an unconditional insert. In the ideal case where all keys are unique, this saves all the find() operations.
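
      The pattern can be sketched with a standard container. `KeyInfo`, `TrackedKeys`, and both functions are hypothetical names for this note, and `std::unordered_map` is an assumption about the map type; the point is only the difference between the two call sequences.

      ```cpp
      #include <cassert>
      #include <string>
      #include <unordered_map>

      // Hypothetical per-key bookkeeping.
      struct KeyInfo {
        bool exclusive = false;
      };
      using TrackedKeys = std::unordered_map<std::string, KeyInfo>;

      // Before: look the key up, then insert on a miss (two hash lookups
      // whenever the key is new). Returns true if the key was added.
      bool TrackWithFind(TrackedKeys& keys, const std::string& key) {
        auto it = keys.find(key);
        if (it == keys.end()) {
          keys.emplace(key, KeyInfo{});
          return true;
        }
        return false;
      }

      // After: insert unconditionally. emplace() leaves an existing entry
      // untouched and reports through `.second` whether it inserted.
      bool TrackWithInsert(TrackedKeys& keys, const std::string& key) {
        return keys.emplace(key, KeyInfo{}).second;
      }
      ```

      Both functions leave the map in the same state; the second simply avoids the separate `find()` when the key is new.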
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11743
      
      Reviewed By: anand1976
      
      Differential Revision: D48656798
      
      Pulled By: ajkr
      
      fbshipit-source-id: d0150de2db757e0c05e1797cfc24380790c71276
    • Y
      Removing some checks for UDT in memtable only feature (#11732) · ecbeb305
      Yu Zhang committed
      Summary:
      The user-defined timestamps feature only enforces that, for the same key, user-defined timestamps are non-decreasing. For the user-defined timestamps in memtable only feature, during flush, we check the user-defined timestamps in each memtable to determine whether the data is considered expired with regard to `full_history_ts_low`. This process assumes that a newer memtable should not have smaller user-defined timestamps than an older memtable. This check, however, enforces ordering of user-defined timestamps across keys, as opposed to the vanilla UDT feature, which only enforces ordering of user-defined timestamps for the same key.
      
      This stricter user-defined timestamp ordering requirement could be an issue for secondary instances, where commits can be out of order. And after thinking more about it, this requirement is really overkill for keeping the invariants of `full_history_ts_low`, which are:
      
      1) users cannot read below `full_history_ts_low`
      2) users cannot write at or below `full_history_ts_low`
      3) `full_history_ts_low` can only be increasing
      
      As long as RocksDB enforces these 3 checks, we can prevent inconsistent reads that return a different value. And these three checks are covered by existing APIs.
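
      The three invariants can be written down as a minimal sketch. `TsLowSketch` is a hypothetical class for this note; it assumes timestamps are plain 64-bit integers and treats setting the same value again as allowed, neither of which is stated in the summary.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Hypothetical guard around a full_history_ts_low-style watermark.
      class TsLowSketch {
       public:
        // (1) Reads below the watermark are rejected.
        bool CanReadAt(std::uint64_t read_ts) const { return read_ts >= ts_low_; }

        // (2) Writes at or below the watermark are rejected.
        bool CanWriteAt(std::uint64_t write_ts) const { return write_ts > ts_low_; }

        // (3) The watermark never moves backwards.
        bool Increase(std::uint64_t new_low) {
          if (new_low < ts_low_) {
            return false;
          }
          ts_low_ = new_low;
          return true;
        }

       private:
        std::uint64_t ts_low_ = 0;
      };
      ```

      None of the three checks compares timestamps across different keys, which is why the cross-key ordering check is not needed to uphold them.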
      
      So this PR removes the extra checks in the UDT in memtable only feature that require user-defined timestamps to be non-decreasing across keys.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11732
      
      Reviewed By: ltamasi
      
      Differential Revision: D48541466
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: 95453c6e391cbd511c0feab05f0b11c312d17186
  5. 29 Aug, 2023 2 commits
  6. 26 Aug, 2023 3 commits
    • J
      remove an unused typedef (#11286) · ba597514
      Jan committed
      Summary:
      The `VersionBuilderMap` type alias definition seems unused.
      If this PR compiles fine, then the alias is probably not needed anymore.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11286
      
      Reviewed By: jaykorean
      
      Differential Revision: D48656747
      
      Pulled By: ajkr
      
      fbshipit-source-id: ac8554922aead7dc3d24fe7e6544a4622578c514
    • R
      Del `(object)` from 200 inc... · 38e9e690
      Richard Barnes committed
      Del `(object)` from 200 inc instagram-server/distillery/slipstream/thrift_models/StoryFeedMediaSticker/ttypes.py
      
      Summary: Python3 makes the use of `(object)` in class inheritance unnecessary. Let's modernize our code by eliminating this.
      
      Reviewed By: itamaro
      
      Differential Revision: D48673915
      
      fbshipit-source-id: a1a6ae8572271eb2898b748c8216ea68e362f06a
    • A
      Fix seg fault in auto_readahead_size during IOError (#11761) · 6cbb1046
      akankshamahajan committed
      Summary:
      Fix seg fault in auto_readahead_size
      ```
      db_stress:
      internal_repo_rocksdb/repo/table/block_based/partitioned_index_iterator.h:70: virtual rocksdb::IndexValue rocksdb::PartitionedIndexIterator::value() const: Assertion `Valid()' failed.
      ```
      
      During seek, after calculating readahead_size, db_stress can inject an IOError, causing `index_iter_->Seek` to fail and leaving `index_iter_` invalid.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11761
      
      Test Plan: Reproducible locally and passed with this fix
      
      Reviewed By: anand1976
      
      Differential Revision: D48696248
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 2be43bf56ad0fc2f95f9093c19c9a1b15a716091
  7. 25 Aug, 2023 3 commits
  8. 24 Aug, 2023 2 commits
  9. 23 Aug, 2023 3 commits
    • A
      Add C API for WaitForCompact (#11737) · 2b6bcfe5
      Alexander Bulimov committed
      Summary:
      Add a bunch of C API functions to expose new `WaitForCompact` function and related options.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11737
      
      Test Plan: unit tests
      
      Reviewed By: jaykorean
      
      Differential Revision: D48568239
      
      Pulled By: abulimov
      
      fbshipit-source-id: 1ff35972d7abacd7e1e17fe2ada1e20cdc88d8de
    • C
      Reverse sort order in dedup to enable iter checking in callback (#11725) · 13035735
      chuhao zeng committed
      Summary:
      Fix https://github.com/facebook/rocksdb/issues/6470
      
      Ensure TransactionLogIter is initialized correctly with the SYNC_POINT API when calling `GetSortedWALFiles`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11725
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D48529411
      
      Pulled By: ajkr
      
      fbshipit-source-id: 970ca1a6259ed996c6d87f7fcd40f95acf441517
    • C
      Do not drop unsynced data during reopen in stress test (#11731) · 5e0584bd
      Changyu Bi committed
      Summary:
      Currently the stress test does not support restoring expected state (to a specific sequence number) when there is unsynced data loss during the reopen phase. This causes a few internal stress test failures with errors like inconsistent value. This PR disables dropping unsynced data during reopen to avoid failures due to this issue. We can re-enable it later, once we decide to support unsynced data loss during DB reopen in the stress test.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11731
      
      Test Plan:
      * Before this change, running this test a few times can fail with an inconsistent value error
      ```
      ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=8 --block_size=16384 --bloom_bits=20.57166126835524 --bottommost_compression_type=disable --bytes_per_sync=262144 --cache_index_and_filter_blocks=1 --cache_size=8388608 --cache_type=auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=0 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_style=1 --compaction_ttl=100 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=zstd --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=0 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --enable_thread_tracking=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=big --flush_one_in=1000000 --format_version=3 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=6 --index_type=3 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=1000000 
--log2_keys_per_lock=10 --long_running_snapshots=1 --manual_wal_flush_one_in=1000000 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=25000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_max_range_deletions=100 --memtable_prefix_bloom_size_ratio=0 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=5 --open_write_fault_one_in=0 --ops_per_thread=200000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=10 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=1 --preserve_internal_time_seconds=0 --progress_reports=0 --read_fault_one_in=1000 --readahead_size=524288 --readpercent=50 --recycle_log_file_num=0 --reopen=20 --ribbon_starting_level=0 --secondary_cache_fault_one_in=32 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --subcompactions=3 --sync=0 --sync_fault_injection=1 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=2 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=1 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=1 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 
--verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=zstd --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=35
      ```
      
      Reviewed By: hx235
      
      Differential Revision: D48537494
      
      Pulled By: cbi42
      
      fbshipit-source-id: ddae21b9bb6ee8d67229121f58513e95f7ef6d8d
  10. 22 Aug, 2023 5 commits
    • Y
      Try to use a db's OPTIONS file for some ldb commands (#11721) · 2a9f3b6c
      Yu Zhang committed
      Summary:
      For some ldb commands that don't need to open the DB, it's still useful to use the DB's existing OPTIONS file if it's available.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11721
      
      Reviewed By: pdillinger
      
      Differential Revision: D48485540
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: 2d2db837523044066f1a2c4b59a5c03f6cd35e6b
    • A
      Update HISTORY.md and version.h for 8.6 (#11728) · 4b535207
      anand76 committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11728
      
      Reviewed By: jaykorean, jowlyzhang
      
      Differential Revision: D48527100
      
      Pulled By: anand1976
      
      fbshipit-source-id: c48baa44e538fb6bfd3fe7f19046746d3540763f
    • J
      Replace existing waitforcompaction with new WaitForCompact API in db_bench_tool (#11727) · 4fa2c017
      Jay Huh committed
      Summary:
      As the new API to wait for compaction is available (https://github.com/facebook/rocksdb/issues/11436), we can now replace the existing logic of waiting in db_bench_tool with the new API.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11727
      
      Test Plan:
      ```
      ./db_bench --benchmarks="fillrandom,compactall,waitforcompaction,readrandom"
      ```
      **Before change**
      ```
      Set seed to 1692635571470041 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      Integrated BlobDB: blob cache disabled
      RocksDB:    version 8.6.0
      Date:       Mon Aug 21 09:33:40 2023
      CPU:        80 * Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
      CPUCache:   28160 KB
      Keys:       16 bytes each (+ 0 bytes user-defined timestamp)
      Values:     100 bytes each (50 bytes after compression)
      Entries:    1000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    110.6 MB (estimated)
      FileSize:   62.9 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      WARNING: Optimization is disabled: benchmarks unnecessarily slow
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      Integrated BlobDB: blob cache disabled
      DB path: [/tmp/rocksdbtest-226125/dbbench]
      fillrandom   :      51.826 micros/op 19295 ops/sec 51.826 seconds 1000000 operations;    2.1 MB/s
      waitforcompaction(/tmp/rocksdbtest-226125/dbbench): started
      waitforcompaction(/tmp/rocksdbtest-226125/dbbench): finished
      waitforcompaction(/tmp/rocksdbtest-226125/dbbench): started
      waitforcompaction(/tmp/rocksdbtest-226125/dbbench): finished
      DB path: [/tmp/rocksdbtest-226125/dbbench]
      readrandom   :      39.042 micros/op 25613 ops/sec 39.042 seconds 1000000 operations;    1.8 MB/s (632886 of 1000000 found)
      ```
      **After change**
      ```
      Set seed to 1692636574431745 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      Integrated BlobDB: blob cache disabled
      RocksDB:    version 8.6.0
      Date:       Mon Aug 21 09:49:34 2023
      CPU:        80 * Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
      CPUCache:   28160 KB
      Keys:       16 bytes each (+ 0 bytes user-defined timestamp)
      Values:     100 bytes each (50 bytes after compression)
      Entries:    1000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    110.6 MB (estimated)
      FileSize:   62.9 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      WARNING: Optimization is disabled: benchmarks unnecessarily slow
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      Integrated BlobDB: blob cache disabled
      DB path: [/tmp/rocksdbtest-226125/dbbench]
      fillrandom   :      51.271 micros/op 19504 ops/sec 51.271 seconds 1000000 operations;    2.2 MB/s
      waitforcompaction(/tmp/rocksdbtest-226125/dbbench): started
      waitforcompaction(/tmp/rocksdbtest-226125/dbbench): finished with status (OK)
      DB path: [/tmp/rocksdbtest-226125/dbbench]
      readrandom   :      39.264 micros/op 25468 ops/sec 39.264 seconds 1000000 operations;    1.8 MB/s (632921 of 1000000 found)
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D48524667
      
      Pulled By: jaykorean
      
      fbshipit-source-id: 1052a15b2ed79a35165ec4d9998d0454b2552ef4
    • Y
      Add unit test for default temperature (#11722) · 03a74411
      Yu Zhang committed
      Summary:
      This piggybacks on the existing last level file temperature statistics test to test that the default temperature takes effect.
      
      While adding this unit test, I found that the approach of swapping in the default temperature in `VersionBuilder::LoadTableHandlers` misses the L0 files created by flush, and only works for existing SST files and SST files created by compaction. So this PR moves that logic to `TableCache::GetTableReader`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11722
      
      Test Plan:
      ```
      ./db_test2 --gtest_filter="*LastLevelStatistics*"
      make all check
      ```
      
      Reviewed By: pdillinger
      
      Differential Revision: D48489171
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: ac29f7d484916f3218729594c5bb35c4f2979ac2
    • L
      Circleci macos sunset (#11633) · a9770b18
      Levi Tamasi committed
      Summary:
      [draft] this PR is created in order to test CI changes
      Closes: https://github.com/facebook/rocksdb/pull/11543
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11633
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D48525552
      
      Pulled By: cbi42
      
      fbshipit-source-id: 758d57f248304213228af459789459cc2f0bf419
  11. 19 Aug, 2023 6 commits
    • H
      Improve PrefetchTest.Basic with explicit flush and file num variable (#11720) · f53018c0
      Hui Xiao committed
      Summary:
      **Context/Summary:** as the title says; should be harmless. It's also a guessed fix for https://github.com/facebook/rocksdb/issues/11717, although no repro has been obtained on my end yet.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11720
      
      Test Plan: existing tests
      
      Reviewed By: cbi42
      
      Differential Revision: D48475661
      
      Pulled By: hx235
      
      fbshipit-source-id: 7c7390319f094c540e703fe2e78a8d601b7a894b
    • A
      Implement trimming of readhead size when upper bound is specified (#11684) · f65a0379
      akankshamahajan committed
      Summary:
      Implement trimming of readahead_size under a new option ReadOptions.auto_readahead_size. It trims the readahead_size during prefetching up to the iterate_upper_bound offset, only when ReadOptions.iterate_upper_bound is set, thereby reducing the prefetching of data beyond the upper bound.
      It's enabled both for implicit auto readahead size and when ReadOptions.readahead_size is specified, and for both sync and async_io.
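
      The trimming rule can be sketched as simple offset arithmetic. `TrimReadahead` is a hypothetical function for this note; it assumes the upper bound has already been translated into a byte offset in the file, which is a simplification of whatever the prefetching code does.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>

      // Hypothetical helper: limits the readahead so that prefetching starting
      // at `read_offset` does not go past `upper_bound_offset`.
      std::uint64_t TrimReadahead(std::uint64_t read_offset,
                                  std::uint64_t readahead_size,
                                  std::uint64_t upper_bound_offset) {
        if (upper_bound_offset <= read_offset) {
          return 0;  // already at or past the bound: nothing useful to prefetch
        }
        return std::min(readahead_size, upper_bound_offset - read_offset);
      }
      ```

      For example, with a read at offset 100, a requested readahead of 1000 bytes, and an upper bound at offset 400, the sketch trims the readahead to 300 bytes.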
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11684
      
      Test Plan: Added new unit test
      
      Reviewed By: anand1976
      
      Differential Revision: D48479723
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 2b1703579caf779105e836b580866ffd7db076fc
      f65a0379
    • C
      Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666) · c2aad555
      Changyu Bi committed
      Summary:
      Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
      * User can set CompressionOptions::checksum to true to enable this feature.
      * We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
      * Uses `ZSTD_compress2()` to do compression, since it supports frame parameters such as the checksum flag. The compression level is also set in the compression context as a parameter.
      * Error handling during decompression to propagate error message from ZSTD.
      * Updated microbench to test read performance impact.
      
      About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
      
      Test Plan:
      * Existing unit tests for zstd compression
      * Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
      * Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
      * Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
      * Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
      * Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
      * Microbench:
      ```
      TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
      
      Min out of 100 runs:
      Main:
      10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
      
      After this PR, checksum=false
      10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
      
      After this PR, checksum=true
      10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
      ```
      * db_bench:
      ```
      Write perf
      TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
      
      [FillSeq checksum=0]
      fillseq [AVG    10 runs] : 281635 (± 31711) ops/sec;   31.2 (± 3.5) MB/sec
      fillseq [MEDIAN 10 runs] : 294027 ops/sec;   32.5 MB/sec
      
      [FillSeq checksum=1]
      fillseq [AVG    10 runs] : 286961 (± 34700) ops/sec;   31.7 (± 3.8) MB/sec
      fillseq [MEDIAN 10 runs] : 283278 ops/sec;   31.3 MB/sec
      
      Read perf
      TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
      
      [Readrandom checksum=1]
      readrandom [AVG    20 runs] : 360928 (± 3579) ops/sec;    4.0 (± 0.0) MB/sec
      readrandom [MEDIAN 20 runs] : 362468 ops/sec;    4.0 MB/sec
      
      [Readrandom checksum=0]
      readrandom [AVG    20 runs] : 380365 (± 2384) ops/sec;    4.2 (± 0.0) MB/sec
      readrandom [MEDIAN 20 runs] : 379800 ops/sec;    4.2 MB/sec
      
      Compression
      TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
      
      checksum=1
      compress [AVG    20 runs] : 54074 (± 634) ops/sec;  211.2 (± 2.5) MB/sec
      compress [MEDIAN 20 runs] : 54396 ops/sec;  212.5 MB/sec
      
      checksum=0
      compress [AVG    20 runs] : 54598 (± 393) ops/sec;  213.3 (± 1.5) MB/sec
      compress [MEDIAN 20 runs] : 54592 ops/sec;  213.3 MB/sec
      
      Decompression:
      TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
      
      checksum = 0
      uncompress [AVG    20 runs] : 167499 (± 962) ops/sec;  654.3 (± 3.8) MB/sec
      uncompress [MEDIAN 20 runs] : 167210 ops/sec;  653.2 MB/sec
      checksum = 1
      uncompress [AVG    20 runs] : 167980 (± 924) ops/sec;  656.2 (± 3.6) MB/sec
      uncompress [MEDIAN 20 runs] : 168465 ops/sec;  658.1 MB/sec
      ```
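The ~2.4% estimate in the perf-impact bullet above, and the gap visible in the microbench minimums, can be re-derived with a few lines of Python. This is just a worked calculation on the numbers quoted in this commit message; the helper name `relative_overhead` is made up for illustration.

```python
def relative_overhead(base_mb_per_s: float, extra_mb_per_s: float) -> float:
    """Extra time added by a second stage, as a fraction of the base stage's time.

    Time per MB is 1/throughput, so the ratio is (1/extra) / (1/base) = base/extra.
    """
    return base_mb_per_s / extra_mb_per_s


# Quoted throughputs: decompression 1700 MB/s, checksum computation 70000 MB/s.
estimate = relative_overhead(1700, 70000)
print(f"estimated checksum overhead: {estimate:.1%}")  # 2.4%

# Smallest reported microbench values: main vs. checksum=true.
measured = (10827 - 10390) / 10390
print(f"microbench difference (best runs): {measured:.1%}")  # 4.2%
```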
      
      Reviewed By: ajkr
      
      Differential Revision: D48019378
      
      Pulled By: cbi42
      
      fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
      c2aad555
    • J
      Timeout in microsecond option in WaitForCompactOptions (#11711) · 0fa0c97d
      Jay Huh committed
      Summary:
      While it's rare, we may run into a scenario where `WaitForCompact()` waits for background jobs indefinitely. For example, a not-enough-space error adds the job back to the queue, while `WaitForCompact()` waits for _all jobs_, including those still in the queue, to complete.
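The value of a timeout here can be illustrated with a toy Python model of "wait until every queued job is done". It is a sketch, not RocksDB code: `JobQueue` and its methods are hypothetical, and using `None` to mean "wait forever" is an assumption of this model.

```python
import threading
from typing import Optional


class JobQueue:
    """Toy model of a background job queue that callers can wait on."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._pending = 0

    def add_job(self) -> None:
        with self._cond:
            self._pending += 1

    def finish_job(self) -> None:
        with self._cond:
            self._pending -= 1
            self._cond.notify_all()

    def wait_for_all(self, timeout_micros: Optional[int] = None) -> bool:
        """Return True once no jobs are pending, or False if the timeout expires."""
        timeout_s = None if timeout_micros is None else timeout_micros / 1_000_000
        with self._cond:
            # wait_for returns the predicate's final value, so a job that keeps
            # getting re-queued yields False instead of blocking forever.
            return self._cond.wait_for(lambda: self._pending == 0, timeout=timeout_s)
```

Without the timeout argument, a job that never leaves the queue would keep the caller blocked indefinitely, which is the scenario the commit describes.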
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11711
      
      Test Plan:
      `DBCompactionWaitForCompactTest::WaitForCompactToTimeout` added
      `timeout` option added to the variables for all of the existing DBCompactionWaitForCompactTests
      
      Reviewed By: pdillinger, jowlyzhang
      
      Differential Revision: D48416390
      
      Pulled By: jaykorean
      
      fbshipit-source-id: 7b6a12f705ab6c6dfaf8ad736a484ca654a86106
      0fa0c97d
    • A
      Implement an allow-cache-hits admission policy for the compressed secondary cache (#11713) · a1743e85
      anand76 committed
      Summary:
      This PR implements a new admission policy for the compressed secondary cache, which includes the functionality of the existing policy, and also admits items evicted from the primary block cache with the hit bit set. Effectively, the new policy works as follows -
      1. When an item is demoted from the primary cache without a hit, a placeholder is inserted in the compressed cache. A second demotion will insert the full entry.
      2. When an item is promoted from the compressed cache to the primary cache for the first time, a placeholder is inserted in the primary. The second promotion inserts the full entry, while erasing it from the compressed cache.
      3. If an item is demoted from the primary cache with the hit bit set, it is immediately inserted in the compressed secondary cache.
      The ```TieredVolatileCacheOptions``` has been updated with a new option, ```adm_policy```, which allows the policy to be selected.
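The three rules above can be traced with a small Python state model. This is a simplified sketch: `TieredCacheModel`, `demote`, and `promote` are hypothetical names, entries are reduced to "placeholder" or "full" markers, and capacity and eviction order are not modeled.

```python
PLACEHOLDER = "placeholder"
FULL = "full"


class TieredCacheModel:
    """Toy model of a primary cache backed by a compressed secondary cache."""

    def __init__(self) -> None:
        self.primary = {}
        self.compressed = {}

    def demote(self, key: str, was_hit: bool) -> None:
        """Called when `key` is evicted from the primary cache."""
        if was_hit or self.compressed.get(key) == PLACEHOLDER:
            # Rule 3 (hit bit set) or second demotion under rule 1: full entry.
            self.compressed[key] = FULL
        elif key not in self.compressed:
            # Rule 1, first demotion without a hit: placeholder only.
            self.compressed[key] = PLACEHOLDER
        self.primary.pop(key, None)

    def promote(self, key: str) -> bool:
        """Called on a lookup served by the compressed cache; False if absent."""
        if self.compressed.get(key) != FULL:
            return False
        if self.primary.get(key) == PLACEHOLDER:
            # Rule 2, second promotion: full entry moves to the primary cache.
            self.primary[key] = FULL
            del self.compressed[key]
        elif key not in self.primary:
            # Rule 2, first promotion: placeholder in the primary cache.
            self.primary[key] = PLACEHOLDER
        return True
```

Walking one key through two demotions and two promotions reproduces the placeholder-then-full sequence, while a key demoted with the hit bit set skips the placeholder step.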
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11713
      
      Reviewed By: pdillinger
      
      Differential Revision: D48444512
      
      Pulled By: anand1976
      
      fbshipit-source-id: b4cbf8c169a88097dff08e36e8bc4b3088de1492
      a1743e85
    • H
      Explicitly instantiate MaybeReadBlockAndLoadToCache as well (#11714) · a67ef998
      Han Zhu committed
      Summary:
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11714
      
      Fixes T161017540.
      
      The staging build started failing with an undefined symbol error:
      ```
      ld.lld: error: undefined symbol: std::enable_if<rocksdb::ParsedFullFilterBlock::kCacheEntryRole == (rocksdb::CacheEntryRole)13 || true, rocksdb::Status>::type rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::ParsedFullFilterBlock>(rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, bool, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::BlockContents*, bool) const
      ```
      This is the `MaybeReadBlockAndLoadToCache` function where `TBlocklike = ParsedFullFilterBlock`. The trigger was an FDO profile update D48261413.
      
      `MaybeReadBlockAndLoadToCache` is used in the same translation unit, `block_based_table_reader.cc`, and also in another file, `partitioned_filter_block.cc`. The latter was the file that couldn't find the symbol. It seems that after the FDO profile update, `MaybeReadBlockAndLoadToCache` may have been inlined into its caller in `block_based_table_reader.cc`, and, with no knowledge of other usages, the symbol got stripped.
      
      Explicitly instantiate the template similar to how `RetrieveBlock` was handled.
      
      Reviewed By: pdillinger, akankshamahajan15
      
      Differential Revision: D48400574
      
      fbshipit-source-id: d4a80999bfb6ce4afa80678444139fcd8ae84aa4
      a67ef998
  12. 18 Aug, 2023 2 commits
    • Y
      Add a per column family default temperature option for accounting (#11708) · 1e77e35d
      Yu Zhang committed
      Summary:
      Add a column family option `default_temperature` that will be used for file-read accounting purposes, such as IO statistics, for files that don't have an explicitly set temperature.
      
      This option is not mutable; changing its value requires a DB restart. This avoids a possible confusion: if the option were mutable, users might expect a change to take effect on all files immediately, while in reality it would only become effective for SST files opened in the future.
      
      This `default_temperature` also only affects accounting during one DB session. It won't be recorded in the manifest as the file's temperature, and it can differ across DB sessions.
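The fallback described above amounts to a one-line rule, sketched here in Python. The enum and the function `accounting_temperature` are illustrative stand-ins, not RocksDB's types; the only behavior modeled is "use the file's own temperature if it has one, otherwise the column family default".

```python
from enum import Enum


class Temperature(Enum):
    """Illustrative stand-in for a file temperature setting."""
    UNKNOWN = 0
    HOT = 1
    WARM = 2
    COLD = 3


def accounting_temperature(file_temperature: Temperature,
                           default_temperature: Temperature) -> Temperature:
    """Temperature used for read accounting in this model.

    The file's stored temperature is never changed; the default is consulted
    only when the file has no explicitly set temperature.
    """
    if file_temperature is Temperature.UNKNOWN:
        return default_temperature
    return file_temperature
```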
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11708
      
      Test Plan:
      ```
      make all check
      ```
      
      Reviewed By: pdillinger
      
      Differential Revision: D48375763
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: eb756696c14a694c6e2a93d2bb6f040563194981
      1e77e35d
    • P
      Clean up some FastRange calls (#11707) · 966be1cc
      Peter Dillinger committed
      Summary:
      * JemallocNodumpAllocator was passing a size_t to FastRange32, which could cause compilation errors or warnings (seen with clang)
      * Fixed the order of arguments to match what would be used with the modulo operator (%), for clarity.
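For readers unfamiliar with the technique, a 32-bit fast-range reduction maps a hash into `[0, range)` with a multiply and a shift instead of a modulo. The Python model below is a sketch of that idea only; `fast_range32` is a hypothetical name, and the explicit bounds check stands in for the type mismatch a C++ compiler would flag when a 64-bit value is passed.

```python
def fast_range32(hash32: int, range32: int) -> int:
    """Map a 32-bit hash into [0, range32) using multiply-and-shift.

    Arguments are ordered like `hash % range`: the hash first, then the range.
    """
    if not (0 <= hash32 < 2**32 and 0 <= range32 < 2**32):
        # Models the 32-bit contract: wider values do not belong here.
        raise ValueError("both arguments must fit in 32 bits")
    # The 64-bit product's high 32 bits are always below range32.
    return (hash32 * range32) >> 32
```

Reading the call as "hash, then range" keeps it visually parallel to `hash % range`, which is the clarity point the second bullet makes.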
      
      Fixes https://github.com/facebook/rocksdb/issues/11006
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11707
      
      Test Plan: no functional change, existing tests
      
      Reviewed By: ajkr
      
      Differential Revision: D48435149
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e6e8b107ded4eceda37db20df59985c846a2546b
      966be1cc
  13. 17 Aug, 2023 2 commits
    • C
      Delay bottommost level single file compactions (#11701) · d1ff4014
      Changyu Bi committed
      Summary:
      For leveled compaction, RocksDB has a special kind of compaction with reason "kBottommostFiles" that compacts bottommost level files to clear data held by snapshots (more detail in https://github.com/facebook/rocksdb/issues/3009). Such compactions can happen soon after a relevant snapshot is released. For some use cases, a bottommost file may contain only a small number of keys that can be cleared, so compacting such a file has a high write amp. In addition, these bottommost files may be picked up by compactions with a reason other than "kBottommostFiles" if we wait for some time (so that enough data is ingested to trigger such a compaction). This PR introduces an option `bottommost_file_compaction_delay` to specify the delay of these bottommost level single file compactions.
      
      * The main change is in `VersionStorageInfo::ComputeBottommostFilesMarkedForCompaction()`, where we only add a file to `bottommost_files_marked_for_compaction_` if oldest_snapshot is larger than the file's non-zero largest_seqno **and** the file is old enough. Note that if a file is not old enough but its largest_seqno is less than oldest_snapshot, we exclude it from the calculation of `bottommost_files_mark_threshold_`. This makes the change simpler, but such a file's eligibility for compaction will only be checked the next time `ComputeBottommostFilesMarkedForCompaction()` is called. This happens when a new Version is created (compaction, flush, SetOptions()...), when a new enough snapshot is released (`VersionStorageInfo::UpdateOldestSnapshot()`), or when a compaction is picked and the compaction score has to be re-calculated.
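The marking condition described above can be summarized in a short Python model. It is a sketch under assumptions: `FileMeta`, its `creation_time` field, and the exact "old enough" comparison are hypothetical choices for illustration, and the threshold bookkeeping (`bottommost_files_mark_threshold_`) is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileMeta:
    name: str
    largest_seqno: int
    creation_time: int  # seconds; assumed source of the file's age


def files_marked_for_compaction(files: List[FileMeta], oldest_snapshot: int,
                                now: int, delay: int) -> List[str]:
    """Return names of bottommost files that satisfy both conditions."""
    marked = []
    for f in files:
        # Condition 1: no snapshot still needs the file's old data.
        cleared = f.largest_seqno != 0 and f.largest_seqno < oldest_snapshot
        # Condition 2 (new): the file is at least `delay` seconds old.
        # Using >= is an assumption, so that delay == 0 means no waiting.
        old_enough = now - f.creation_time >= delay
        if cleared and old_enough:
            marked.append(f.name)
    return marked
```

A file that passes the snapshot check but is too young is simply skipped in this model and would be reconsidered the next time the function runs, mirroring the re-check behavior noted in the bullet.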
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11701
      
      Test Plan:
      * Add two unit tests to test when bottommost_file_compaction_delay > 0.
      * Ran crash test with the new option.
      
      Reviewed By: jaykorean, ajkr
      
      Differential Revision: D48331564
      
      Pulled By: cbi42
      
      fbshipit-source-id: c584f3dc5f6354fce3ed65f4c6366dc450b15ba8
      d1ff4014
    • A
      clarify TODO for whitebox disable_wal=1 in db_crashtest.py (#11665) · 0b6ee88d
      Andrew Kryczka committed
      Summary:
      See https://github.com/facebook/rocksdb/issues/11613
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11665
      
      Reviewed By: hx235
      
      Differential Revision: D48010507
      
      Pulled By: ajkr
      
      fbshipit-source-id: 65c6d87d2c6ffc9d25f1d17106eae467ec528082
      0b6ee88d