1. 26 Oct 2022: 7 commits
    • Run clang format against files under tools/ and db_stress_tool/ (#10868) · 48fe9217
      sdong committed
      Summary:
      Some lines of .h and .cc files are not properly formatted. Clean them up with clang-format.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10868
      
      Test Plan: Watch existing CI to pass
      
      Reviewed By: ajkr
      
      Differential Revision: D40683485
      
      fbshipit-source-id: 491fbb78b2cdcb948164f306829909ad816d5d0b
    • Run clang-format on utilities/transactions (#10871) · 95a1935c
      Yanqin Jin committed
      Summary:
      This PR is the result of running the following command
      ```
      find ./utilities/transactions/ -name '*.cc' -o -name '*.h' -o -name '*.c' -o -name '*.hpp' -o -name '*.cpp' | xargs clang-format -i
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10871
      
      Test Plan: make check
      
      Reviewed By: cbi42
      
      Differential Revision: D40686871
      
      Pulled By: riversand963
      
      fbshipit-source-id: 613738d667ec8f8e13cce4802e0e166d6be52211
    • Run clang-format on some files in db/db_impl directory (#10869) · 84563a27
      Yanqin Jin committed
      Summary:
      Run clang-format on some files in db/db_impl/ directory
      
      ```
      clang-format -i <file>
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10869
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D40685390
      
      Pulled By: riversand963
      
      fbshipit-source-id: 64449ccb21b0d61c5142eb2bcbff828acb45c154
    • Format files under table/ by clang-format (#10852) · 727bad78
      anand76 committed
      Summary:
      Run clang-format on files under the `table` directory.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10852
      
      Reviewed By: ajkr
      
      Differential Revision: D40650732
      
      Pulled By: anand1976
      
      fbshipit-source-id: 2023a958e37fd6274040c5181130284600c9e0ef
    • Improve FragmentTombstones() speed by lazily initializing `seq_set_` (#10848) · 7a959388
      Changyu Bi committed
      Summary:
      FragmentedRangeTombstoneList has a member variable `seq_set_`, a set containing the sequence numbers of all range tombstones. The set is constructed in `FragmentTombstones()` and is used only in `FragmentedRangeTombstoneList::ContainsRange()`, which is called only during compaction. This PR moves the initialization of `seq_set_` into `FragmentedRangeTombstoneList::ContainsRange()`. This should speed up `FragmentTombstones()` when the range tombstone list is used for read/scan requests. A microbenchmark shows a speed improvement of ~45%.
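      The lazy-initialization pattern described above can be sketched in a self-contained way; this is an illustration only (the class and member names such as `FragmentedTombstoneListSketch` are made up, not the actual RocksDB code):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <set>
      #include <utility>
      #include <vector>

      // Sketch: defer building the sequence-number set until the first call
      // on the compaction-only path, so read/scan construction stays cheap.
      class FragmentedTombstoneListSketch {
       public:
        explicit FragmentedTombstoneListSketch(std::vector<uint64_t> seqnos)
            : seqnos_(std::move(seqnos)) {}

        // Compaction-only query; pays the set-construction cost lazily.
        bool ContainsRange(uint64_t lower, uint64_t upper) {
          if (!initialized_) {
            seq_set_.insert(seqnos_.begin(), seqnos_.end());  // deferred work
            initialized_ = true;
          }
          auto it = seq_set_.lower_bound(lower);
          return it != seq_set_.end() && *it <= upper;
        }

        bool initialized() const { return initialized_; }

       private:
        std::vector<uint64_t> seqnos_;
        std::set<uint64_t> seq_set_;
        bool initialized_ = false;
      };

      int main() {
        FragmentedTombstoneListSketch list({5, 10, 20});
        assert(!list.initialized());        // read path never pays the cost
        assert(list.ContainsRange(8, 12));  // first compaction query initializes
        assert(list.initialized());
        assert(!list.ContainsRange(11, 19));
        return 0;
      }
      ```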
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10848
      
      Test Plan:
      - Existing tests and stress test: `python3 tools/db_crashtest.py whitebox --simple  --verify_iterator_with_expected_state_one_in=5`.
      - Microbench: update `range_del_aggregator_bench` to benchmark speed of `FragmentTombstones()`:
      ```
      ./range_del_aggregator_bench --num_range_tombstones=1000 --tombstone_start_upper_bound=50000000 --num_runs=10000 --tombstone_width_mean=200 --should_deletes_per_run=100 --use_compaction_range_del_aggregator=true
      
      Before this PR:
      =========================
      Fragment Tombstones:     270.286 us
      AddTombstones:           1.28933 us
      ShouldDelete (first):    0.525528 us
      ShouldDelete (rest):     0.0797519 us
      
      After this PR: the time to fragment tombstones is pushed to AddTombstones(), which only happens during compaction.
      =========================
      Fragment Tombstones:     149.879 us
      AddTombstones:           102.131 us
      ShouldDelete (first):    0.565871 us
      ShouldDelete (rest):     0.0729444 us
      ```
      - db_bench: this should improve speed for fragmenting range tombstones for mutable memtable:
      ```
      ./db_bench --benchmarks=readwhilewriting --writes_per_range_tombstone=100 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=500000 --reads=250000 --disable_auto_compactions --max_num_range_tombstones=100000 --finish_after_writes --write_buffer_size=1073741824 --threads=25
      
      Before this PR:
      readwhilewriting :      18.301 micros/op 1310445 ops/sec 4.769 seconds 6250000 operations;   28.1 MB/s (41001 of 250000 found)
      After this PR:
      readwhilewriting :      16.943 micros/op 1439376 ops/sec 4.342 seconds 6250000 operations;   23.8 MB/s (28977 of 250000 found)
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D40646227
      
      Pulled By: cbi42
      
      fbshipit-source-id: ea471667edb258f67d01cfd828588e80a89e4083
    • Fix FIFO causing overlapping seqnos in L0 files due to overlapped seqnos between ingested files and memtable's (#10777) · fc74abb4
      Hui Xiao committed
      
      Summary:
      **Context:**
      Same as https://github.com/facebook/rocksdb/pull/5958#issue-511150930 but applying the fix to the FIFO compaction case.
      Repro:
      ```
      COERCE_CONTEXT_SWITCH=1 make -j56 db_stress
      
      ./db_stress --acquire_snapshot_one_in=0 --adaptive_readahead=0 --allow_data_in_errors=True --async_io=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=18 --bottommost_compression_type=disable --bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=1 --checkpoint_one_in=0 --checksum_type=kCRC32c --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=0 --compact_range_one_in=1000 --compaction_pri=3 --open_files=-1 --compaction_style=2 --fifo_allow_compaction=1 --compaction_ttl=0 --compression_max_dict_buffer_bytes=8388607 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=zlib --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test0/rocksdb_crashtest_whitebox --db_write_buffer_size=8388608 --delpercent=4 --delrangepercent=1 --destroy_db_initially=1 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --fail_if_options_file_error=1 --file_checksum_impl=none --flush_one_in=1000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=0 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=15 --index_type=3 --ingest_external_file_one_in=100 --initial_auto_readahead_size=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --log2_keys_per_lock=10 --long_running_snapshots=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=16384 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=100000 --max_key_len=3 
--max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=4194304 --memtable_prefix_bloom_size_ratio=0.5 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --memtablerep=skip_list --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --num_levels=1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=0 --ops_per_thread=200000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=0 --periodic_compaction_seconds=0 --prefix_size=8 --prefixpercent=5 --prepopulate_block_cache=0 --progress_reports=0 --read_fault_one_in=0 --readahead_size=16384 --readpercent=45 --recycle_log_file_num=1 --reopen=20 --ribbon_starting_level=999 --snapshot_hold_ops=1000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --subcompactions=2 --sync=0 --sync_fault_injection=0 --target_file_size_base=524288 --target_file_size_multiplier=2 --test_batches_snapshots=0 --top_level_index_pinning=3 --unpartitioned_pinning=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=1 --use_merge=0 --use_multiget=1 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=0 --verify_db_one_in=1000 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=zstd --write_buffer_size=524288 --write_dbid_to_manifest=0 --writepercent=35
      
      put or merge error: Corruption: force_consistency_checks(DEBUG): VersionBuilder: L0 file https://github.com/facebook/rocksdb/issues/479 with seqno 23711 29070 vs. file https://github.com/facebook/rocksdb/issues/482 with seqno 27138 29049
      ```
      
      **Summary:**
      FIFO only does intra-L0 compaction in the following four cases. For the other cases, FIFO drops data instead of compacting it, which is irrelevant to the overlapping seqno issue we are solving.
      -  [FIFOCompactionPicker::PickSizeCompaction](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker_fifo.cc#L155) when `total size < compaction_options_fifo.max_table_files_size` and `compaction_options_fifo.allow_compaction == true`
         - For this path, we simply reuse the fix in `FindIntraL0Compaction` https://github.com/facebook/rocksdb/pull/5958/files#diff-c261f77d6dd2134333c4a955c311cf4a196a08d3c2bb6ce24fd6801407877c89R56
         - This path was not stress-tested at all. Therefore we covered `fifo.allow_compaction` in the stress test to surface the overlapping seqno issue we are fixing here.
      - [FIFOCompactionPicker::PickCompactionToWarm](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker_fifo.cc#L313) when `compaction_options_fifo.age_for_warm > 0`
        - For this path, we simply replicate the idea in https://github.com/facebook/rocksdb/pull/5958#issue-511150930 and skip files of largest seqno greater than `earliest_mem_seqno`
        - This path was not stress-tested at all. However, covering the `age_for_warm` option is worth a separate PR to deal with db_stress compatibility. Therefore we manually tested this path for this PR.
      - [FIFOCompactionPicker::CompactRange](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker_fifo.cc#L365) that ends up picking one of the above two compactions
      - [CompactionPicker::CompactFiles](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker.cc#L378)
          - Since `SanitizeCompactionInputFiles()` will be called [before](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker.h#L111-L113) `CompactionPicker::CompactFiles`, we simply replicate the idea in https://github.com/facebook/rocksdb/pull/5958#issue-511150930 in `SanitizeCompactionInputFiles()`. To simplify the implementation, we return `Status::Aborted()` on encountering a seqno-overlapped file when compacting to L0, instead of skipping the file and proceeding with the compaction.
      
      Some additional clean-up included in this PR:
      - Renamed `earliest_memtable_seqno` to `earliest_mem_seqno` for consistent naming
      - Added comment about `earliest_memtable_seqno` in related APIs
      - Made parameter `earliest_memtable_seqno` constant and required
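      The core idea replicated from #5958 can be sketched in a self-contained way; this is purely illustrative (the `L0File` type and `PickIntraL0` helper are made up, not the actual compaction-picker code):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Sketch: when picking L0 files for intra-L0 compaction, leave out any
      // file whose largest sequence number is not strictly below the earliest
      // seqno of the mutable memtable, so the compaction output's seqno range
      // cannot overlap the memtable's.
      struct L0File {
        int file_number;
        uint64_t largest_seqno;
      };

      std::vector<L0File> PickIntraL0(const std::vector<L0File>& level0,
                                      uint64_t earliest_mem_seqno) {
        std::vector<L0File> picked;
        for (const auto& f : level0) {
          if (f.largest_seqno >= earliest_mem_seqno) {
            continue;  // would overlap the memtable's seqno range; skip it
          }
          picked.push_back(f);
        }
        return picked;
      }

      int main() {
        // An ingested file (482) got a seqno at/above the memtable's earliest.
        std::vector<L0File> l0 = {{479, 23711}, {482, 29049}};
        auto picked = PickIntraL0(l0, /*earliest_mem_seqno=*/29000);
        assert(picked.size() == 1);
        assert(picked[0].file_number == 479);
        return 0;
      }
      ```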
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10777
      
      Test Plan:
      - make check
      - New unit test `TEST_P(DBCompactionTestFIFOCheckConsistencyWithParam, FlushAfterIntraL0CompactionWithIngestedFile)` corresponding to the above four cases, which will fail accordingly without the fix
      - Regular CI stress run on this PR + stress test with aggressive value https://github.com/facebook/rocksdb/pull/10761  and on FIFO compaction only
      
      Reviewed By: ajkr
      
      Differential Revision: D40090485
      
      Pulled By: hx235
      
      fbshipit-source-id: 52624186952ee7109117788741aeeac86b624a4f
    • Run format check for *.h and *.cc files under java/ (#10851) · 2a551976
      sdong committed
      Summary:
      Run a format check for *.h and *.cc files to clean up the formatting.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10851
      
      Test Plan: Watch CI tests to pass
      
      Reviewed By: ajkr
      
      Differential Revision: D40649723
      
      fbshipit-source-id: 62d32cead0b3b8e6540e86d25451bd72642109eb
  2. 25 Oct 2022: 11 commits
  3. 24 Oct 2022: 2 commits
  4. 23 Oct 2022: 1 commit
  5. 22 Oct 2022: 7 commits
    • Allow penultimate level output for the last level only compaction (#10822) · f726d29a
      Jay Zhuang committed
      Summary:
      Allow last-level-only compaction to output its result to the penultimate level if the penultimate level is empty. This also blocks other compactions from outputting to the penultimate level.
      (It includes the PR https://github.com/facebook/rocksdb/issues/10829.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10822
      
      Reviewed By: siying
      
      Differential Revision: D40389180
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 4e5dcdce307795b5e07b5dd1fa29dd75bb093bad
    • Use kXXH3 as default checksum (CPU efficiency) (#10778) · 27c9705a
      Peter Dillinger committed
      Summary:
      Since this has been supported for about a year, I think it's time to make it the default. This should improve CPU efficiency slightly on most hardware.
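      For anyone who wants to keep the old behavior, the checksum type remains configurable per table factory. A minimal options fragment, assuming the standard `rocksdb/table.h` API (not part of this commit message):

      ```cpp
      #include <rocksdb/options.h>
      #include <rocksdb/table.h>

      rocksdb::Options MakeOptions() {
        rocksdb::BlockBasedTableOptions table_options;
        // Opt out of the new kXXH3 default and keep the previous checksum.
        table_options.checksum = rocksdb::kCRC32c;
        rocksdb::Options options;
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_options));
        return options;
      }
      ```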
      
      A current DB performance comparison using buck+clang build:
      ```
      TEST_TMPDIR=/dev/shm ./db_bench -checksum_type={1,4} -benchmarks=fillseq[-X1000] -num=3000000 -disable_wal
      ```
      kXXH3 (+0.2% DB write throughput):
      `fillseq [AVG    1000 runs] : 822149 (± 1004) ops/sec;   91.0 (± 0.1) MB/sec`
      kCRC32c:
      `fillseq [AVG    1000 runs] : 820484 (± 1203) ops/sec;   90.8 (± 0.1) MB/sec`
      
      Micro benchmark comparison:
      ```
      ./db_bench --benchmarks=xxh3[-X20],crc32c[-X20]
      ```
      Machine 1, buck+clang build:
      `xxh3 [AVG    20 runs] : 3358616 (± 19091) ops/sec; 13119.6 (± 74.6) MB/sec`
      `crc32c [AVG    20 runs] : 2578725 (± 7742) ops/sec; 10073.1 (± 30.2) MB/sec`
      
      Machine 2, make+gcc build, DEBUG_LEVEL=0 PORTABLE=0:
      `xxh3 [AVG    20 runs] : 6182084 (± 137223) ops/sec; 24148.8 (± 536.0) MB/sec`
      `crc32c [AVG    20 runs] : 5032465 (± 42454) ops/sec; 19658.1 (± 165.8) MB/sec`
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10778
      
      Test Plan: make check, unit tests updated
      
      Reviewed By: ajkr
      
      Differential Revision: D40112510
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e59a8d50a60346137732f8668ba7cfac93be2b37
    • Make UserComparatorWrapper not Customizable (#10837) · 5d17297b
      sdong committed
      Summary:
      Right now UserComparatorWrapper is a Customizable object even though it does not need to be, which introduces some initialization overhead for the object. In some benchmarks, it shows up in CPU profiling. Make it non-configurable by moving the functions UserComparatorWrapper needs into an interface and implementing that interface.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10837
      
      Test Plan: Make sure existing tests pass
      
      Reviewed By: pdillinger
      
      Differential Revision: D40528511
      
      fbshipit-source-id: 70eaac89ecd55401a26e8ed32abbc413a9617c62
    • Refactor block cache tracing APIs (#10811) · 0e7b27bf
      akankshamahajan committed
      Summary:
      Refactor the classes, APIs and data structures for block cache tracing to allow a user provided trace writer to be used. Currently, only a TraceWriter is supported, with a default built-in implementation of FileTraceWriter. The TraceWriter, however, takes a flat trace record and is thus only suitable for file tracing. This PR introduces an abstract BlockCacheTraceWriter class that takes a structured BlockCacheTraceRecord. The BlockCacheTraceWriter implementation can then format and log the record in whatever way it sees fit. The default BlockCacheTraceWriterImpl does file tracing using a user provided TraceWriter.
      
      `DB::StartBlockTrace` will internally redirect to the changed `BlockCacheTrace::StartBlockCacheTrace`.
      A new `DB::StartBlockTrace` overload is also added that directly takes a `BlockCacheTraceWriter` pointer.
      
      This same philosophy can be applied to KV and IO tracing as well.
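      The shape of the abstraction described above can be sketched in a self-contained way; the types here (`BlockCacheTraceRecordSketch`, `BlockCacheTraceWriterSketch`, `InMemoryTraceWriter`) are illustrative stand-ins, not the actual RocksDB headers:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <memory>
      #include <string>
      #include <vector>

      // A structured record, as opposed to a flat byte stream: the writer can
      // format and log it however it sees fit.
      struct BlockCacheTraceRecordSketch {
        std::string block_key;
        uint64_t access_timestamp;
        bool is_hit;
      };

      // Abstract writer interface; the default implementation would do file
      // tracing, but any user-provided sink works.
      class BlockCacheTraceWriterSketch {
       public:
        virtual ~BlockCacheTraceWriterSketch() = default;
        virtual bool WriteBlockAccess(const BlockCacheTraceRecordSketch& record) = 0;
      };

      // An in-memory implementation, standing in for the default file-based one.
      class InMemoryTraceWriter : public BlockCacheTraceWriterSketch {
       public:
        bool WriteBlockAccess(const BlockCacheTraceRecordSketch& record) override {
          records_.push_back(record);
          return true;
        }
        const std::vector<BlockCacheTraceRecordSketch>& records() const {
          return records_;
        }
       private:
        std::vector<BlockCacheTraceRecordSketch> records_;
      };

      int main() {
        std::unique_ptr<BlockCacheTraceWriterSketch> writer(new InMemoryTraceWriter());
        assert(writer->WriteBlockAccess({"block#1", 12345, /*is_hit=*/false}));
        auto* mem = static_cast<InMemoryTraceWriter*>(writer.get());
        assert(mem->records().size() == 1);
        assert(mem->records()[0].block_key == "block#1");
        return 0;
      }
      ```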
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10811
      
      Test Plan:
      - Existing unit tests
      - The old `DB::StartBlockTrace` API checked with the db_bench tool. Create a database:
      ```
      ./db_bench --benchmarks="fillseq" \
      --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 \
      --cache_index_and_filter_blocks --cache_size=1048576 \
      --disable_auto_compactions=1 --disable_wal=1 --compression_type=none \
      --min_level_to_compress=-1 --compression_ratio=1 --num=10000000
      ```
      
      To trace block cache accesses when running readrandom benchmark:
      ```
      ./db_bench --benchmarks="readrandom" --use_existing_db --duration=60 \
      --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 \
      --cache_index_and_filter_blocks --cache_size=1048576 \
      --disable_auto_compactions=1 --disable_wal=1 --compression_type=none \
      --min_level_to_compress=-1 --compression_ratio=1 --num=10000000 \
      --threads=16 \
      -block_cache_trace_file="/tmp/binary_trace_test_example" \
      -block_cache_trace_max_trace_file_size_in_bytes=1073741824 \
      -block_cache_trace_sampling_frequency=1
      
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D40435289
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: fa2755f4788185e19f4605e731641cfd21ab3282
    • Fix HyperClockCache Rollback bug in #10801 (#10843) · b6e33dbc
      Peter Dillinger committed
      Summary:
      In https://github.com/facebook/rocksdb/issues/10801 in ClockHandleTable::Evict, we saved a reference to the hash value (`const UniqueId64x2& hashed_key`) instead of saving the hash value itself before marking the handle as empty and thus free for use by other threads. This could lead to Rollback seeing the wrong hash value for updating the `displacements` after an entry is removed.
      
      The fix is (like in other places) to copy the hash value before the entry is released. (We could Rollback while we own the entry, but that creates more dependencies between atomic updates, because in that case, based on the code, the Rollback writes would have to happen before or after the entry is released by marking it empty. By doing the relaxed Rollback after marking empty, there's more opportunity for re-ordering / ILP.)
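      The bug class and its fix can be sketched in a self-contained way; this is not the actual clock_cache.cc code (the `HandleSketch` type is made up), just an illustration of copy-before-release versus holding a reference into a slot another thread may reuse:

      ```cpp
      #include <array>
      #include <atomic>
      #include <cassert>
      #include <cstdint>

      using UniqueId64x2 = std::array<uint64_t, 2>;

      struct HandleSketch {
        UniqueId64x2 hashed_key{};
        std::atomic<bool> occupied{true};
      };

      // Fixed version: copy the hash value out BEFORE marking the handle free,
      // so the later (relaxed) Rollback of `displacements` sees the right hash
      // even if another thread reuses the slot immediately.
      UniqueId64x2 EvictAndGetHash(HandleSketch& h) {
        UniqueId64x2 copy = h.hashed_key;  // copy before releasing the slot
        h.occupied.store(false, std::memory_order_release);  // free for reuse
        return copy;  // safe to use for Rollback after release
      }

      int main() {
        HandleSketch h;
        h.hashed_key = {0xdeadbeef, 42};
        UniqueId64x2 saved = EvictAndGetHash(h);
        // Simulate another thread reusing the slot right after release.
        h.hashed_key = {1, 2};
        assert(saved[0] == 0xdeadbeef && saved[1] == 42);  // copy unaffected
        assert(!h.occupied.load());
        return 0;
      }
      ```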
      
      Intended follow-up: refactoring for better code sharing in clock_cache.cc
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10843
      
      Test Plan: watch for clean crash test, TSAN
      
      Reviewed By: siying
      
      Differential Revision: D40579680
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 258e43b3b80bc980a161d5c675ccc6708ecb8025
    • Ignore max_compaction_bytes for compaction input that are within output key-range (#10835) · 333abe9c
      Changyu Bi committed
      Summary:
      When picking compaction input files, we sometimes stop picking a file that is fully included in the output key range due to hitting max_compaction_bytes. Including these input files can potentially reduce write amplification (WA) at the expense of larger compactions. Larger compactions should be fine, as files from the input level are usually 10X smaller than files from the output level. This PR adds a mutable CF option `ignore_max_compaction_bytes_for_input` that is enabled by default. We can remove this option once we are sure it is safe.
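      The picking heuristic described above can be sketched in a self-contained way; the `FileSketch` type and `PickInputs` helper are illustrative, not the actual compaction-picker code:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <vector>

      struct FileSketch {
        std::string smallest, largest;
        uint64_t size;
      };

      // Sketch: keep expanding the input set past max_compaction_bytes as long
      // as the next candidate's key range is fully contained in the output
      // key range (when the option is enabled).
      std::vector<FileSketch> PickInputs(const std::vector<FileSketch>& candidates,
                                         uint64_t max_compaction_bytes,
                                         const std::string& out_smallest,
                                         const std::string& out_largest,
                                         bool ignore_max_for_contained) {
        std::vector<FileSketch> picked;
        uint64_t total = 0;
        for (const auto& f : candidates) {
          bool contained = f.smallest >= out_smallest && f.largest <= out_largest;
          if (total + f.size > max_compaction_bytes &&
              !(ignore_max_for_contained && contained)) {
            break;  // over budget and not fully contained: stop expanding
          }
          picked.push_back(f);
          total += f.size;
        }
        return picked;
      }

      int main() {
        std::vector<FileSketch> files = {{"a", "c", 60}, {"d", "f", 60}};
        // Without the option, the second file is dropped once the budget hits.
        assert(PickInputs(files, 100, "a", "z", false).size() == 1);
        // With the option, it is still picked: ["d","f"] is inside ["a","z"].
        assert(PickInputs(files, 100, "a", "z", true).size() == 2);
        return 0;
      }
      ```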
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10835
      
      Test Plan:
      - CI, a unit test on max_compaction_bytes fails before turning this flag off.
      - Benchmark does not show much difference in WA: `./db_bench --benchmarks=fillrandom,waitforcompaction,stats,levelstats -max_background_jobs=12 -num=2000000000 -target_file_size_base=33554432 --write_buffer_size=33554432`
      ```
      main:
      ** Compaction Stats [default] **
      Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0      3/0   91.59 MB   0.8     70.9     0.0     70.9     200.8    129.9       0.0   1.5     25.2     71.2   2886.55           2463.45      9725    0.297   1093M   254K       0.0       0.0
        L1      9/0   248.03 MB   1.0    392.0   129.8    262.2     391.7    129.5       0.0   3.0     69.0     68.9   5821.71           5536.90       804    7.241   6029M  5814K       0.0       0.0
        L2     87/0    2.50 GB   1.0    537.0   128.5    408.5     533.8    125.2       0.7   4.2     69.5     69.1   7912.24           7323.70      4417    1.791   8299M    36M       0.0       0.0
        L3    836/0   24.99 GB   1.0    616.9   118.3    498.7     594.5     95.8       5.2   5.0     66.9     64.5   9442.38           8490.28      4204    2.246   9749M   306M       0.0       0.0
        L4   2355/0   62.95 GB   0.3     67.3    37.1     30.2      54.2     24.0      38.9   1.5     72.2     58.2    954.37            821.18       917    1.041   1076M   173M       0.0       0.0
       Sum   3290/0   90.77 GB   0.0   1684.2   413.7   1270.5    1775.0    504.5      44.9  13.7     63.8     67.3  27017.25          24635.52     20067    1.346     26G   522M       0.0       0.0
      
      Cumulative compaction: 1774.96 GB write, 154.29 MB/s write, 1684.19 GB read, 146.40 MB/s read, 27017.3 seconds
      
      This PR:
      ** Compaction Stats [default] **
      Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0      3/0   45.71 MB   0.8     72.9     0.0     72.9     202.8    129.9       0.0   1.6     25.4     70.7   2938.16           2510.36      9741    0.302   1124M   265K       0.0       0.0
        L1      8/0   234.54 MB   0.9    384.5   129.8    254.7     384.2    129.6       0.0   3.0     69.0     68.9   5708.08           5424.43       791    7.216   5913M  5753K       0.0       0.0
        L2     84/0    2.47 GB   1.0    543.1   128.6    414.5     539.9    125.4       0.7   4.2     69.6     69.2   7989.31           7403.13      4418    1.808   8393M    36M       0.0       0.0
        L3    839/0   24.96 GB   1.0    615.6   118.4    497.2     593.2     96.0       5.1   5.0     66.6     64.1   9471.23           8489.31      4193    2.259   9726M   306M       0.0       0.0
        L4   2360/0   63.04 GB   0.3     67.6    37.3     30.3      54.4     24.1      38.9   1.5     71.5     57.6    967.30            827.99       907    1.066   1080M   173M       0.0       0.0
       Sum   3294/0   90.75 GB   0.0   1683.8   414.2   1269.6    1774.5    504.9      44.8  13.7     63.7     67.1  27074.08          24655.22     20050    1.350     26G   522M       0.0       0.0
      
      Cumulative compaction: 1774.52 GB write, 157.09 MB/s write, 1683.77 GB read, 149.06 MB/s read, 27074.1 seconds
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D40518319
      
      Pulled By: cbi42
      
      fbshipit-source-id: f4ea614bc0ebefe007ffaf05bb9aec9a8ca25b60
    • Separate the handling of value types in SaveValue (#10840) · 8dd4bf6c
      Levi Tamasi committed
      Summary:
      Currently, the code in `SaveValue` that handles `kTypeValue` and
      `kTypeBlobIndex` (and more recently, `kTypeWideColumnEntity`) is
      mostly shared. This made sense originally; however, by now the
      handling of these three value types has diverged significantly. The
      patch makes the logic cleaner and also eliminates quite a bit of branching
      by giving each value type its own `case` and removing a fall-through.
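      The shape of the refactoring (separate `case` per value type instead of a shared fall-through path) can be sketched generically; the enum and handler bodies below are stand-ins, not the actual `SaveValue` code:

      ```cpp
      #include <cassert>
      #include <string>

      enum class ValueType { kTypeValue, kTypeBlobIndex, kTypeWideColumnEntity };

      // Each value type gets its own case; no shared fall-through, so the
      // per-type logic can diverge freely without extra branching inside a
      // shared block.
      std::string HandleValue(ValueType type, const std::string& v) {
        switch (type) {
          case ValueType::kTypeValue:
            return "plain:" + v;  // plain-value path
          case ValueType::kTypeBlobIndex:
            return "blob:" + v;   // blob-index resolution path
          case ValueType::kTypeWideColumnEntity:
            return "wide:" + v;   // wide-column entity path
        }
        return "";
      }

      int main() {
        assert(HandleValue(ValueType::kTypeValue, "x") == "plain:x");
        assert(HandleValue(ValueType::kTypeBlobIndex, "x") == "blob:x");
        assert(HandleValue(ValueType::kTypeWideColumnEntity, "x") == "wide:x");
        return 0;
      }
      ```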
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10840
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D40568420
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 2e614606afd1c3d9c76d9b5f1efa0959fc174103
  6. 21 Oct 2022: 4 commits
  7. 19 Oct 2022: 2 commits
    • Refactor ShardedCache for more sharing, static polymorphism (#10801) · 7555243b
      Peter Dillinger committed
      Summary:
      The motivations for this change include
      * Free up space in ClockHandle so that we can add data for secondary cache handling while still keeping within single cache line (64 byte) size.
        * This change frees up space by eliminating the need for the `hash` field by making the fixed-size key itself a hash, using a 128-bit bijective (lossless) hash.
      * Generally more customizability of ShardedCache (such as hashing) without worrying about virtual call overheads
        * ShardedCache now uses static polymorphism (template) instead of dynamic polymorphism (virtual overrides) for the CacheShard. No obvious performance benefit is seen from the change (as mostly expected; most calls to virtual functions in CacheShard could already be optimized to static calls), but offers more flexibility without incurring the runtime cost of adhering to a common interface (without type parameters or static callbacks).
        * You'll also notice less `reinterpret_cast`ing and other boilerplate in the Cache implementations, as this can go in ShardedCache.
      
      More detail:
      * Don't have LRUCacheShard maintain `std::shared_ptr<SecondaryCache>` copies (extra refcount) when LRUCache can be in charge of keeping a `shared_ptr`.
      * Renamed `capacity_mutex_` to `config_mutex_` to better represent the scope of what it guards.
      * Some preparation for 64-bit hash and indexing in LRUCache, but didn't include the full change because of slight performance regression.
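      The static-polymorphism idea (a template over the shard type instead of virtual overrides) can be sketched in a self-contained way; this is entirely illustrative, not the actual ShardedCache interface:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <functional>
      #include <map>
      #include <string>
      #include <vector>

      // The sharded cache is parameterized on its shard type, so shard calls
      // resolve at compile time: no vtable, and no common base class required.
      template <class ShardT>
      class ShardedCacheSketch {
       public:
        explicit ShardedCacheSketch(size_t num_shards) : shards_(num_shards) {}
        void Insert(const std::string& key, int value) {
          Shard(key).Insert(key, value);  // static call, no virtual dispatch
        }
        bool Lookup(const std::string& key, int* value) {
          return Shard(key).Lookup(key, value);
        }
       private:
        ShardT& Shard(const std::string& key) {
          return shards_[std::hash<std::string>{}(key) % shards_.size()];
        }
        std::vector<ShardT> shards_;
      };

      // One concrete shard; it only needs to provide the expected methods.
      class MapShard {
       public:
        void Insert(const std::string& key, int value) { map_[key] = value; }
        bool Lookup(const std::string& key, int* value) {
          auto it = map_.find(key);
          if (it == map_.end()) return false;
          *value = it->second;
          return true;
        }
       private:
        std::map<std::string, int> map_;
      };

      int main() {
        ShardedCacheSketch<MapShard> cache(4);
        cache.Insert("k1", 7);
        int v = 0;
        assert(cache.Lookup("k1", &v) && v == 7);
        assert(!cache.Lookup("missing", &v));
        return 0;
      }
      ```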
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10801
      
      Test Plan:
      Unit test updates were non-trivial because of major changes to the ClockCacheShard interface in handling of key vs. hash.
      
      Performance:
      Create with `TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=30000000 -disable_wal=1 -bloom_bits=16`
      
      Test with
      ```
      TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=readrandom[-X1000] -readonly -num=30000000 -bloom_bits=16 -cache_index_and_filter_blocks=1 -cache_size=610000000 -duration 20 -threads=16
      ```
      
      Before: `readrandom [AVG 150 runs] : 321147 (± 253) ops/sec`
      After: `readrandom [AVG 150 runs] : 321530 (± 326) ops/sec`
      
      So possibly ~0.1% improvement.
      
      And with `-cache_type=hyper_clock_cache`:
      Before: `readrandom [AVG 30 runs] : 614126 (± 7978) ops/sec`
      After: `readrandom [AVG 30 runs] : 645349 (± 8087) ops/sec`
      
      So roughly 5% improvement!
      
      Reviewed By: anand1976
      
      Differential Revision: D40252236
      
      Pulled By: pdillinger
      
      fbshipit-source-id: ff8fc70ef569585edc95bcbaaa0386f61355ae5b
    • Enable a multi-level db to smoothly migrate to FIFO via DB::Open (#10348) · e267909e
      Yueh-Hsuan Chiang committed
      Summary:
      FIFO compaction can theoretically open a DB with any compaction style.
      However, the current code only allows FIFO compaction to open a DB with
      a single level.
      
      This PR relaxes the limitation of FIFO compaction and allows it to open a
      DB with multiple levels.  Below is the read / write / compaction behavior:
      
      * The read behavior is untouched, and it works like a regular rocksdb instance.
      * The write behavior is untouched as well.  When a FIFO compacted DB
      is opened with multiple levels, all new files will still be in level 0, and no files
      will be moved to a different level.
      * Compaction logic is extended.  It will first identify the bottom-most non-empty level.
      Then, it will delete the oldest file in that level.
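      The extended compaction logic above (find the bottom-most non-empty level, delete its oldest file) can be sketched in a self-contained way; levels are modeled as vectors of file ids, oldest first, which is purely illustrative:

      ```cpp
      #include <cassert>
      #include <vector>

      // Returns the id of the file chosen for deletion, or -1 if the DB has
      // no files at all.
      int DeleteOldestInBottommostLevel(std::vector<std::vector<int>>& levels) {
        for (int level = static_cast<int>(levels.size()) - 1; level >= 0; --level) {
          if (!levels[level].empty()) {
            int oldest = levels[level].front();  // oldest file in that level
            levels[level].erase(levels[level].begin());
            return oldest;
          }
        }
        return -1;
      }

      int main() {
        // L0 has files {7, 8}; L1 is empty; L2 (bottom-most non-empty) has {1, 2}.
        std::vector<std::vector<int>> levels = {{7, 8}, {}, {1, 2}};
        assert(DeleteOldestInBottommostLevel(levels) == 1);  // oldest in L2
        assert(DeleteOldestInBottommostLevel(levels) == 2);  // L2 drained next
        assert(DeleteOldestInBottommostLevel(levels) == 7);  // then back to L0
        return 0;
      }
      ```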
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10348
      
      Test Plan:
      Added a new test to verify the migration from level to FIFO where the db has multiple levels.
      Extended existing test cases in db_test and db_basic_test to also verify
      all entries of a key after reopening the DB with FIFO compaction.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D40233744
      
      fbshipit-source-id: 6cc011d6c3467e6bfb9b6a4054b87619e69815e1
  8. 18 Oct 2022: 3 commits
    • Print stack traces on frozen tests in CI (#10828) · e466173d
      Peter Dillinger committed
      Summary:
      Instead of existing calls to ps from gnu_parallel, call a new wrapper that does ps, looks for unit test like processes, and uses pstack or gdb to print thread stack traces. Also, using `ps -wwf` instead of `ps -wf` ensures output is not cut off.
      
      For security, CircleCI runs with security restrictions on ptrace (/proc/sys/kernel/yama/ptrace_scope = 1), and this change adds a work-around to `InstallStackTraceHandler()` (only used by testing tools) to allow any process from the same user to debug it. (I've also touched >100 files to ensure all the unit tests call this function.)
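      The Yama work-around described above amounts to the process opting in to being traced by any same-user process. A minimal sketch of that kind of call (this mirrors the idea, not necessarily the exact RocksDB code in `InstallStackTraceHandler()`):

      ```cpp
      #include <cassert>
      #ifdef __linux__
      #include <sys/prctl.h>
      #ifndef PR_SET_PTRACER
      #define PR_SET_PTRACER 0x59616d61
      #endif
      #ifndef PR_SET_PTRACER_ANY
      #define PR_SET_PTRACER_ANY ((unsigned long)-1)
      #endif
      #endif

      // Returns 0 on success, -1 if Yama rejects the call or is not loaded
      // (which is harmless for this purpose).
      int AllowSameUserDebug() {
      #ifdef __linux__
        return prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0);
      #else
        return -1;  // no-op on non-Linux platforms
      #endif
      }

      int main() {
        int ret = AllowSameUserDebug();
        assert(ret == 0 || ret == -1);  // either outcome is acceptable
        return 0;
      }
      ```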
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10828
      
      Test Plan: local manual + temporary infinite loop in a unit test to observe in CircleCI
      
      Reviewed By: hx235
      
      Differential Revision: D40447634
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 718a4c4a5b54fa0f9af2d01a446162b45e5e84e1
    • Improve / refactor anonymous mmap capabilities (#10810) · 8367f0d2
      Peter Dillinger committed
      Summary:
      The motivation for this change is a planned feature (related to HyperClockCache) that will depend on a large array that can essentially grow automatically, up to some bound, without the pointer address changing and with guaranteed zero-initialization of the data. Anonymous mmaps provide such functionality, and this change provides an internal API for that.
      
      The other existing use of anonymous mmap in RocksDB is for allocating in huge pages. That code and other related Arena code used some awkward non-RAII and pre-C++11 idioms, so I cleaned up much of that as well, with RAII, move semantics, constexpr, etc.
      
      More specifics:
      * Minimize conditional compilation
      * Add Windows support for anonymous mmaps
      * Use std::deque instead of std::vector for a more efficient bag
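      An RAII, movable anonymous-mmap holder in the spirit described above (zero-initialized memory at a fixed address once mapped) can be sketched as follows; this is illustrative only, and the actual RocksDB class, including its Windows support, differs:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <sys/mman.h>
      #include <utility>

      class AnonMmap {
       public:
        explicit AnonMmap(size_t length) : length_(length) {
          addr_ = mmap(nullptr, length_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (addr_ == MAP_FAILED) addr_ = nullptr;
        }
        // Move-only: ownership of the mapping transfers, no double-unmap.
        AnonMmap(AnonMmap&& other) noexcept
            : addr_(std::exchange(other.addr_, nullptr)),
              length_(std::exchange(other.length_, 0)) {}
        AnonMmap(const AnonMmap&) = delete;
        AnonMmap& operator=(const AnonMmap&) = delete;
        ~AnonMmap() {
          if (addr_ != nullptr) munmap(addr_, length_);
        }
        void* addr() const { return addr_; }
        size_t length() const { return length_; }

       private:
        void* addr_ = nullptr;
        size_t length_ = 0;
      };

      int main() {
        AnonMmap m(1 << 20);  // 1 MiB, zero-filled by the kernel
        assert(m.addr() != nullptr);
        unsigned char* p = static_cast<unsigned char*>(m.addr());
        assert(p[0] == 0 && p[(1 << 20) - 1] == 0);  // guaranteed zeroed
        AnonMmap moved = std::move(m);  // ownership transfers on move
        assert(m.addr() == nullptr && moved.addr() != nullptr);
        return 0;
      }
      ```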
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10810
      
      Test Plan: unit test added for new functionality
      
      Reviewed By: riversand963
      
      Differential Revision: D40347204
      
      Pulled By: pdillinger
      
      fbshipit-source-id: ca83fcc47e50fabf7595069380edd2954f4f879c
    • Do not adjust test_batches_snapshots to avoid mixing runs (#10830) · 11c0d131
      Levi Tamasi committed
      Summary:
      This is a small follow-up to https://github.com/facebook/rocksdb/pull/10821. The goal of that PR was to hold `test_batches_snapshots` fixed across all `db_stress` invocations; however, that patch didn't address the case when `test_batches_snapshots` is unset due to a conflicting `enable_compaction_filter` or `prefix_size` setting. This PR updates the logic so the other parameter is sanitized instead in the case of such conflicts.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10830
      
      Reviewed By: riversand963
      
      Differential Revision: D40444548
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 0331265704904b729262adec37139292fcbb7805
  9. 17 Oct 2022: 2 commits
  10. 15 Oct 2022: 1 commit