- 02 9月, 2023 1 次提交
-
-
由 Changyu Bi 提交于
Summary: This happens in (Compaction)MergingIterator layer, and can cause data loss during compaction or read/scan return incorrect result Pull Request resolved: https://github.com/facebook/rocksdb/pull/11782 Reviewed By: ajkr Differential Revision: D48880575 Pulled By: cbi42 fbshipit-source-id: 2294ad284a6d653d3674bebe55380f12ee4b645b
-
- 01 9月, 2023 1 次提交
-
-
由 Jay Huh 提交于
Summary: As mentioned in https://github.com/facebook/rocksdb/issues/11754 , refactor to clean up some nearly identical logic. This PR changes the existing debugging string format of Scan command as the following. ``` ❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ scan --hex ``` Before ``` 0x6669727374 : :0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172 0x7365636F6E64 : 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572 0x7468697264 : 0x62617A ``` After ``` 0x6669727374 ==> :0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172 0x7365636F6E64 ==> 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572 0x7468697264 ==> 0x62617A ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/11777 Test Plan: ``` ❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ dump first ==> :hello attr_name1:foo attr_name2:bar second ==> attr_one:two attr_three:four third ==> baz Keys in range: 3 ❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ scan first ==> :hello attr_name1:foo attr_name2:bar second ==> attr_one:two attr_three:four third ==> baz ❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ dump --hex 0x6669727374 ==> :0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172 0x7365636F6E64 ==> 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572 0x7468697264 ==> 0x62617A Keys in range: 3 ❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ scan --hex 0x6669727374 ==> :0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172 0x7365636F6E64 ==> 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572 0x7468697264 ==> 0x62617A ``` Reviewed By: jowlyzhang Differential Revision: D48876755 Pulled By: jaykorean fbshipit-source-id: b1c608a810fe038999ac528b690d398abf5f21d7
-
- 31 8月, 2023 6 次提交
-
-
由 Peter Dillinger 提交于
Summary: ... in info_log. Becoming more important with disaggregated storage. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11776 Test Plan: manual Reviewed By: jaykorean Differential Revision: D48849471 Pulled By: pdillinger fbshipit-source-id: 9a8fd8b2564a4f133526ecd7c1414cb667e4ba54
-
由 Hui Xiao 提交于
Summary: **Context/Summary:** After https://github.com/facebook/rocksdb/pull/11631, we rely on `compaction_readahead_size` for how much to read ahead for compaction read under non-direct IO case. https://github.com/facebook/rocksdb/pull/11658 therefore also sanitized 0 `compaction_readahead_size` to 2MB under non-direct IO, which is consistent with the existing sanitization with direct IO. However, this makes disabling compaction readahead impossible as well as add one more scenario to the inconsistent effects between `Options.compaction_readahead_size=0` during DB open and `SetDBOptions("compaction_readahead_size", "0")` . - `SetDBOptions("compaction_readahead_size", "0")` will disable compaction readahead as its logic never goes through sanitization above while `Options.compaction_readahead_size=0` will go through sanitization. Therefore we decided to do this PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11762 Test Plan: Modified existing UTs to cover this PR Reviewed By: ajkr Differential Revision: D48759560 Pulled By: hx235 fbshipit-source-id: b3f85e58bda362a6fa1dc26bd8a87aa0e171af79
-
由 Yu Zhang 提交于
Summary: For a SST file that uses user-defined timestamp aware comparators, if a lower or upper bound is set, sst_dump tool doesn't handle it well. This PR adds support for that. While working on this `MaybeAddTimestampsToRange` is moved to the udt_util.h file to be shared. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11757 Test Plan: make all check for changes in db_impl.cc and db_impl_compaction_flush.cc for changes in sst_file_dumper.cc, I manually tested this change handles specifying bounds for UDT use cases. It probably should have a unit test file eventually. Reviewed By: ltamasi Differential Revision: D48668048 Pulled By: jowlyzhang fbshipit-source-id: 1560465f40e44668d6d82a7439fe9012be0e74a8
-
由 Jay Huh 提交于
Summary: wide_columns can now be pretty-printed in the following commands - `./ldb dump_wal` - `./ldb dump` - `./ldb idump` - `./ldb dump_live_files` - `./ldb scan` - `./sst_dump --command=scan` There are opportunities to refactor to reduce some nearly identical code. This PR is initial change to add wide column support in `ldb` and `sst_dump` tool. More PRs to come for the refactor. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11754 Test Plan: **New Tests added** - `WideColumnsHelperTest::DumpWideColumns` - `WideColumnsHelperTest::DumpSliceAsWideColumns` **Changes added to existing tests** - `ExternalSSTFileTest::BasicMixed` added to cover mixed case (This test should have been added in https://github.com/facebook/rocksdb/issues/11688). This test does not verify the ldb or sst_dump output. This test was used to create test SST files having some rows with wide columns and some without and the generated SST files were used to manually test sst_dump_tool. - `createSST()` in `sst_dump_test` now takes `wide_column_one_in` to add wide column value in SST **dump_wal** ``` ./ldb dump_wal --walfile=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/000004.log --print_value --header ``` ``` Sequence,Count,ByteSize,Physical Offset,Key(s) : value 1,1,59,0,PUT_ENTITY(0) : 0x:0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172 2,1,34,42,PUT_ENTITY(0) : 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572 3,1,17,7d,PUT(0) : 0x7468697264 : 0x62617A ``` **idump** ``` ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ idump ``` ``` 'first' seq:1, type:22 => :hello attr_name1:foo attr_name2:bar 'second' seq:2, type:22 => attr_one:two attr_three:four 'third' seq:3, type:1 => baz Internal keys in range: 3 ``` **SST Dump from dump_live_files** ``` ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ compact ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ dump_live_files ``` ``` ... ============================== SST Files ============================== /tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/000013.sst level:1 ------------------------------ Process /tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/000013.sst Sst file format: block-based 'first' seq:0, type:22 => :hello attr_name1:foo attr_name2:bar 'second' seq:0, type:22 => attr_one:two attr_three:four 'third' seq:0, type:1 => baz ... ``` **dump** ``` ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ dump ``` ``` first ==> :hello attr_name1:foo attr_name2:bar second ==> attr_one:two attr_three:four third ==> baz Keys in range: 3 ``` **scan** ``` ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ scan ``` ``` first : :hello attr_name1:foo attr_name2:bar second : attr_one:two attr_three:four third : baz ``` **sst_dump** ``` ./sst_dump --file=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/000013.sst --command=scan ``` ``` options.env is 0x7ff54b296000 Process /tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/000013.sst Sst file format: block-based from [] to [] 'first' seq:0, type:22 => :hello attr_name1:foo attr_name2:bar 'second' seq:0, type:22 => attr_one:two attr_three:four 'third' seq:0, type:1 => baz ``` Reviewed By: ltamasi Differential Revision: D48837999 Pulled By: jaykorean fbshipit-source-id: b0280f0589d2b9716bb9b50530ffcabb397d140f
-
由 Hui Xiao 提交于
Summary: …n change (https://github.com/facebook/rocksdb/issues/11755)" This reverts commit 45131659. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11773 Reviewed By: ajkr Differential Revision: D48832320 Pulled By: hx235 fbshipit-source-id: 96cef26a885134360766a83505f6717598eac6a9
-
由 Yu Zhang 提交于
Summary: This PR adds a missing piece for the UDT in memtable only feature, which is to automatically increase `full_history_ts_low` when flush happens during recovery. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11774 Test Plan: Added unit test make all check Reviewed By: ltamasi Differential Revision: D48799109 Pulled By: jowlyzhang fbshipit-source-id: fd681ed66d9d40904ca2c919b2618eb692686035
-
- 30 8月, 2023 4 次提交
-
-
由 jsteemann 提交于
Summary: the value of `done` is always false here, so the sub-condition `!done` will always be true and the check can be removed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11746 Reviewed By: anand1976 Differential Revision: D48656845 Pulled By: ajkr fbshipit-source-id: 523ba3d07b3af7880c8c8ccb20442fd7c0f49417
-
由 Andrew Kryczka 提交于
Summary: Having a synthetic implementation of `TimedWait()` in `SystemClock` will allow us to add `SyncPoint`s while mutex is released, which was previously impossible since the lock was released and reacquired all within `pthread_cond_timedwait()`. Additionally, integrating `TimedWait()` with `MockSystemClock` allows us to cleanup some workarounds in the test code. In this PR I only cleaned up the `GenericRateLimiter` test workaround. This is related to the intended follow-up mentioned in https://github.com/facebook/rocksdb/issues/7101's description. There are a couple differences: (1) This PR does not include removing the particular workaround that initially motivated it. Actually, the `Timer` class uses `InstrumentedCondVar`, so the interface introduced here is inadequate to remove that workaround. On the bright side, the interface introduced in this PR can be changed as needed since it can neither be used nor extended externally, due to using forward-declared `port::CondVar*` in the interface. (2) This PR only makes the change in `SystemClock` not `Env`. Older revisions of this PR included `Env::TimedWait()` and `SpecialEnv::TimedWait()`; however, since they were unused it probably makes sense to defer adding them until when they are needed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11753 Reviewed By: pdillinger Differential Revision: D48654995 Pulled By: ajkr fbshipit-source-id: 15e19f2454b64d4ec7f50e328691c66ca9911122
-
由 jsteemann 提交于
Summary: when a key is recorded for locking in a pessimistic transaction, the key is first looked up in a map, and then inserted into the map if it was not already contained. this can be simplified to an unconditional insert. in the ideal case that all keys are unique, this saves all the find() operations. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11743 Reviewed By: anand1976 Differential Revision: D48656798 Pulled By: ajkr fbshipit-source-id: d0150de2db757e0c05e1797cfc24380790c71276
-
由 Yu Zhang 提交于
Summary: The user-defined timestamps feature only enforces that for the same key, user-defined timestamps should be non-decreasing. For the user-defined timestamps in memtable only feature, during flush, we check the user-defined timestamps in each memtable to examine if the data is considered expired with regard to `full_history_ts_low`. In this process, it's assuming that a newer memtable should not have smaller user-defined timestamps than an older memtable. This check however is enforcing ordering of user-defined timestamps across keys, as apposed to the vanilla UDT feature, that only enforce ordering of user-defined timestamps for the same key. This more strict user-defined timestamp ordering requirement could be an issue for secondary instances where commits can be out of order. And after thinking more about it, this requirement is really an overkill to keep the invariants of `full_history_ts_low` which are: 1) users cannot read below `full_history_ts_low` 2) users cannot write at or below `full_history_ts_low` 3) `full_history_ts_low` can only be increasing As long as RocksDB enforces these 3 checks, we can prohibit inconsistent read that returns a different value. And these three checks are covered in existing APIs. So this PR removes the extra checks in the UDT in memtable only feature that requires user-defined timestamps to be non decreasing across keys. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11732 Reviewed By: ltamasi Differential Revision: D48541466 Pulled By: jowlyzhang fbshipit-source-id: 95453c6e391cbd511c0feab05f0b11c312d17186
-
- 29 8月, 2023 2 次提交
-
-
由 akankshamahajan 提交于
Summary: Fix seg fault in auto_readahead_size with async_io when readahead_size = 0. If readahead_size is trimmed and is 0, it's not eligible for further prefetching and should return. Error occured when the first buffer already contains data and it goes for prefetching in second buffer leading to assertion failure - `assert(roundup_len1 >= alignment); ` because roundup_len1 = length + readahead_size. length is 0 and readahead_size is also 0. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11769 Test Plan: Reproducible with db_stress with async_io enabled. Reviewed By: anand1976 Differential Revision: D48743031 Pulled By: akankshamahajan15 fbshipit-source-id: 0e08c41f862f6287ca223fbfaf6cd42fc97b3c87
-
由 Andrew Kryczka 提交于
Summary: Fixes https://github.com/facebook/rocksdb/issues/11742 Even after performing duty (1) ("Waiting for the next refill time"), it is possible the remaining threads are all in `Wait()`. Waking up at least one thread is enough to ensure progress continues, even if no new requests arrive. The repro unit test (https://github.com/facebook/rocksdb/commit/bb54245e6) is not included as it depends on an unlanded PR (https://github.com/facebook/rocksdb/issues/11753) Pull Request resolved: https://github.com/facebook/rocksdb/pull/11763 Reviewed By: jaykorean Differential Revision: D48710130 Pulled By: ajkr fbshipit-source-id: 9d166bd577ea3a96ccd81dde85871fec5e85a4eb
-
- 26 8月, 2023 3 次提交
-
-
由 Jan 提交于
Summary: `VersionBuilderMap` type alias definition seem unused. If this PR can be compiled fine then the alias is probably not needed anymore. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11286 Reviewed By: jaykorean Differential Revision: D48656747 Pulled By: ajkr fbshipit-source-id: ac8554922aead7dc3d24fe7e6544a4622578c514
-
由 Richard Barnes 提交于
Del `(object)` from 200 inc instagram-server/distillery/slipstream/thrift_models/StoryFeedMediaSticker/ttypes.py Summary: Python3 makes the use of `(object)` in class inheritance unnecessary. Let's modernize our code by eliminating this. Reviewed By: itamaro Differential Revision: D48673915 fbshipit-source-id: a1a6ae8572271eb2898b748c8216ea68e362f06a
-
由 akankshamahajan 提交于
Summary: Fix seg fault in auto_readahead_size ``` db_stress: internal_repo_rocksdb/repo/table/block_based/partitioned_index_iterator.h:70: virtual rocksdb::IndexValue rocksdb::PartitionedIndexIterator::value() const: Assertion `Valid()' failed. ``` During seek, after calculating readahead_size, db_stress can inject IOError resulting in failure to index_iter_->Seek and making index_iter_ invalid. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11761 Test Plan: Reproducible locally and passed with this fix Reviewed By: anand1976 Differential Revision: D48696248 Pulled By: akankshamahajan15 fbshipit-source-id: 2be43bf56ad0fc2f95f9093c19c9a1b15a716091
-
- 25 8月, 2023 3 次提交
-
-
由 Peter Dillinger 提交于
Summary: * Add some options to cache_bench to use JemallocNodumpAllocator * Make num_shard_bits option use and report cache-specific defaults * Add a usleep option to sleep between operations, for simulating a workload with more CPU idle/wait time. * Use const& for JemallocAllocatorOptions, to improve API usability (e.g. can bind to temporary `{}`) * InstallStackTraceHandler() Pull Request resolved: https://github.com/facebook/rocksdb/pull/11758 Test Plan: manual Reviewed By: jowlyzhang Differential Revision: D48668479 Pulled By: pdillinger fbshipit-source-id: b6032fbe09444cdb8f1443a5e017d2eea4f6205a
-
由 Akanksha Mahajan 提交于
Summary: Same as title Pull Request resolved: https://github.com/facebook/rocksdb/pull/11729 Test Plan: make crash_test -j32 Reviewed By: anand1976 Differential Revision: D48534820 Pulled By: akankshamahajan15 fbshipit-source-id: 3a2a28af98dfad164b82ddaaf9fddb94c53a652e
-
由 Hui Xiao 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11755 Reviewed By: anand1976 Differential Revision: D48656627 Pulled By: hx235 fbshipit-source-id: 568fa7749cbf6ecf65102b4513fa3af975fd91b8
-
- 24 8月, 2023 2 次提交
-
-
由 Fuat Basik 提交于
Summary: In blackbox tests, db_stress command always run with timeout. Timeout can happen during validation, leaving some of the keys not checked. Since key validation is done in order, it is quite likely that keys those are towards to the end of the set are never validated. This PR adds a final execution, without timeout, to ensure validation is executed for all keys, at least once. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11592 Reviewed By: cbi42 Differential Revision: D48003998 Pulled By: hx235 fbshipit-source-id: 72543475a932f12cf0f57534b7e3b6e07e87080f
-
由 Hui Xiao 提交于
Summary: **Context/Summary:** Same intention as https://github.com/facebook/rocksdb/pull/2693 - basically we now pick from the last sorted run and expand forward till we can't Pull Request resolved: https://github.com/facebook/rocksdb/pull/11740 Test Plan: Existing UT Stress test Reviewed By: ajkr Differential Revision: D48586475 Pulled By: hx235 fbshipit-source-id: 3eb3c3ee1d5f7e0b0d6d649baaeb8c6990fee398
-
- 23 8月, 2023 3 次提交
-
-
由 Alexander Bulimov 提交于
Summary: Add a bunch of C API functions to expose new `WaitForCompact` function and related options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11737 Test Plan: unit tests Reviewed By: jaykorean Differential Revision: D48568239 Pulled By: abulimov fbshipit-source-id: 1ff35972d7abacd7e1e17fe2ada1e20cdc88d8de
-
由 chuhao zeng 提交于
Summary: Fix https://github.com/facebook/rocksdb/issues/6470 Ensure TransactionLogIter being initialized correctly with SYNC_POINT API when calling `GetSortedWALFiles`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11725 Reviewed By: akankshamahajan15 Differential Revision: D48529411 Pulled By: ajkr fbshipit-source-id: 970ca1a6259ed996c6d87f7fcd40f95acf441517
-
由 Changyu Bi 提交于
Summary: Currently the stress test does not support restoring expected state (to a specific sequence number) when there is unsynced data loss during the reopen phase. This causes a few internal stress test failure with errors like inconsistent value. This PR disables dropping unsynced data during reopen to avoid failures due to this issue. We can re-enable later after we decide to support unsynced data loss during DB reopen in stress test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11731 Test Plan: * Running this test a few times can fail for inconsistent value before this change ``` ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=8 --block_size=16384 --bloom_bits=20.57166126835524 --bottommost_compression_type=disable --bytes_per_sync=262144 --cache_index_and_filter_blocks=1 --cache_size=8388608 --cache_type=auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=0 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_style=1 --compaction_ttl=100 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=zstd --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=0 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --enable_thread_tracking=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=big --flush_one_in=1000000 --format_version=3 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=6 --index_type=3 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=1000000 --log2_keys_per_lock=10 --long_running_snapshots=1 --manual_wal_flush_one_in=1000000 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=25000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_max_range_deletions=100 --memtable_prefix_bloom_size_ratio=0 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=5 --open_write_fault_one_in=0 --ops_per_thread=200000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=10 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=1 --preserve_internal_time_seconds=0 --progress_reports=0 --read_fault_one_in=1000 --readahead_size=524288 --readpercent=50 --recycle_log_file_num=0 --reopen=20 --ribbon_starting_level=0 --secondary_cache_fault_one_in=32 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --subcompactions=3 --sync=0 --sync_fault_injection=1 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=2 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=1 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=1 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=zstd --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=35``` Reviewed By: hx235 Differential Revision: D48537494 Pulled By: cbi42 fbshipit-source-id: ddae21b9bb6ee8d67229121f58513e95f7ef6d8d
-
- 22 8月, 2023 5 次提交
-
-
由 Yu Zhang 提交于
Summary: For some ldb commands that doesn't need to open the DB, it's still useful to use the DB's existing OPTIONS file if it's available. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11721 Reviewed By: pdillinger Differential Revision: D48485540 Pulled By: jowlyzhang fbshipit-source-id: 2d2db837523044066f1a2c4b59a5c03f6cd35e6b
-
由 anand76 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11728 Reviewed By: jaykorean, jowlyzhang Differential Revision: D48527100 Pulled By: anand1976 fbshipit-source-id: c48baa44e538fb6bfd3fe7f19046746d3540763f
-
由 Jay Huh 提交于
Summary: As the new API to wait for compaction is available (https://github.com/facebook/rocksdb/issues/11436), we can now replace the existing logic of waiting in db_bench_tool with the new API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11727 Test Plan: ``` ./db_bench --benchmarks="fillrandom,compactall,waitforcompaction,readrandom" ``` **Before change** ``` Set seed to 1692635571470041 because --seed was 0 Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags Integrated BlobDB: blob cache disabled RocksDB: version 8.6.0 Date: Mon Aug 21 09:33:40 2023 CPU: 80 * Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz CPUCache: 28160 KB Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 100 bytes each (50 bytes after compression) Entries: 1000000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Write rate: 0 bytes/second Read rate: 0 ops/second Compression: Snappy Compression sampling rate: 0 Memtablerep: SkipListFactory Perf Level: 1 WARNING: Optimization is disabled: benchmarks unnecessarily slow WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags Integrated BlobDB: blob cache disabled DB path: [/tmp/rocksdbtest-226125/dbbench] fillrandom : 51.826 micros/op 19295 ops/sec 51.826 seconds 1000000 operations; 2.1 MB/s waitforcompaction(/tmp/rocksdbtest-226125/dbbench): started waitforcompaction(/tmp/rocksdbtest-226125/dbbench): finished waitforcompaction(/tmp/rocksdbtest-226125/dbbench): started waitforcompaction(/tmp/rocksdbtest-226125/dbbench): finished DB path: [/tmp/rocksdbtest-226125/dbbench] readrandom : 39.042 micros/op 25613 ops/sec 39.042 seconds 1000000 operations; 1.8 MB/s (632886 of 1000000 found) ``` **After change** ``` Set seed to 1692636574431745 because --seed was 0 Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags Integrated BlobDB: blob cache disabled RocksDB: version 8.6.0 Date: Mon Aug 21 09:49:34 2023 CPU: 80 * Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz CPUCache: 28160 KB Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 100 bytes each (50 bytes after compression) Entries: 1000000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Write rate: 0 bytes/second Read rate: 0 ops/second Compression: Snappy Compression sampling rate: 0 Memtablerep: SkipListFactory Perf Level: 1 WARNING: Optimization is disabled: benchmarks unnecessarily slow WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags Integrated BlobDB: blob cache disabled DB path: [/tmp/rocksdbtest-226125/dbbench] fillrandom : 51.271 micros/op 19504 ops/sec 51.271 seconds 1000000 operations; 2.2 MB/s waitforcompaction(/tmp/rocksdbtest-226125/dbbench): started waitforcompaction(/tmp/rocksdbtest-226125/dbbench): finished with status (OK) DB path: [/tmp/rocksdbtest-226125/dbbench] readrandom : 39.264 micros/op 25468 ops/sec 39.264 seconds 1000000 operations; 1.8 MB/s (632921 of 1000000 found) ``` Reviewed By: ajkr Differential Revision: D48524667 Pulled By: jaykorean fbshipit-source-id: 1052a15b2ed79a35165ec4d9998d0454b2552ef4
-
由 Yu Zhang 提交于
Summary: This piggy back the existing last level file temperature statistics test to test the default temperature becoming effective. While adding this unit test, I found that the approach to swap out and use default temperature in `VersionBuilder::LoadTableHandlers` will miss the L0 files created from flush, and only work for existing SST files, SST files created by compaction. So this PR moves that logic to `TableCache::GetTableReader`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11722 Test Plan: ``` ./db_test2 --gtest_filter="*LastLevelStatistics*" make all check ``` Reviewed By: pdillinger Differential Revision: D48489171 Pulled By: jowlyzhang fbshipit-source-id: ac29f7d484916f3218729594c5bb35c4f2979ac2
-
由 Levi Tamasi 提交于
Summary: [draft] this PR is created in order to test CI changes Closes: https://github.com/facebook/rocksdb/pull/11543 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11633 Reviewed By: akankshamahajan15 Differential Revision: D48525552 Pulled By: cbi42 fbshipit-source-id: 758d57f248304213228af459789459cc2f0bf419
-
- 19 8月, 2023 6 次提交
-
-
由 Hui Xiao 提交于
Summary: **Context/Summary:** as title, should be harmless. And it's a guessed fix to https://github.com/facebook/rocksdb/issues/11717 while no repro has obtained on my end yet. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11720 Test Plan: existing tests Reviewed By: cbi42 Differential Revision: D48475661 Pulled By: hx235 fbshipit-source-id: 7c7390319f094c540e703fe2e78a8d601b7a894b
-
由 akankshamahajan 提交于
Summary: Implement trimming of readahead_size under a new option ReadOptions.auto_readahead_size. It'll trim the readahead_size during prefetching upto iterate_upper_bound offset only when ReadOptions.iterate_upper_bound is set, therefore reducing the prefetching of data beyond upper_bound. It's enabled for both implicit auto readahead size and when ReadOptions.readahead_size is specified and for sync and async_io. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11684 Test Plan: Added new unit test Reviewed By: anand1976 Differential Revision: D48479723 Pulled By: akankshamahajan15 fbshipit-source-id: 2b1703579caf779105e836b580866ffd7db076fc
-
由 Changyu Bi 提交于
Summary: Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h: * User can set CompressionOptions::checksum to true to enable this feature. * We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`. * Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag. * Error handling during decompression to propagate error message from ZSTD. * Updated microbench to test read performance impact. About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666 Test Plan: * Existing unit tests for zstd compression * Add unit test `DBTest2.ZSTDChecksum` to test the corruption case * Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API. * Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work. * Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2. * Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable. * Microbench: ``` TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100 Min out of 100 runs: Main: 10390 10436 10456 10484 10499 10535 10544 10545 10565 10568 After this PR, checksum=false 10285 10397 10503 10508 10515 10557 10562 10635 10640 10660 After this PR, checksum=true 10827 10876 10925 10949 10971 11052 11061 11063 11100 11109 ``` * db_bench: ``` Write perf TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=.. [FillSeq checksum=0] fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec [FillSeq checksum=1] fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec Read perf TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1 [Readrandom checksum=1] readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec [Readrandom checksum=0] readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec Compression TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1 checksum=1 compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec checksum=0 compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec Decompression: TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1 checksum = 0 uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec checksum = 1 uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec ``` Reviewed By: ajkr Differential Revision: D48019378 Pulled By: cbi42 fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
-
由 Jay Huh 提交于
Summary: While it's rare, we may run into a scenario where `WaitForCompact()` waits for background jobs indefinitely. For example, not enough space error will add the job back to the queue while WaitForCompact() waits for _all jobs_ including the jobs that are in the queue to be completed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11711 Test Plan: `DBCompactionWaitForCompactTest::WaitForCompactToTimeout` added `timeout` option added to the variables for all of the existing DBCompactionWaitForCompactTests Reviewed By: pdillinger, jowlyzhang Differential Revision: D48416390 Pulled By: jaykorean fbshipit-source-id: 7b6a12f705ab6c6dfaf8ad736a484ca654a86106
-
由 anand76 提交于
Summary: This PR implements a new admission policy for the compressed secondary cache, which includes the functionality of the existing policy, and also admits items evicted from the primary block cache with the hit bit set. Effectively, the new policy works as follows - 1. When an item is demoted from the primary cache without a hit, a placeholder is inserted in the compressed cache. A second demotion will insert the full entry. 2. When an item is promoted from the compressed cache to the primary cache for the first time, a placeholder is inserted in the primary. The second promotion inserts the full entry, while erasing it form the compressed cache. 3. If an item is demoted from the primary cache with the hit bit set, it is immediately inserted in the compressed secondary cache. The ```TieredVolatileCacheOptions``` has been updated with a new option, ```adm_policy```, which allows the policy to be selected. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11713 Reviewed By: pdillinger Differential Revision: D48444512 Pulled By: anand1976 fbshipit-source-id: b4cbf8c169a88097dff08e36e8bc4b3088de1492
-
由 Han Zhu 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11714 Fixes T161017540. The staging build starts failing with an undefined symbol error: ``` ld.lld: error: undefined symbol: std::enable_if<rocksdb::ParsedFullFilterBlock::kCacheEntryRole == (rocksdb::CacheEntryRole)13 || true, rocksdb::Status>::type rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::ParsedFullFilterBlock>(rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, bool, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::BlockContents*, bool) const ``` This is the `MaybeReadBlockAndLoadToCache` function where `TBlocklike = ParsedFullFilterBlock`. The trigger was an FDO profile update D48261413. `MaybeReadBlockAndLoadToCache` is used in the same translation unit `block_based_table_reader.cc`, and also in another file `partitioned_filter_block.cc`. The later was the file that couldn't find the symbol. It seems after the FDO profile update, `MaybeReadBlockAndLoadToCache` may've got inlined into its caller in `block_based_table_reader.cc`. And with no knowledge of other usages, the symbol got stripped. Explicitly instantiate the template similar to how `RetrieveBlock` was handled. Reviewed By: pdillinger, akankshamahajan15 Differential Revision: D48400574 fbshipit-source-id: d4a80999bfb6ce4afa80678444139fcd8ae84aa4
-
- 18 8月, 2023 2 次提交
-
-
由 Yu Zhang 提交于
Summary: Add a column family option `default_temperature` that will be used for file reading accounting purpose, such as io statistics, for files that don't have an explicitly set temperature. This options is not a mutable one, changing its value would require a DB restart. This is to avoid the confusion that had the option being a mutable one, the users may expect it to take effect on all files immediately, while in reality, it would only become effective for SST files opened in the future. This `default_temperature` also just affect accounting during one DB session. It won't be recorded in manifest as the file's temperature and can be different across different DB sessions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11708 Test Plan: ``` make all check ``` Reviewed By: pdillinger Differential Revision: D48375763 Pulled By: jowlyzhang fbshipit-source-id: eb756696c14a694c6e2a93d2bb6f040563194981
-
由 Peter Dillinger 提交于
Summary: * JemallocNodumpAllocator was passing a size_t to FastRange32, which could cause compilation errors or warnings (seen with clang) * Fixed the order of arguments to match what would be used with modulo operator (%), for clarity. Fixes https://github.com/facebook/rocksdb/issues/11006 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11707 Test Plan: no functional change, existing tests Reviewed By: ajkr Differential Revision: D48435149 Pulled By: pdillinger fbshipit-source-id: e6e8b107ded4eceda37db20df59985c846a2546b
-
- 17 8月, 2023 2 次提交
-
-
由 Changyu Bi 提交于
Summary: For leveled compaction, RocksDB has a special kind of compaction with reason "kBottommmostFiles" that compacts bottommost level files to clear data held by snapshots (more detail in https://github.com/facebook/rocksdb/issues/3009). Such compactions can happen soon after a relevant snapshot is released. For some use cases, a bottommost file may contain only a small amount of keys that can be cleared, so compacting such a file has a high write amp. In addition, these bottommost files may be compacted in compactions with reason other than "kBottommmostFiles" if we wait for some time (so that enough data is ingested to trigger such a compaction). This PR introduces an option `bottommost_file_compaction_delay` to specify the delay of these bottommost level single file compactions. * The main change is in `VersionStorageInfo::ComputeBottommostFilesMarkedForCompaction()` where we only add a file to `bottommost_files_marked_for_compaction_` if it oldest_snapshot is larger than its non-zero largest_seqno **and** the file is old enough. Note that if a file is not old enough but its largest_seqno is less than oldest_snapshot, we exclude it from the calculation of `bottommost_files_mark_threshold_`. This makes the change simpler, but such a file's eligibility for compaction will only be checked the next time `ComputeBottommostFilesMarkedForCompaction()` is called. This happens when a new Version is created (compaction, flush, SetOptions()...), a new enough snapshot is released (`VersionStorageInfo::UpdateOldestSnapshot()`) or when a compaction is picked and compaction score has to be re-calculated. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11701 Test Plan: * Add two unit tests to test when bottommost_file_compaction_delay > 0. * Ran crash test with the new option. Reviewed By: jaykorean, ajkr Differential Revision: D48331564 Pulled By: cbi42 fbshipit-source-id: c584f3dc5f6354fce3ed65f4c6366dc450b15ba8
-
由 Andrew Kryczka 提交于
Summary: See https://github.com/facebook/rocksdb/issues/11613 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11665 Reviewed By: hx235 Differential Revision: D48010507 Pulled By: ajkr fbshipit-source-id: 65c6d87d2c6ffc9d25f1d17106eae467ec528082
-