- 22 April 2023, 1 commit
-
-
Committed by Hui Xiao
Summary: **Context:** The existing stat rocksdb.sst.read.micros does not distinguish the compaction and flush cases but aggregates them, which is not very helpful for understanding the IO read behavior of each.

**Summary**
- Update `StopWatch` and `RandomAccessFileReader` to record `rocksdb.sst.read.micros` and `rocksdb.file.{flush/compaction}.read.micros`
- Fixed the default histogram in `RandomAccessFileReader`
- New field `ReadOptions/IOOptions::io_activity`; pass `ReadOptions` through paths under db open, flush and compaction to where we can prepare `IOOptions` and pass it to `RandomAccessFileReader`
- Use `thread_status_util` for assertion in `DbStressFSWrapper` for continuous testing on whether we are passing the correct `io_activity` under db open, flush and compaction

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11288

Test Plan:
- **Stress test**
- **Db bench 1: rocksdb.sst.read.micros COUNT ≈ sum of rocksdb.file.read.flush.micros's and rocksdb.file.read.compaction.micros's.** (without blob) - May not be exactly the same because `HistogramStat::Add` only guarantees atomicity, not accuracy, across threads.
```
./db_bench -db=/dev/shm/testdb/ -statistics=true -benchmarks="fillseq" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3 (-use_plain_table=1 -prefix_size=10)
```
```
// BlockBasedTable
rocksdb.sst.read.micros P50 : 2.009374 P95 : 4.968548 P99 : 8.110362 P100 : 43.000000 COUNT : 40456 SUM : 114805
rocksdb.file.read.flush.micros P50 : 1.871841 P95 : 3.872407 P99 : 5.540541 P100 : 43.000000 COUNT : 2250 SUM : 6116
rocksdb.file.read.compaction.micros P50 : 2.023109 P95 : 5.029149 P99 : 8.196910 P100 : 26.000000 COUNT : 38206 SUM : 108689

// PlainTable
Does not apply
```
- **Db bench 2: performance**

**Read**

SETUP: db with 900 files
```
./db_bench -db=/dev/shm/testdb/ -benchmarks="fillseq" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655 -disable_auto_compactions=true -target_file_size_base=655 -compression_type=none
```
Run till convergence:
```
./db_bench -seed=1678564177044286 -use_existing_db=true -db=/dev/shm/testdb -benchmarks=readrandom[-X60] -statistics=true -num=1000000 -disable_auto_compactions=true -compression_type=none -bloom_bits=3
```
Pre-change: `readrandom [AVG 60 runs] : 21568 (± 248) ops/sec`
Post-change (no regression, -0.3%): `readrandom [AVG 60 runs] : 21486 (± 236) ops/sec`

**Compaction/Flush**

Run till convergence:
```
./db_bench -db=/dev/shm/testdb2/ -seed=1678564177044286 -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655 -disable_auto_compactions=false -target_file_size_base=655 -compression_type=none

rocksdb.sst.read.micros COUNT : 33820
rocksdb.sst.read.flush.micros COUNT : 1800
rocksdb.sst.read.compaction.micros COUNT : 32020
```
Pre-change: `fillseq [AVG 46 runs] : 1391 (± 214) ops/sec; 0.7 (± 0.1) MB/sec`
Post-change (no regression, ~-0.4%): `fillseq [AVG 46 runs] : 1385 (± 216) ops/sec; 0.7 (± 0.1) MB/sec`

Reviewed By: ajkr
Differential Revision: D44007011
Pulled By: hx235
fbshipit-source-id: a54c89e4846dfc9a135389edf3f3eedfea257132
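For readers who want to pull these stats out of a live DB, here is a minimal sketch of reading the new histograms through the public `Statistics` API. The histogram enum names are assumed to mirror the stat names quoted above; check statistics.h for the exact spelling.
```
#include <cassert>
#include <iostream>

#include "rocksdb/db.h"
#include "rocksdb/statistics.h"

// Minimal sketch: enable statistics, run some work, then read the
// per-activity read histograms added by this change. FILE_READ_FLUSH_MICROS
// is assumed to mirror the stat name above; verify against statistics.h.
int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.statistics = rocksdb::CreateDBStatistics();

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/io_activity_demo", &db);
  assert(s.ok());

  // ... writes, flushes and compactions happen here ...

  rocksdb::HistogramData flush_reads;
  options.statistics->histogramData(rocksdb::FILE_READ_FLUSH_MICROS,
                                    &flush_reads);
  std::cout << "flush read count: " << flush_reads.count
            << " p99 (us): " << flush_reads.percentile99 << "\n";

  delete db;
  return 0;
}
```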
-
- 21 April 2023, 3 commits
-
-
Committed by Andrew Kryczka
Summary: As titled

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11390

Reviewed By: cbi42
Differential Revision: D45148830
Pulled By: ajkr
fbshipit-source-id: 9a8dfd040514bae3d8ed9e97a79cae7683f2749a
-
Committed by Andrew Kryczka
Summary: The old cleanup code had a race condition:

1. Test thread: DestroyDB() marked a file as trash
2. DeleteScheduler thread: got the file's size and decided to delete it in chunks
3. Test thread: DestroyDir() deleted that trash file
4. DeleteScheduler thread: began deleting in chunks, starting by calling ReopenWritableFile(). Unfortunately this recreates the deleted trash file
5. Test thread: DestroyDir() fails to remove the parent directory because it contains the file created in 4.
6. Test thread: Checkpoint::Create() fails due to the directory already existing

It could be repro'd with the following patch/command.

Patch:
```
diff --git a/file/delete_scheduler.cc b/file/delete_scheduler.cc
index 8a2d1615d..337d24a60 100644
--- a/file/delete_scheduler.cc
+++ b/file/delete_scheduler.cc
@@ -317,6 +317,12 @@ Status DeleteScheduler::DeleteTrashFile(const std::string& path_in_trash,
                                          &num_hard_links, nullptr);
     if (my_status.ok()) {
       if (num_hard_links == 1) {
+        // Give some time for DestroyDir() to delete file entries. Then, the
+        // below `ReopenWritableFile()` will recreate files, preventing the
+        // parent directory from being deleted.
+        if (rand() % 2 == 0) {
+          usleep(1000);
+        }
         std::unique_ptr<FSWritableFile> wf;
         my_status = fs_->ReopenWritableFile(path_in_trash, FileOptions(), &wf,
                                             nullptr);
diff --git a/file/file_util.cc b/file/file_util.cc
index 43608fcdc..2cee1ad8e 100644
--- a/file/file_util.cc
+++ b/file/file_util.cc
@@ -263,6 +263,13 @@ Status DestroyDir(Env* env, const std::string& dir) {
     }
   }

+  // Give some time for the DeleteScheduler thread's ReopenWritableFile() to
+  // recreate deleted files
+  if (dir.find("checkpoint") != std::string::npos) {
+    fprintf(stderr, "waiting to destroy %s\n", dir.c_str());
+    usleep(10000);
+  }
+
   if (s.ok()) {
     s = env->DeleteDir(dir);
     // DeleteDir might or might not report NotFound
```
Command:
```
TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --simple --write_buffer_size=131072 --target_file_size_base=131072 --max_bytes_for_level_base=524288 --checkpoint_one_in=100 --clear_column_family_one_in=0 --max_key=1000 --value_size_mult=33 --sst_file_manager_bytes_per_truncate=4096 --sst_file_manager_bytes_per_sec=1048576 --interval=3 --compression_type=none --sync_fault_injection=1
```
Obviously we don't want to use scheduled deletion here as we need the checkpoint directory deleted immediately. I suspect the DestroyDir() was an attempt to fix up incomplete DestroyDB()s. Now that we expect DestroyDB() to be complete, I removed that code.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11389

Reviewed By: hx235
Differential Revision: D45137142
Pulled By: ajkr
fbshipit-source-id: 2af743d342c77cc414fd25fc4c9d7c9c6079ad24
-
Committed by Changyu Bi
Summary: During manual compaction (CompactRange()), L0->L1 trivial move is disabled when only L0 overlaps with the compacting key range (introduced in https://github.com/facebook/rocksdb/issues/7368 to enforce the kForce* contract). This can cause large memory usage due to compaction readahead when the number of L0 files is large. This PR allows L0->L1 trivial move in this case, and will do an L1 -> L1 intra-level compaction when needed (when `bottommost_level_compaction` is kForce*).

In brief, consider a DB with only L0 files, where the user calls CompactRange(kForce, nullptr, nullptr):
- before this PR, RocksDB does an L0 -> L1 compaction (trivial move disallowed),
- after this PR, RocksDB does an L0 -> L1 compaction (trivial move allowed), and an L1 -> L1 compaction.

Users can use kForceOptimized to avoid this extra L1->L1 compaction overhead when L0s are overlapping and cannot be trivially moved; a sketch of the call follows below.

This PR also fixed a bug (see previous discussion in https://github.com/facebook/rocksdb/issues/11041) where `final_output_level` of a manual compaction can be miscalculated when `level_compaction_dynamic_level_bytes=true`. This bug could cause an incorrect level to be moved to when CompactRangeOptions::change_level is specified.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11375

Test Plan:
- Added new unit tests to test that L0 -> L1 compaction allows trivial move and that L1 -> L1 compaction is done when needed.

Reviewed By: ajkr
Differential Revision: D44943518
Pulled By: cbi42
fbshipit-source-id: e9fb770d17b163c18a623e1d1bd6b81159192708
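For reference, a minimal sketch of the manual-compaction call discussed above, using the public `CompactRangeOptions` API (the helper name is illustrative):
```
#include "rocksdb/db.h"
#include "rocksdb/options.h"

// Sketch: a whole-keyspace manual compaction. kForceOptimized forces a
// rewrite of bottommost files except those just written by this same
// compaction, avoiding the extra L1 -> L1 pass described above.
// `db` is an open rocksdb::DB*.
rocksdb::Status ForceFullCompaction(rocksdb::DB* db) {
  rocksdb::CompactRangeOptions cro;
  cro.bottommost_level_compaction =
      rocksdb::BottommostLevelCompaction::kForceOptimized;
  return db->CompactRange(cro, /*begin=*/nullptr, /*end=*/nullptr);
}
```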
-
- 19 April 2023, 1 commit
-
-
Committed by Andrew Kryczka
Summary: build-windows-vs2022 jobs (e.g., https://app.circleci.com/pipelines/github/facebook/rocksdb/26641/workflows/7d1c58b8-7dd6-4dd6-a222-ecdfb0892c3b/jobs/593583) began failing with:
```
(CustomBuild target) ->
CUSTOMBUILD : error : Source option 7 is no longer supported. Use 8 or later. [C:\Users\circleci.PACKER-64370BA5\project\build\java\rocksdbjni_classes.vcxproj]
```
So, this PR tries setting `-source 8` instead.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11385

Reviewed By: ltamasi
Differential Revision: D45058172
Pulled By: ajkr
fbshipit-source-id: b0daa1ea6f576c8417add40bd6c92710d329c44d
-
- 18 April 2023, 3 commits
-
-
Committed by Peter Dillinger
Summary: Because of this failure with snappy 1.1.8, ROCKSDB_NO_FBCODE=1:
```
Value 3531 is not in range [2000, 3525]
table/table_test.cc:4231: Failure
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11384

Test Plan: run updated test in failing configuration

Reviewed By: ajkr
Differential Revision: D45057161
Pulled By: pdillinger
fbshipit-source-id: 397054f08033315e2e2bd9410f1fa32ddbf3b9c8
-
Committed by Andrew Kryczka
Summary: This test exhibited the following flaky failure:
```
db/db_write_test.cc:653: Failure
db_->Resume()
Corruption: Not active
```
I was able to repro it by applying the following patch to coerce a specific race condition:
```
diff --git a/db/db_write_test.cc b/db/db_write_test.cc
index d82c57376..775ba3cde 100644
--- a/db/db_write_test.cc
+++ b/db/db_write_test.cc
@@ -636,6 +636,10 @@ TEST_P(DBWriteTest, LockWALInEffect) {
   ASSERT_TRUE(dbfull()->WALBufferIsEmpty());
   ASSERT_OK(db_->UnlockWAL());

+  // Test thread: sleep interval: [0, 3)
+  // In this interval, the file system is active
+  sleep(3);
+
   // Fail the WAL flush if applicable
   fault_fs->SetFilesystemActive(false);
   Status s = Put("key2", "value");
@@ -649,6 +653,11 @@ TEST_P(DBWriteTest, LockWALInEffect) {
     ASSERT_OK(db_->LockWAL());
     ASSERT_OK(db_->UnlockWAL());
   }
+
+  // Test thread: sleep interval: [3, 6)
+  // In this interval, the file system is inactive
+  sleep(3);
+
   fault_fs->SetFilesystemActive(true);
   ASSERT_OK(db_->Resume());
   // Writes should work again
diff --git a/db/flush_job.cc b/db/flush_job.cc
index 8193f594f..602ee2c9f 100644
--- a/db/flush_job.cc
+++ b/db/flush_job.cc
@@ -979,6 +979,10 @@ Status FlushJob::WriteLevel0Table() {
           DirFsyncOptions(DirFsyncOptions::FsyncReason::kNewFileSynced));
     }
     TEST_SYNC_POINT_CALLBACK("FlushJob::WriteLevel0Table", &mems_);
+    // Flush thread: sleep interval: [0, 4)
+    // Upon awakening, the file system will be inactive. Then the MANIFEST
+    // update will fail.
+    sleep(4);
     db_mutex_->Lock();
   }
   base_->Unref();
```
The fix for this scenario is explained in the code change.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11382

Reviewed By: cbi42
Differential Revision: D45027632
Pulled By: ajkr
fbshipit-source-id: 6bfa35a5781c0c080fb74e13f2b2c9f871f7effb
-
Committed by Andrew Kryczka
Summary: In the CircleCI build-linux-arm-test-full job (https://app.circleci.com/pipelines/github/facebook/rocksdb/26462/workflows/a9d39d2c-c970-4b0f-9c10-7743beb9771b/jobs/591722), this test exhibited the following flaky failure:
```
db/db_bloom_filter_test.cc:2506: Failure
Expected: (TestGetTickerCount(options, BLOOM_FILTER_USEFUL)) > (65000 * 2), actual: 120558 vs 130000
```
I ssh'd to an instance and observed that it cuts memtables at slightly different points across runs. Logging in `ConcurrentArena` pointed to `try_lock()` returning false at different points across runs. This PR changes the approach to allow a fixed number of keys per memtable flush. I verified the bloom filter useful count is now deterministic, even on the CircleCI ARM instance.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11383

Reviewed By: cbi42
Differential Revision: D45036829
Pulled By: ajkr
fbshipit-source-id: b602dacb63955f1af09bf0ed409cde0552805a08
-
- 16 April 2023, 2 commits
-
-
Committed by Murali Vilayannur
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11342

Reviewed By: ajkr
Differential Revision: D45026838
Pulled By: mnv104
fbshipit-source-id: 099ed9579922b8fa6e7d3332bbb829d50ec47d91
-
Committed by mayue.fight
Summary: When calculating the largest_key in ImportColumnFamilyJob::GetIngestedFileInfo, only the first element of range_del_iter is considered. If range_del_iter has multiple elements, the largest_key will be wrong; a sketch of the corrected scan follows below.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11381

Reviewed By: cbi42
Differential Revision: D44981450
Pulled By: ajkr
fbshipit-source-id: 584bc7da86295568a96984d2951644f289e578c7
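A hypothetical sketch of the fix's shape, not the literal patch: the iterator and comparator types are stand-ins for RocksDB's internal fragmented range tombstone iterator and user comparator.
```
#include <string>

// Hypothetical sketch: extend the ingested file's key bounds over *all*
// range tombstones, not just the first one returned by the range-deletion
// iterator. `Iter` and `Cmp` are illustrative stand-ins for the internal
// iterator/comparator types.
template <typename Iter, typename Cmp>
void ExtendBoundsOverRangeTombstones(Iter* range_del_iter, const Cmp* ucmp,
                                     std::string* smallest_key,
                                     std::string* largest_key) {
  for (range_del_iter->SeekToFirst(); range_del_iter->Valid();
       range_del_iter->Next()) {
    // A tombstone covers [start_key, end_key); either end may push the
    // file's bounds outward.
    if (ucmp->Compare(range_del_iter->start_key(), *smallest_key) < 0) {
      smallest_key->assign(range_del_iter->start_key().ToString());
    }
    if (ucmp->Compare(range_del_iter->end_key(), *largest_key) > 0) {
      largest_key->assign(range_del_iter->end_key().ToString());
    }
  }
}
```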
-
- 15 April 2023, 2 commits
-
-
Committed by Changyu Bi
Summary: Before this PR, in `LevelCompactionBuilder::TryExtendNonL0TrivialMove(index)`, we start from a file at `index` and expand the compaction input towards the right to find files to trivially move. This PR adds the logic to also expand towards the left.

Another major change made in this PR is to not expand L0 files through `TryExtendNonL0TrivialMove()`. This happens currently when compacting L0 files to an empty output level. The condition for expanding files in `TryExtendNonL0TrivialMove()` is to check atomic boundaries, which does not take into account that L0 files can overlap in key range and are not sorted in key order. So it may include more L0 files than needed and disallow a trivial move. This change is included in this PR so that we don't make it worse by always expanding L0 in both directions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11347

Test Plan:
* new unit test
* Benchmark does not show obvious improvement or regression:
```
Write sequentially
./db_bench --benchmarks=fillseq --compression_type=lz4 --write_buffer_size=1000000 --num=100000000 --value_size=100 -level_compaction_dynamic_level_bytes --target_file_size_base=7340032 --max_bytes_for_level_base=16777216

Main:
fillseq : 4.726 micros/op 211592 ops/sec 472.607 seconds 100000000 operations; 23.4 MB/s
This PR:
fillseq : 4.755 micros/op 210289 ops/sec 475.534 seconds 100000000 operations; 23.3 MB/s

Write randomly
./db_bench --benchmarks=fillrandom --compression_type=lz4 --write_buffer_size=1000000 --num=100000000 --value_size=100 -level_compaction_dynamic_level_bytes --target_file_size_base=7340032 --max_bytes_for_level_base=16777216

Main:
fillrandom : 16.351 micros/op 61159 ops/sec 1635.066 seconds 100000000 operations; 6.8 MB/s
This PR:
fillrandom : 15.798 micros/op 63298 ops/sec 1579.817 seconds 100000000 operations; 7.0 MB/s
```

Reviewed By: ajkr
Differential Revision: D44645650
Pulled By: cbi42
fbshipit-source-id: 8631f3a6b3f01decbbf18c34f2b62833cb4f9733
-
Committed by mayue.fight
Summary: **Context/Summary:** ASSERT_EQ only verifies the code of a Status, not its state message.

- Assert by checking the Status state in `ImportColumnFamilyTest`
- Set db_comparator_name when creating ExportImportFilesMetaData in `ImportColumnFamilyNegativeTest` (previously forgotten)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11372

Reviewed By: ajkr
Differential Revision: D45004343
Pulled By: cbi42
fbshipit-source-id: a13d45521df17ead3d6d4c1c1fe1e4c95397ce8b
-
- 13 April 2023, 1 commit
-
-
Committed by Jeff Palm
Summary: util/ribbon_test.cc: avoid the ambiguous reversed operator error in C++20 (and enable checking for the error).

The code would produce errors like this when compiled with -Wambiguous-reversed-operator under C++20:
```
util/ribbon_test.cc:695:20: error: ISO C++20 considers use of overloaded operator '!=' (with operand types 'KeyGen' (aka '(anonymous namespace)::StandardKeyGen') and 'KeyGen') to be ambiguous despite there being a unique best viable function with non-reversed arguments [-Werror,-Wambiguous-reversed-operator]
    while (cur != batch_end) {
           ~~~ ^  ~~~~~~~~~
util/ribbon_test.cc:111:8: note: candidate function with non-reversed arguments
  bool operator!=(const StandardKeyGen& other) {
       ^
util/ribbon_test.cc:107:8: note: ambiguous candidate function with reversed arguments
  bool operator==(const StandardKeyGen& other) {
       ^
```
This will become a hard error in future standards.

Confirmed that no errors were generated when building using clang and C++20:
```
USE_CLANG=1 USE_COROUTINES=1 make
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11371

Reviewed By: meyering
Differential Revision: D44921027
Pulled By: cbi42
fbshipit-source-id: ef25b78260920a4d75a718310688d3a2487ffa87
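As a standalone illustration of the warning and the conventional fix (this is not the ribbon_test code itself): in C++20, `a != b` also considers a reversed `b == a` candidate, which becomes ambiguous when `operator==` is a non-const member.
```
struct Iter {
  int pos = 0;

  // Problem shape: a *non-const* member operator==. In C++20, `a != b` also
  // considers the reversed candidate operator==(b, a), which differs in
  // const-ness from the non-reversed one -> -Wambiguous-reversed-operator.
  // bool operator==(const Iter& other) { return pos == other.pos; }

  // Fix: declare the operators const (and symmetric), so the reversed and
  // non-reversed candidates resolve to the same function.
  bool operator==(const Iter& other) const { return pos == other.pos; }
  bool operator!=(const Iter& other) const { return !(*this == other); }
};

int main() {
  Iter a{0}, b{1};
  return (a != b) ? 0 : 1;  // unambiguous under -std=c++20
}
```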
-
- 12 April 2023, 1 commit
-
-
Committed by Yu Zhang
Summary: This option is immutable throughout the lifetime of a DB open. For now, updating its value between different DB open sessions is also an incompatible change. When I work on support for updating the comparator, the corresponding types of updates to this option will be supported then.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11362

Test Plan: `make check`

Reviewed By: ltamasi
Differential Revision: D44873870
Pulled By: jowlyzhang
fbshipit-source-id: aa02094754b58d99abf9af4c9a8108c1350254cb
-
- 11 April 2023, 2 commits
-
-
Committed by Andrew Kryczka
Summary: Fixed the following failure:
```
third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc: In function ‘bool testing::internal::StackGrowsDown()’:
third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:8681:24: error: ‘dummy’ may be used uninitialized [-Werror=maybe-uninitialized]
 8681 |   StackLowerThanAddress(&dummy, &result);
      |   ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:8671:13: note: by argument 1 of type ‘const void*’ to ‘void testing::internal::StackLowerThanAddress(const void*, bool*)’ declared here
 8671 | static void StackLowerThanAddress(const void* ptr, bool* result) {
      |             ^~~~~~~~~~~~~~~~~~~~~
third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:8679:7: note: ‘dummy’ declared here
 8679 |   int dummy;
      |       ^~~~~
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11361

Reviewed By: cbi42
Differential Revision: D44838033
Pulled By: ajkr
fbshipit-source-id: 27d68b5a24a15723bbaaa7de45ccd70a60fe259e
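For context, the generic shape of a `-Wmaybe-uninitialized` fix; whether the fused gtest source was patched exactly this way is not shown in the commit message.
```
#include <cstdint>

static void StackLowerThanAddress(const void* ptr, bool* result) {
  int local = 0;
  *result = reinterpret_cast<uintptr_t>(&local) <
            reinterpret_cast<uintptr_t>(ptr);
}

// Generic fix shape: initialize the variable before its address escapes to
// another function, so gcc can prove it is never read uninitialized.
bool StackGrowsDown() {
  int dummy = 0;  // was `int dummy;`, which gcc flagged via argument 1
  bool result = false;
  StackLowerThanAddress(&dummy, &result);
  return result;
}

int main() { return StackGrowsDown() ? 0 : 1; }
```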
-
Committed by Niklas Fiekas
Summary: Makes it easier to use generated Rust bindings. Constness of these is already part of the C++ API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11243

Reviewed By: hx235
Differential Revision: D44840394
Pulled By: ajkr
fbshipit-source-id: bcd1aeb8c959c304148d25b00043bb8c4cd3e0a4
-
- 08 April 2023, 7 commits
-
-
Committed by Zdenek Korcak
Summary: Fix a copy-paste typo.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11325

Reviewed By: hx235
Differential Revision: D44378512
Pulled By: ajkr
fbshipit-source-id: 509ed2697c06eed975914359ece0459a0ea40312
-
Committed by nccx
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11357

Reviewed By: hx235
Differential Revision: D44774454
Pulled By: ajkr
fbshipit-source-id: f3912316b6cd4e0b41310590c93f914f1d943044
-
Committed by leipeng
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11348

Reviewed By: hx235
Differential Revision: D44774863
Pulled By: ajkr
fbshipit-source-id: ba4bd959650228a71fca6bf62840ae9d7373d6f0
-
Committed by nccx
Summary: The CI systems other than CircleCI are almost always in a failing state. Since CircleCI covers Linux, macOS, and Windows, we can remove the others.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11354

Reviewed By: hx235
Differential Revision: D44774627
Pulled By: ajkr
fbshipit-source-id: c83b298ec5afe4ea410744eda6cc98fc6a3365f1
-
Committed by Changyu Bi
Summary: valgrind complains "Conditional jump or move depends on uninitialised value(s)". A sample error message:
```
[ RUN ] DBCompactionTest.DrainUnnecessaryLevelsAfterDBBecomesSmall
==3353864== Conditional jump or move depends on uninitialised value(s)
==3353864==    at 0x8647B4: rocksdb::VersionStorageInfo::ComputeCompactionScore(rocksdb::ImmutableOptions const&, rocksdb::MutableCFOptions const&) (version_set.cc:3414)
==3353864==    by 0x86B340: rocksdb::VersionSet::AppendVersion(rocksdb::ColumnFamilyData*, rocksdb::Version*) (version_set.cc:4946)
==3353864==    by 0x876B88: rocksdb::VersionSet::CreateColumnFamily(rocksdb::ColumnFamilyOptions const&, rocksdb::VersionEdit const*) (version_set.cc:6876)
==3353864==    by 0xBA66FE: rocksdb::VersionEditHandler::CreateCfAndInit(rocksdb::ColumnFamilyOptions const&, rocksdb::VersionEdit const&) (version_edit_handler.cc:483)
==3353864==    by 0xBA4A81: rocksdb::VersionEditHandler::Initialize() (version_edit_handler.cc:187)
==3353864==    by 0xBA3927: rocksdb::VersionEditHandlerBase::Iterate(rocksdb::log::Reader&, rocksdb::Status*) (version_edit_handler.cc:31)
==3353864==    by 0x870173: rocksdb::VersionSet::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool) (version_set.cc:5729)
==3353864==    by 0x7538FA: rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool, unsigned long*, rocksdb::DBImpl::RecoveryContext*) (db_impl_open.cc:522)
==3353864==    by 0x75BA0F: rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool) (db_impl_open.cc:1928)
==3353864==    by 0x75A735: rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**) (db_impl_open.cc:1743)
==3353864==    by 0x75A510: rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**) (db_impl_open.cc:1720)
==3353864==    by 0x5925FD: rocksdb::DBTestBase::TryReopen(rocksdb::Options const&) (db_test_util.cc:710)
==3353864== Uninitialised value was created by a heap allocation
==3353864==    at 0x4842F0F: operator new(unsigned long) (vg_replace_malloc.c:422)
==3353864==    by 0x876AF4: rocksdb::VersionSet::CreateColumnFamily(rocksdb::ColumnFamilyOptions const&, rocksdb::VersionEdit const*) (version_set.cc:6870)
==3353864==    by 0xBA66FE: rocksdb::VersionEditHandler::CreateCfAndInit(rocksdb::ColumnFamilyOptions const&, rocksdb::VersionEdit const&) (version_edit_handler.cc:483)
==3353864==    by 0xBA4A81: rocksdb::VersionEditHandler::Initialize() (version_edit_handler.cc:187)
==3353864==    by 0xBA3927: rocksdb::VersionEditHandlerBase::Iterate(rocksdb::log::Reader&, rocksdb::Status*) (version_edit_handler.cc:31)
==3353864==    by 0x870173: rocksdb::VersionSet::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool) (version_set.cc:5729)
==3353864==    by 0x7538FA: rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool, unsigned long*, rocksdb::DBImpl::RecoveryContext*) (db_impl_open.cc:522)
==3353864==    by 0x75BA0F: rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool) (db_impl_open.cc:1928)
==3353864==    by 0x75A735: rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**) (db_impl_open.cc:1743)
==3353864==    by 0x75A510: rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**) (db_impl_open.cc:1720)
==3353864==    by 0x5925FD: rocksdb::DBTestBase::TryReopen(rocksdb::Options const&) (db_test_util.cc:710)
==3353864==    by 0x591F73: rocksdb::DBTestBase::Reopen(rocksdb::Options const&) (db_test_util.cc:662)
```
This is likely about `lowest_unnecessary_level_`, even though it would be initialized in `CalculateBaseBytes()` before being used in `ComputeCompactionScore()`. Initialize it also in the VersionStorageInfo constructor to prevent valgrind from complaining; a sketch of the fix shape follows below.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11359

Test Plan:
- ran a test with valgrind which gave the error message above before this PR: `valgrind --track-origins=yes ./db_compaction_test --gtest_filter="*DrainUnnecessaryLevelsAfterDBBecomesSmall*"`

Reviewed By: hx235
Differential Revision: D44799112
Pulled By: cbi42
fbshipit-source-id: 557208a66f04a2163b418b2a651bdb7e777c4511
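A minimal sketch of that fix shape (illustrative, not the literal patch): default-initialize the member at its declaration so no construction path leaves it indeterminate.
```
// Sketch of the fix shape: give the member a default at declaration so every
// constructor path defines it before ComputeCompactionScore() reads it. The
// sentinel -1 ("no unnecessary level yet") is an assumed value chosen for
// illustration.
class VersionStorageInfoSketch {
 public:
  int lowest_unnecessary_level() const { return lowest_unnecessary_level_; }

 private:
  // was effectively `int lowest_unnecessary_level_;` -- indeterminate until
  // CalculateBaseBytes() ran, which some recovery paths did not guarantee.
  int lowest_unnecessary_level_ = -1;
};
```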
-
Committed by Peter Dillinger
Summary: After https://github.com/facebook/rocksdb/issues/11301, I wasn't sure whether I had regressed block cache tracing with MultiGet. Demo PR https://github.com/facebook/rocksdb/issues/11330 shows the flawed state of tracing MultiGet before my change, and based on the unit test, there was essentially no change in tracing behavior with https://github.com/facebook/rocksdb/issues/11301. This change is to leave that code and behavior better than I found it.

This change is not intended to change any production behaviors except when block cache tracing is active, though it might improve general read path efficiency by disabling some related tracking when such tracing is disabled.

More detail on production code:
* Refactoring to consolidate the construction of BlockCacheTraceRecord, and other related functionality, in the block-based table reader, though it's somewhat awkward to preserve an optimization to avoid copying Slices into temporary strings in BlockCacheLookupContext.
* Accurately track cache hits and misses (etc.) for each data block accessed by a MultiGet(). (Previously reported hits as misses.)
* Reduced repeated checking of `block_cache_tracer_` state (by creating lookup_context only when active) for efficiency and to reduce the risk of corner case bugs where tracing is enabled or disabled for different parts of a read op. (See a TODO below)
* Improved estimate calculation for num_keys_in_block (see code comment)

Possible follow-up:
* `XXX:` use_cache=true means double cache query? (possible double-query of block cache when allow_mmap_reads=true)
* `TODO:` need more than one lookup_context here to track individual filter and index partition hits and misses
* `TODO:` optimize more state checks of `block_cache_tracer_` down to `lookup_context != nullptr`
* Pre-existing `XXX:` There appear to be 'break' statements above that bypass this writing of the block cache trace record
* Expand test coverage (see below)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11339

Test Plan:
* Added a basic unit test for block cache tracing MultiGet, for now just covering one data block with two keys.
* Added HitMissCountingCache to independently verify that the actual block cache trace and expected block cache trace also agree with the actual number of cache hits / misses (nothing missing or mislabeled). For now only used with the MultiGet test.
* Better testing of num_keys_in_block, for now just with MultiGet
* Misc improvements to table_test to improve clarity, such as making it clear that certain keys are auto-inserted at the start of every test.

Performance test: testing multireadrandom as in https://github.com/facebook/rocksdb/issues/11301, except averaging over distinct runs rather than [-X30], which doesn't seem to sufficiently reset after each run to work as an independent test run.

Base with revert of 11301: 3148926 ops/sec
Base: 3019146 ops/sec
New: 2999529 ops/sec

Possibly a tiny MultiGet CPU regression with this change. We are now always allocating an additional vector for the LookupContexts. I'm still contemplating options to try to correct the regression in https://github.com/facebook/rocksdb/issues/11301.

Testing readrandom:

Base with revert of 11301: 2311988
Base: 2281726
New: 2299722

Possibly a tiny Get CPU improvement with this change. We are now avoiding some unnecessary LookupContext population.

Reviewed By: akankshamahajan15
Differential Revision: D44557845
Pulled By: pdillinger
fbshipit-source-id: b841691799d2a48fb59cc8880dc7cbb1e107ae3d
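For orientation, a minimal sketch of enabling block cache tracing around a read workload, the feature whose MultiGet accounting this change fixes. The trace file path is illustrative; see trace_reader_writer.h for the writer factory.
```
#include <memory>

#include "rocksdb/db.h"
#include "rocksdb/env.h"
#include "rocksdb/trace_reader_writer.h"

// Sketch: record a block cache trace for a read workload. `db` is an open
// rocksdb::DB*; the output path is illustrative.
rocksdb::Status TraceReads(rocksdb::DB* db) {
  std::unique_ptr<rocksdb::TraceWriter> writer;
  rocksdb::Status s = rocksdb::NewFileTraceWriter(
      rocksdb::Env::Default(), rocksdb::EnvOptions(),
      "/tmp/block_cache_trace", &writer);
  if (!s.ok()) {
    return s;
  }
  rocksdb::TraceOptions trace_opts;  // default trace options
  s = db->StartBlockCacheTrace(trace_opts, std::move(writer));
  if (!s.ok()) {
    return s;
  }
  // ... run the Get()/MultiGet() workload to be analyzed ...
  return db->EndBlockCacheTrace();
}
```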
-
Committed by Changyu Bi
Summary: When the data block hash index finds a key of op_type `kTypeMerge`, do not redo the data block seek.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11356

Test Plan:
- added new unit test
- crash test: `python3 tools/db_crashtest.py whitebox --simple --use_merge=1 --data_block_index_type=1`
- benchmark sees a slight improvement in read throughput:
```
TEST_TMPDIR=/dev/shm/hashindex ./db_bench -benchmarks=mergerandom -use_existing_db=false -num=10000000 -compression_type=none -level_compaction_dynamic_level_bytes=1 -merge_operator=PutOperator -write_buffer_size=1000000 --use_data_block_hash_index=1
TEST_TMPDIR=/dev/shm/hashindex ./db_bench -benchmarks=readrandom[-X10] -use_existing_db=true -num=10000000 -merge_operator=PutOperator -readonly=1 -disable_auto_compactions=1 -reads=100000

Main: readrandom [AVG 10 runs] : 29526 (± 1118) ops/sec; 2.1 (± 0.1) MB/sec
Post-PR: readrandom [AVG 10 runs] : 31095 (± 662) ops/sec; 2.2 (± 0.0) MB/sec
```

Reviewed By: pdillinger
Differential Revision: D44759895
Pulled By: cbi42
fbshipit-source-id: 387f0c35938c7e0e96b810ca3babf1967fc68191
-
- 07 April 2023, 2 commits
-
-
Committed by Wentian Guo
Summary: If RocksDB enables user-defined timestamps, then the RocksDB read path can filter table files by the min/max timestamps of each file. If the application wants to look up a key that is the most recent version visible at a certain timestamp ts, we can compare ts with the min_ts of each file: if ts < min_ts, then we know all keys in the file are not visible at time ts, so we do not have to open the file. This also saves an IO. A hedged sketch of the predicate follows below.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11332

Reviewed By: pdillinger
Differential Revision: D44763497
Pulled By: guowentian
fbshipit-source-id: abde346b9f18480fe03c04e4006e7d62aa9c22a8
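The pruning rule reduces to one comparison per file. A sketch using plain integer timestamps for illustration; RocksDB's user-defined timestamps are comparator-defined byte strings, and these names are not the internal API.
```
#include <cstdint>

// Sketch of the pruning predicate described above. Illustrative types only.
struct FileTimestampRange {
  uint64_t min_ts;
  uint64_t max_ts;
};

// A lookup at read timestamp `ts` wants the newest version visible at `ts`.
// If every key in the file was written after `ts` (ts < min_ts), nothing in
// the file can be visible, so the file -- and its IO -- can be skipped.
bool CanSkipFile(const FileTimestampRange& file, uint64_t ts) {
  return ts < file.min_ts;
}
```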
-
Committed by Changyu Bi
Summary: When a user migrates to level compaction with `level_compaction_dynamic_level_bytes=true`, or when a DB shrinks, there can be unnecessary levels in the DB. Before this PR, there was no way to remove these levels except a manual compaction. These extra unnecessary levels make it harder to guarantee max_bytes_for_level_multiplier and can cause extra space amplification. This PR boosts the compaction score for these levels to allow RocksDB to automatically drain them. Together with https://github.com/facebook/rocksdb/issues/11321, this makes migration to `level_compaction_dynamic_level_bytes=true` automatic, without needing the user to do a one-time full manual compaction. Credit: this PR is modified from https://github.com/facebook/rocksdb/issues/3921.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11340

Test Plan:
- New unit tests
- `python3 tools/db_crashtest.py whitebox --simple`, which randomly sets level_compaction_dynamic_level_bytes in each run.

Reviewed By: ajkr
Differential Revision: D44563884
Pulled By: cbi42
fbshipit-source-id: e20d3620bd73dff22be18c5a91a07f340740bcc8
-
- 06 April 2023, 2 commits
-
-
Committed by anand76
Summary: VerifyFileChecksums currently interprets readahead_size as a payload of readahead_size for calculating the checksum, plus a prefetch of an additional readahead_size; hence each read is readahead_size * 2. This change treats it as chunks of readahead_size for the checksum calculation. A usage sketch follows below.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11328

Test Plan: Add a unit test

Reviewed By: pdillinger
Differential Revision: D44718781
Pulled By: anand1976
fbshipit-source-id: 79bae1ebaa27de2a13bc86f5910bf09356936e63
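A minimal usage sketch of the API whose readahead semantics this fixes; the helper name and chunk size are illustrative.
```
#include "rocksdb/db.h"

// Sketch: verify checksums of all SST files, reading in readahead_size
// chunks. With this fix each read is one 4MB chunk rather than 8MB
// (payload plus an extra prefetch). `db` is an open rocksdb::DB*.
rocksdb::Status VerifyWithReadahead(rocksdb::DB* db) {
  rocksdb::ReadOptions ro;
  ro.readahead_size = 4 * 1024 * 1024;
  return db->VerifyFileChecksums(ro);
}
```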
-
Committed by Hui Xiao
Summary: **Context/Summary:** As title.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11355

Test Plan:
- Ran previously failed tests and they succeed
- Perf: `./db_bench -seed=1679014417652004 -db=/dev/shm/testdb/ -statistics=false -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=100000 -db_write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3`

Reviewed By: ajkr
Differential Revision: D44719333
Pulled By: hx235
fbshipit-source-id: 23d22f314144071d97f7106ff1241c31c0bdf08b
-
- 05 April 2023, 3 commits
-
-
Committed by Andrew Kryczka
Summary: This is mostly taken from https://github.com/facebook/rocksdb/issues/10427 with my own comments addressed. This PR plumbs the user’s `ReadOptions` down to `GetOrReadIndexBlock()`, `GetOrReadFilterBlock()`, and `GetFilterPartitionBlock()`. Now those functions no longer have to make up a `ReadOptions` with incomplete information. I also let `PartitionIndexReader::NewIterator()` pass through its caller's `ReadOptions::verify_checksums`, which was inexplicably dropped previously. Fixes https://github.com/facebook/rocksdb/issues/10463. A usage sketch follows below.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11208

Test Plan:

Functional:
- Measured that `-verify_checksum=false` applies to metadata blocks read outside of table open
  - setup command: `TEST_TMPDIR=/tmp/100M-DB/ ./db_bench -benchmarks=filluniquerandom,waitforcompaction -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -compression_type=none -num=1638400 -key_size=8 -value_size=56`
  - run command: `TEST_TMPDIR=/tmp/100M-DB/ ./db_bench -benchmarks=readrandom -use_existing_db=true -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -compression_type=none -num=1638400 -key_size=8 -value_size=56 -duration=10 -threads=32 -cache_size=131072 -statistics=true -verify_checksum=false -open_files=20 -cache_index_and_filter_blocks=true`
  - before: `rocksdb.block.checksum.compute.count COUNT : 384353`
  - after: `rocksdb.block.checksum.compute.count COUNT : 22`

Performance:
- Setup command (tmpfs, 128MB logical data size, cache indexes/filters without pinning so index/filter lookups go through the table reader): `TEST_TMPDIR=/dev/shm/128M-DB/ ./db_bench -benchmarks=filluniquerandom,waitforcompaction -write_buffer_size=131072 -target_file_size_base=131072 -max_bytes_for_level_base=524288 -compression_type=none -num=4194304 -key_size=8 -value_size=24 -bloom_bits=8 -whole_key_filtering=1`
- Measured point lookup performance. The database is fully cached to emphasize any new callstack overheads.
  - Command: `TEST_TMPDIR=/dev/shm/128M-DB/ ./db_bench -benchmarks=readrandom[-W1][-X20] -use_existing_db=true -cache_index_and_filter_blocks=true -disable_auto_compactions=true -num=4194304 -key_size=8 -value_size=24 -bloom_bits=8 -whole_key_filtering=1 -duration=10 -cache_size=1048576000`
  - Before: `readrandom [AVG 20 runs] : 274848 (± 3717) ops/sec; 8.4 (± 0.1) MB/sec`
  - After: `readrandom [AVG 20 runs] : 277904 (± 4474) ops/sec; 8.5 (± 0.1) MB/sec`

Reviewed By: hx235
Differential Revision: D43145366
Pulled By: ajkr
fbshipit-source-id: 75ec062ece86a82cd788783de9de2c72df57f994
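A minimal sketch of the user-visible effect; the helper name and key are illustrative.
```
#include <string>

#include "rocksdb/db.h"

// Sketch: with this change, verify_checksums=false set by the caller now
// also reaches index/filter blocks fetched through the table reader rather
// than being replaced by a made-up ReadOptions. `db` is an open rocksdb::DB*.
rocksdb::Status GetWithoutChecksumVerification(rocksdb::DB* db,
                                               std::string* value) {
  rocksdb::ReadOptions ro;
  ro.verify_checksums = false;  // default is true
  return db->Get(ro, "some_key", value);
}
```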
-
Committed by Peter Dillinger
Summary: I previously misread or misinterpreted API contracts for SecondaryCache and this should correct the record. (Follow-up item from https://github.com/facebook/rocksdb/issues/11301)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11316

Test Plan: comments only

Reviewed By: anand1976
Differential Revision: D44245107
Pulled By: pdillinger
fbshipit-source-id: 3f8ddec150674b75728f1730f99b963bbf7b76e7
-
Committed by Peter Dillinger
Summary: ... which increases the default number of shards from 16 to 64. Although the default block cache size is only recommended for applications where RocksDB is not performance-critical, under stress conditions block cache mutex contention could become a performance bottleneck. This change of default should alleviate that.

Note that reducing the size of cache shards (recommended minimum 512MB) could cause thrashing, e.g. on filter blocks, so capacity needs to increase to safely increase the number of shards. The 8MB default dates back to 2011 or earlier (f779e7a5), when the most simultaneous threads you could get from a single CPU socket was 20 (e.g. Intel Xeon E7-8870). Now more than 100 are available. An explicit configuration sketch follows below.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11350

Test Plan: unit tests updated

Reviewed By: cbi42
Differential Revision: D44674873
Pulled By: pdillinger
fbshipit-source-id: 91ed3070789b42679283c7e6dc97c41a6a97bdf4
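For applications that size the cache explicitly rather than relying on the default, a sketch of spelling out the new default's shape; the values are illustrative, and performance-critical applications should size the cache much larger.
```
#include "rocksdb/cache.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"

// Sketch: 32MB capacity with num_shard_bits=6 gives the 64 shards x 512KB
// described above.
rocksdb::Options MakeOptionsWithExplicitCache() {
  rocksdb::BlockBasedTableOptions table_options;
  table_options.block_cache =
      rocksdb::NewLRUCache(/*capacity=*/32 << 20, /*num_shard_bits=*/6);
  rocksdb::Options options;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}
```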
-
- 04 April 2023, 2 commits
-
-
Committed by Niklas Fiekas
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11327

Reviewed By: cbi42
Differential Revision: D44422225
Pulled By: ajkr
fbshipit-source-id: 3bfcf47290b3133c151bdfdd181896ba2e6be520
-
Committed by Peter Dillinger
Summary: Was getting a compilation failure with an old version of gflags; examples in https://github.com/facebook/rocksdb/issues/11344. Perhaps this is new since enabling C++17. Getting rid of std::reference_wrapper from https://github.com/facebook/rocksdb/issues/10729 seems to fix it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11346

Test Plan: manual, CI

Reviewed By: guowentian
Differential Revision: D44632776
Pulled By: pdillinger
fbshipit-source-id: 5c1f3f79a055698574538b6342c912a627b6d061
-
- 31 March 2023, 2 commits
-
-
Committed by anand76
Summary: Platform009 is no longer supported in fbcode.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11333

Reviewed By: pdillinger, ltamasi
Differential Revision: D44486431
Pulled By: anand1976
fbshipit-source-id: 99e19a70ebbb04ae750d39c33a110518bb25487e
-
Committed by Hui Xiao
Summary: **Context/Summary:**
- Allow runtime changes to whether `WriteBufferManager` allows stall or not by calling `SetAllowStall()`; a usage sketch follows below
- Misc: some cleanup - see PR conversation

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11335

Test Plan:
- New UT

Reviewed By: akankshamahajan15
Differential Revision: D44502555
Pulled By: hx235
fbshipit-source-id: 24b5cc57df7734b11d42e4870c06c87b95312b5e
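A minimal sketch of the new toggle; the constructor follows the public `WriteBufferManager` signature, and the 1GB limit is an illustrative value.
```
#include <memory>

#include "rocksdb/write_buffer_manager.h"

// Sketch: construct a WriteBufferManager that stalls writers at the limit,
// then relax that at runtime via the SetAllowStall() added by this change.
void ToggleWriteStall() {
  auto wbm = std::make_shared<rocksdb::WriteBufferManager>(
      /*buffer_size=*/1ull << 30, /*cache=*/nullptr, /*allow_stall=*/true);
  // ... pass `wbm` to DBOptions::write_buffer_manager and open the DB ...
  wbm->SetAllowStall(false);  // writers will no longer be stalled
}
```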
-
- 30 March 2023, 1 commit
-
-
Committed by Levi Tamasi
Summary: Similarly to `GetEntity` prior to https://github.com/facebook/rocksdb/issues/11303, the `MultiGetEntity` API is currently only used in the DB verification logic of the stress tests. The patch introduces a new mode where all point lookups are performed using `MultiGetEntity`, and implements the corresponding logic in the non-batched, batched, and CF consistency tests.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11336

Test Plan: Ran simple blackbox tests for the various stress test flavors.

Reviewed By: akankshamahajan15
Differential Revision: D44513285
Pulled By: ltamasi
fbshipit-source-id: c3db098501bf875b6a356b09fc676a0268d92c35
-
- 29 March 2023, 1 commit
-
-
Committed by Hui Xiao
Summary: **Context/Summary:** Motivated by users' need to investigate db iterator behavior during an interval of any time length on a certain thread, we decided to collect and expose related counters in `PerfContext` as an experimental feature, in addition to the existing db-scope ones (i.e., tickers). A usage sketch follows below.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11320

Test Plan:
- new UT
- db bench

Setup:
```
./db_bench -db=/dev/shm/testdb/ -benchmarks="fillseq" -key_size=32 -value_size=512 -num=1000000 -compression_type=none -bloom_bits=3
```
Test till it converges:
```
./db_bench -seed=1679526311157283 -use_existing_db=1 -perf_level=2 -db=/dev/shm/testdb/ -benchmarks="seekrandom[-X60]"
```
Pre-change: `seekrandom [AVG 33 runs] : 7545 (± 100) ops/sec`
Post-change (no regression): `seekrandom [AVG 33 runs] : 7688 (± 67) ops/sec`

Reviewed By: cbi42
Differential Revision: D44321931
Pulled By: hx235
fbshipit-source-id: f98a254ba3e3ced95eb5928884e33f1b99dca401
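A sketch of the per-thread `PerfContext` workflow these new counters plug into; the exact names of the new iterator counters live in perf_context.h and are not repeated here.
```
#include <iostream>
#include <memory>

#include "rocksdb/db.h"
#include "rocksdb/iterator.h"
#include "rocksdb/perf_context.h"
#include "rocksdb/perf_level.h"

// Sketch: profile an iteration on the current thread. `db` is an open
// rocksdb::DB*.
void ProfileIteration(rocksdb::DB* db) {
  rocksdb::SetPerfLevel(rocksdb::PerfLevel::kEnableTimeExceptForMutex);
  rocksdb::get_perf_context()->Reset();

  std::unique_ptr<rocksdb::Iterator> it(
      db->NewIterator(rocksdb::ReadOptions()));
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    // ... consume it->key() / it->value() ...
  }

  // Dumps all counters collected on this thread, including the new
  // iterator counters this change introduces.
  std::cout << rocksdb::get_perf_context()->ToString() << "\n";
  rocksdb::SetPerfLevel(rocksdb::PerfLevel::kDisable);
}
```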
-
- 28 March 2023, 2 commits
-
-
Committed by Changyu Bi
Summary: …evel_bytes

During DB open, if a column family uses level compaction with level_compaction_dynamic_level_bytes=true, trivially move its files down in the LSM such that the bottommost files are in Lmax, the second-from-bottommost level files are in Lmax-1, and so on. This is aimed to make it easier to migrate level_compaction_dynamic_level_bytes from false to true. Before this change, a full manual compaction was suggested for such a migration. After this change, the user can just restart the DB to turn on this option; see the sketch after this commit. db_crashtest.py is updated to randomly choose a value for level_compaction_dynamic_level_bytes.

Note that there may still be too many unnecessary levels if a user is migrating from universal compaction or from level compaction with a smaller level multiplier. A full manual compaction may still be needed in that case before some PR that automatically drains unnecessary levels, like https://github.com/facebook/rocksdb/issues/3921, lands. Eventually we may want to change the default value of option level_compaction_dynamic_level_bytes to true.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11321

Test Plan:
1. Added unit tests.
2. Crash test: ran a variation of db_crashtest.py (like 32516507e77521ae887e45091b69139e32e8efb7) that turns level_compaction_dynamic_level_bytes on and off and switches between LC and UC for the same DB.

TODO: Update `OptionChangeMigration`, either after this PR or https://github.com/facebook/rocksdb/issues/3921.

Reviewed By: ajkr
Differential Revision: D44341930
Pulled By: cbi42
fbshipit-source-id: 013de19a915c6a0502be569f07c4cc8f1c3c6be2
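With this change, the migration described above reduces to flipping the option and reopening; a minimal sketch, with an illustrative path:
```
#include "rocksdb/db.h"

// Sketch: migrate an existing level-compaction DB to
// level_compaction_dynamic_level_bytes=true. After this change, reopening is
// enough: files are trivially moved down to the bottommost levels during
// DB::Open() instead of requiring a full manual compaction.
rocksdb::Status MigrateToDynamicLevelBytes(rocksdb::DB** db) {
  rocksdb::Options options;
  options.level_compaction_dynamic_level_bytes = true;
  return rocksdb::DB::Open(options, "/path/to/existing_db", db);
}
```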
-
Committed by karemta-orday
Summary: Hi, this is basically the part of https://github.com/facebook/rocksdb/pull/6488 that only adds the `multi_get_for_update` functionality to the C API (I'd like to call it from Rust), since `multi_get` was already added in https://github.com/facebook/rocksdb/pull/9252. https://github.com/facebook/rocksdb/pull/6488 has conflicts, so I guess it might be easier to get this one in.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11107

Reviewed By: pdillinger
Differential Revision: D42680764
Pulled By: ajkr
fbshipit-source-id: a50f96e1c7f3d470b4ab07e9ff5a283e5cf44865
-
- 23 March 2023, 2 commits
-
-
Committed by Andrew Kryczka
Summary: As titled. This is the same as https://github.com/facebook/rocksdb/issues/6788 but for range tombstones written through `SstFileWriter` rather than through `DB`. A sketch of that write path follows below.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11322

Reviewed By: cbi42
Differential Revision: D44317733
Pulled By: ajkr
fbshipit-source-id: f6eb8791ae2c09c169b6bfe0d047449d924b377e
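For context, a minimal sketch of writing a range tombstone through `SstFileWriter`, the path whose logging this adds; the file path and keys are illustrative.
```
#include "rocksdb/options.h"
#include "rocksdb/sst_file_writer.h"

// Sketch: produce an external SST file containing a range tombstone via
// SstFileWriter::DeleteRange().
rocksdb::Status WriteSstWithRangeTombstone() {
  rocksdb::Options options;
  rocksdb::SstFileWriter writer(rocksdb::EnvOptions(), options);
  rocksdb::Status s = writer.Open("/tmp/with_tombstone.sst");
  if (!s.ok()) {
    return s;
  }
  s = writer.DeleteRange("key000", "key100");  // deletes [key000, key100)
  if (!s.ok()) {
    return s;
  }
  return writer.Finish();
}
```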
-
Committed by Levi Tamasi
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11319

Reviewed By: cbi42
Differential Revision: D44308743
Pulled By: ltamasi
fbshipit-source-id: ffd054e9f4162797cfe1ef78240ad2501f78bbbd
-