- 27 6月, 2023 2 次提交
-
-
由 akankshamahajan 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11565 Reviewed By: jowlyzhang, cbi42 Differential Revision: D47027788 Pulled By: akankshamahajan15 fbshipit-source-id: e5e8db2eb21f8aa68fe072f0e1b63b83ba7beb9f
-
由 akankshamahajan 提交于
Summary: When num_file_reads_for_auto_readahead = 1, during seek, it would go for prefetchingextra data in second buffer along with seek data, that would lead to increase in read data and discarded bytes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11560 Test Plan: Added unit test Reviewed By: anand1976 Differential Revision: D47008102 Pulled By: akankshamahajan15 fbshipit-source-id: 566c6131cb5f968d5efb81fd0ab233ff7e534ab0
-
- 24 6月, 2023 2 次提交
-
-
由 Changyu Bi 提交于
Summary: https://github.com/facebook/rocksdb/issues/11378 added a new overloaded `CreateColumnFamilyWithImport` API and updated the virtual function in `StackableDB` and `DBImplReadOnly` to the newly overloaded one. This caused internal error when there is a derived class that tries to override the original `CreateColumnFamilyWithImport` function. This PR adds the original `CreateColumnFamilyWithImport` function back to `StackableDB` and `DBImplReadOnly`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11556 Test Plan: check if this fixes an internal build Reviewed By: akankshamahajan15 Differential Revision: D46980506 Pulled By: cbi42 fbshipit-source-id: 975a6c5748bf9481499a62ee5997ca59e542e3bc
-
由 akankshamahajan 提交于
Add an interface to provide support for underlying FS to pass their own buffer during reads (#11324) Summary: 1. Public API change: Replace `use_async_io` API in file_system with `SupportedOps` API which is used by underlying FileSystem to indicate to upper layers whether the FileSystem supports different operations introduced in `enum FSSupportedOps `. Right now operations are `async_io` and whether FS will provide its own buffer during reads or not. The api is changed to extend it to various FileSystem operations in one API rather than creating a separate API for each operation. 2. Provide support for underlying FS to pass their own buffer during Reads (async and sync read) instead of using RocksDB provided `scratch` (buffer) in `FSReadRequest`. Currently only MultiRead supports it and later will be extended to other reads as well (point lookup, scan etc). More details in how to enable in file_system.h Pull Request resolved: https://github.com/facebook/rocksdb/pull/11324 Test Plan: Tested locally Reviewed By: anand1976 Differential Revision: D44465322 Pulled By: akankshamahajan15 fbshipit-source-id: 9ec9e08f839b5cc815e75d5dade6cd549998d0ec
-
- 23 6月, 2023 1 次提交
-
-
由 Yu Zhang 提交于
Summary: `table_properties_` is not guaranteed to be available. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11554 Reviewed By: akankshamahajan15 Differential Revision: D46944170 Pulled By: jowlyzhang fbshipit-source-id: 609d598e75b417471c9cd964cc316453776a2135
-
- 22 6月, 2023 3 次提交
-
-
由 Peter Dillinger 提交于
Summary: ... instead of race-condition-laden FaultInjectionTestEnv. See https://app.circleci.com/pipelines/github/facebook/rocksdb/27912/workflows/4c63e5a8-597e-439d-8c7e-82308056af02/jobs/609648 and similar PR https://github.com/facebook/rocksdb/issues/11271 Had to fix the semantics of FaultInjectionTestFS Close() operations to allow a non-OK Close() to fulfill the obligation to close before destruction. To me, this is the obvious choice of Close contract, because what is the caller supposed to do if Close() fails and they still have an obligation to successfully close before object destruction? Call Close() in an infinite loop? Leak the object? I have added API comments to the Env and Filesystem Close() functions to clarify the contracts. Note that `DB::Close()` has one exception to this kind of Close contract, but it is clearly described in API comments and it is really only for catching programming mistakes, not for dealing with exogenous errors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11499 Test Plan: watch CI Reviewed By: jowlyzhang Differential Revision: D46375708 Pulled By: pdillinger fbshipit-source-id: 03d4d8251e5df50a82ecd139f7e83f613015fe40
-
由 Yu Zhang 提交于
Summary: Start to record the value of the flag `AdvancedColumnFamilyOptions.persist_user_defined_timestamps` in the Manifest and table properties for a SST file when it is created. And use the recorded flag when creating a table reader for the SST file. This flag's default value is true, it is only explicitly recorded if it's false. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11515 Test Plan: ``` make all check ./version_edit_test ``` Reviewed By: ltamasi Differential Revision: D46920386 Pulled By: jowlyzhang fbshipit-source-id: 075c20363d3d2cc1368422ecc805617ed135cc26
-
由 Peter Dillinger 提交于
Summary: ... so that a non-cryptographic whole file checksum would be highly resistant to manipulation by a user able to manipulate key-value data (e.g. a user whose data is stored in RocksDB) and able to predict SST metadata such as DB session id and file number based on read access to logs or DB files. The adversary would also need to predict the salt in order to influence the checksum result toward collision with another file's checksum. This change is just internal code to support such a future feature. I think this should be a passive feature, not option-controlled, because you probably won't think about needing it until you discover you do need it, and it should be low cost, in space (16 bytes per SST file) and CPU. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11331 Test Plan: Unit tests added to verify at least pseudorandom behavior. (Actually caught a bug in first draft!) The new "stress" style tests run in ~3ms each on my system. Reviewed By: ajkr Differential Revision: D46129415 Pulled By: pdillinger fbshipit-source-id: 7972dc74487e062b29b1fd9c227425e922c98796
-
- 21 6月, 2023 1 次提交
-
-
由 Alexandre Lavigne 提交于
Summary: The class `NewCompactOnDeletionCollectorFactory` exposes the parameter `delete_ratio`. The C API `rocksdb_options_add_compact_on_deletion_collector_factory` does not allow a user to pass a delete ration to be passed down the the C++ class bellow. The class has default value for the delete ratio which makes it pass the compilation and the tests. closes https://github.com/facebook/rocksdb/issues/11541 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11542 Reviewed By: ajkr Differential Revision: D46770908 Pulled By: cbi42 fbshipit-source-id: 7b5162fe459896052e392e2d85a8f6c01db3b464
-
- 20 6月, 2023 2 次提交
-
-
由 Levi Tamasi 提交于
Summary: Calling `Flush` (even with `wait==true`) does not guarantee that obsolete WAL files are physically deleted before the call returns. The patch attempts to fix the resulting flakiness by using `SyncPoint`s to make sure `PurgeObsoleteFiles` finishes before checking for WAL deletions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11537 Test Plan: ``` gtest-parallel --repeat=1000 ./db_wal_test --gtest_filter="*SkipDeletedWALs*" ``` Reviewed By: pdillinger Differential Revision: D46736050 Pulled By: ltamasi fbshipit-source-id: 47a931b7a3a03ef681fbf4adb5a0b223d452703e
-
由 Levi Tamasi 提交于
Summary: `StressTest::optimistic_txn_db_` is currently not initialized by the constructor, which can lead to assertion failures down the line in `StressTest::Open`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11547 Reviewed By: cbi42 Differential Revision: D46845658 Pulled By: ltamasi fbshipit-source-id: 578b0f24fc00e3e97f24221fcdd003cc529439c2
-
- 18 6月, 2023 1 次提交
-
-
由 Jay Huh 提交于
Summary: Context: OptimisticTransactionDB has not been covered by db_stress (including crash test) like TransactionDB. 1. Adding the following gflag options to to test OptimisticTransactionDB - `use_optimistic_txn`: When true, open OptimisticTransactionDB to test - `occ_validation_policy`: `OccValidationPolicy::kValidateParallel = 1` by default. - `share_occ_lock_buckets`: Use shared occ locks - `occ_lock_bucket_count`: 500 by default. Number of buckets to use for shared occ lock. 2. Opening OptimisticTransactionDB and NewTxn/Commit added per `use_optimistic_txn` flag in `db_stress_test_base.cc` 3. OptimisticTransactionDB blackbox/whitebox test added in crash_test.mk Please note that the existing flag `use_txn` is being used here. When `use_txn == true` and `use_optimistic_txn == false`, we use `TransactionDB` (a.k.a. pessimistic transaction db). When both `use_txn` and `use_optimistic_txn` are true, we use `OptimisticTransactionDB`. If `use_txn == false` but `use_optimistic_txn == true` throw error with message _"You cannot set use_optimistic_txn true while use_txn is false. Please set use_txn true if you want to use OptimisticTransactionDB"_. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11513 Test Plan: **Crash Test** Serial Validation ``` export CRASH_TEST_EXT_ARGS="--use_optimistic_txn=1 --use_txn=1 --use_put_entity_one_in=0 --occ_validation_policy=0" make crash_test -j ``` Parallel Validation (no share bucket) ``` export CRASH_TEST_EXT_ARGS="--use_optimistic_txn=1 --use_txn=1 --use_put_entity_one_in=0 --occ_validation_policy=1 --share_occ_lock_buckets=0" make crash_test -j ``` Parallel Validation (share bucket) ``` export CRASH_TEST_EXT_ARGS="--use_optimistic_txn=1 --use_txn=1 --use_put_entity_one_in=0 --occ_validation_policy=1 --share_occ_lock_buckets=1 --occ_lock_bucket_count=500" make crash_test -j ``` **Stress Test** ``` ./db_stress -use_optimistic_txn -threads=32 ``` Reviewed By: pdillinger Differential Revision: D46547387 Pulled By: jaykorean fbshipit-source-id: ca19819ca6e0281694966998014b40d95d4e5960
-
- 17 6月, 2023 3 次提交
-
-
由 Hui Xiao 提交于
Summary: **Context/Summary:** When db is upgrading to adopt [pr11406](https://github.com/facebook/rocksdb/pull/11406/), it's possible for RocksDB to infer a small tail size to prefetch for pre-upgrade files. Such small tail size would have caused 1 file read per index or filter partition if partitioned index or filer is used. This PR provides a UT to show this would not happen. Misc: refactor the related UTs a bit to make this new UT more readable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11522 Test Plan: - New UT If logic of upgrade is wrong e.g, ``` --- a/table/block_based/partitioned_index_reader.cc +++ b/table/block_based/partitioned_index_reader.cc @@ -166,7 +166,8 @@ Status PartitionIndexReader::CacheDependencies( uint64_t prefetch_len = last_off - prefetch_off; std::unique_ptr<FilePrefetchBuffer> prefetch_buffer; if (tail_prefetch_buffer == nullptr || !tail_prefetch_buffer->Enabled() || - tail_prefetch_buffer->GetPrefetchOffset() > prefetch_off) { + (false && tail_prefetch_buffer->GetPrefetchOffset() > prefetch_off)) { ``` , then the UT will fail like below ``` [ RUN ] PrefetchTailTest/PrefetchTailTest.UpgradeToTailSizeInManifest/0 file/prefetch_test.cc:461: Failure Expected: (db_open_file_read.count) < (num_index_partition), actual: 38 vs 33 Received signal 11 (Segmentation fault) ``` Reviewed By: pdillinger Differential Revision: D46546707 Pulled By: hx235 fbshipit-source-id: 9897b0a975e9055963edac5451fd1cd9d6c45d0e
-
由 Yu Zhang 提交于
Summary: Fix the error handling in `GetHostName` for non EFAULT, non EINVAL error. Current handling will cause stack overflow when non null-terminated c style string is in `name`, e.g. ENAMETOOLONG, when the `name` buffer is not big enough and the host name is truncated. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11544 Test Plan: ``` COMPILE_WITH_ASAN=1 make all check ``` Reviewed By: pdillinger Differential Revision: D46775799 Pulled By: jowlyzhang fbshipit-source-id: e0fc9400c50fe38bc1fd888b4fea5fe8706165bf
-
由 Yu Zhang 提交于
Summary: This ticker combined with `rocksdb.files.marked.trash` can help give a better picture of how DeleteScheduler is keeping up. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11540 Test Plan: ``` ./delete_scheduler_test ``` Reviewed By: ajkr Differential Revision: D46746401 Pulled By: jowlyzhang fbshipit-source-id: f3daa622aa3ddefe7d673e0cc257a47699d506df
-
- 16 6月, 2023 4 次提交
-
-
由 Changyu Bi 提交于
Summary: after https://github.com/facebook/rocksdb/issues/11321 and https://github.com/facebook/rocksdb/issues/11340 (both included in RocksDB v8.2), migration from `level_compaction_dynamic_level_bytes=false` to `level_compaction_dynamic_level_bytes=true` is automatic by RocksDB and requires no manual compaction from user. Making the option true by default as it has several advantages: 1. better space amplification guarantee (a more stable LSM shape). 2. compaction is more adaptive to write traffic. 3. automatic draining of unneeded levels. Wiki is updated with more detail: https://github.com/facebook/rocksdb/wiki/Leveled-Compaction#option-level_compaction_dynamic_level_bytes-and-levels-target-size. The PR mostly contains fixes for unit tests as they assumed `level_compaction_dynamic_level_bytes=false`. Most notable change is commit https://github.com/facebook/rocksdb/pull/11525/commits/f742be330ca1a7abc33107b00df99818f71c387b and https://github.com/facebook/rocksdb/pull/11525/commits/b1928e42b34c0d4f1cc5d5239149870c6dc7a737 which override the default option in DBTestBase to still set `level_compaction_dynamic_level_bytes=false` by default. This helps to reduce the change needed for unit tests. I think this default option override in unit tests is okay since the behavior of `level_compaction_dynamic_level_bytes=true` is tested by explicitly setting this option. Also, `level_compaction_dynamic_level_bytes=false` may be more desired in unit tests as it makes it easier to create a desired LSM shape. Comment for option `level_compaction_dynamic_level_bytes` is updated to reflect this change and change made in https://github.com/facebook/rocksdb/issues/10057. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11525 Test Plan: `make -j32 J=32 check` several times to try to catch flaky tests due to this option change. Reviewed By: ajkr Differential Revision: D46654256 Pulled By: cbi42 fbshipit-source-id: 6b5827dae124f6f1fdc8cca2ac6f6fcd878830e1
-
由 yaphet 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11534 Reviewed By: cbi42 Differential Revision: D46732248 Pulled By: ajkr fbshipit-source-id: c9360866c35a2c436ab83b85a14ed7bd43a88d95
-
由 shadowlux 提交于
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11536 Reviewed By: cbi42 Differential Revision: D46732297 Pulled By: ajkr fbshipit-source-id: 82bb078f189da233addc4f483eaa6eaf7fdd3910
-
由 mayue.fight 提交于
Summary: The original Feature Request is from [https://github.com/facebook/rocksdb/issues/11317](https://github.com/facebook/rocksdb/issues/11317). Flink uses rocksdb as the state backend, all DB options are the same, and the keys of each DB instance are adjacent and there is no key overlap between two db instances. In the Flink rescaling scenario, it is necessary to quickly split the DB according to a certain key range or quickly merge multiple DBs into one. This PR is mainly used to quickly merge multiple DBs into one. We hope to extend the function of `CreateColumnFamilyWithImports` to support creating ColumnFamily by importing multiple ColumnFamily with no overlapping keys. The import logic is almost the same as `CreateColumnFamilyWithImport`, but it will check whether there is key overlap between CF when importing. The import will fail if there are key overlaps. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11378 Reviewed By: ajkr Differential Revision: D46413709 Pulled By: cbi42 fbshipit-source-id: 846d0049fad11c59cf460fa846c345b26c658dfb
-
- 15 6月, 2023 2 次提交
-
-
由 Peter Dillinger 提交于
Summary: Use another static object to join threads instead. This change is motivated by a case in which some code using NewLRUCache() -> ShardedCacheBase -> SemiStructuredUniqueIdGen -> GenerateRawUniqueId() -> Env::Default() was happening during static destruction. I didn't see anything else in PosixEnv or base classes that would cause a problem by not destroying. (WinEnv is already not destroyed; see env_default.cc) Pull Request resolved: https://github.com/facebook/rocksdb/pull/11538UndefinedBehaviorSanitizer: undefined-behavior env/env_test.cc:3561:23 in $ ``` Test Plan: test added, which would previously fail with UBSAN: ``` $ ./env_test --gtest_filter=*Destruct* Note: Google Test filter = *Destruct* [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from EnvTestMisc [ RUN ] EnvTestMisc.StaticDestruction [ OK ] EnvTestMisc.StaticDestruction (0 ms) [----------] 1 test from EnvTestMisc (0 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (0 ms total) [ PASSED ] 1 test. env/env_test.cc:3561:23: runtime error: member call on address 0x7f7b96671ca8 which does not point to an object of type 'rocksdb::Env' 0x7f7b96671ca8: note: object is of type 'N7rocksdb12ConfigurableE' 00 00 00 00 90 a7 f7 95 7b 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^~~~~~~~~~~~~~~~~~~~~~~ vptr for 'N7rocksdb12ConfigurableE' Reviewed By: jowlyzhang Differential Revision: D46737389 Pulled By: pdillinger fbshipit-source-id: 0f80a443bf799ffc5641e898cf3a75f7d10a987b
-
由 Changyu Bi 提交于
Summary: when a DB is configured with `allow_ingest_behind = true`, the last level should be reserved for ingested files and these files should not be included in any compaction. Currently, a major compaction can compact these files to smaller levels. This can cause future files to be rejected for ingest behind (see `ExternalSstFileIngestionJob::CheckLevelForIngestedBehindFile()`). This PR fixes the issue such that files in the last level is not included in any compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11489 Test Plan: * Updated unit test `ExternalSSTFileTest.IngestBehind` to test that last level is not included in manual and auto-compaction. Reviewed By: ajkr Differential Revision: D46455711 Pulled By: cbi42 fbshipit-source-id: 5e2142c2a709ef932ad797897795021c06c4ac8c
-
- 14 6月, 2023 3 次提交
-
-
由 Andrew Kryczka 提交于
Summary: See "unreleased_history/new_features/obsolete_sst_files_size.md" for description Pull Request resolved: https://github.com/facebook/rocksdb/pull/11533 Test Plan: updated unit test Reviewed By: jowlyzhang Differential Revision: D46703152 Pulled By: ajkr fbshipit-source-id: ea5e31cd6293eccc154130c13e66b5271f57c102
-
由 Changyu Bi 提交于
Summary: the first CI step "Check buck targets and code format..." is failing with the following error message: ``` Download action repository 'wei/wget@v1' (SHA:c15e476d1463f4936cb54f882170d9d631f1aba5) Error: An action could not be found at the URI 'https://api.github.com/repos/wei/wget/tarball/c15e476d1463f4936cb54f882170d9d631f1aba5' ``` Not sure why the action is lost, but it seems we can use wget directly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11532 Test Plan: watch CI job "Check buck targets and code format" passes Reviewed By: ltamasi Differential Revision: D46700626 Pulled By: cbi42 fbshipit-source-id: 53c09f27965444b533b3fe3755aec922571bba2c
-
由 Hui Xiao 提交于
Summary: **Context/Summary:** as title Pull Request resolved: https://github.com/facebook/rocksdb/pull/11530 Test Plan: no code change Reviewed By: jowlyzhang Differential Revision: D46655670 Pulled By: hx235 fbshipit-source-id: 3dfea5485a9a6a8ce0de1f6296c4cf31aedfa116
-
- 13 6月, 2023 1 次提交
-
-
由 Yu Zhang 提交于
Summary: A user find and reported this in issue https://github.com/facebook/rocksdb/issues/11528 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11529 Reviewed By: hx235 Differential Revision: D46653618 Pulled By: jowlyzhang fbshipit-source-id: aa98f59028daaaa5b0bd3b96ee87dcdd7369a8b9
-
- 10 6月, 2023 1 次提交
-
-
由 Ignat Loskutov 提交于
Summary: Add new tickers: `rocksdb.error.handler.bg.error.count`, `rocksdb.error.handler.bg.io.error.count`, `rocksdb.error.handler.bg.retryable.io.error.count` to replace the misspelled ones: `rocksdb.error.handler.bg.errro.count`, `rocksdb.error.handler.bg.io.errro.count`, `rocksdb.error.handler.bg.retryable.io.errro.count` ('error' instead of 'errro'). Users should switch to use the new tickers before 9.0 release as the misspelled old tickers will be completely removed then. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11509 Reviewed By: ltamasi Differential Revision: D46542809 Pulled By: jowlyzhang fbshipit-source-id: a2a6d8354af46a060de81d40ef6f5336a80bd32e
-
- 09 6月, 2023 2 次提交
-
-
由 Ignat Loskutov 提交于
Summary: The `rocksdb.db.get.micros` histogram is never updated if the DB is open in ReadOnly mode, as well as the `get_cpu_nanos` perf counter. An earlier PR (https://github.com/facebook/rocksdb/issues/4260) for some reason has only added the TODO line, not the accounting itself, so this one is intended to fix it, adding two lines to match [DBImplSecondary](https://github.com/facebook/rocksdb/blob/4dafa5b220c439b7f05eb598a2ab32b3152bedb5/db/db_impl/db_impl_secondary.cc#L366-L367). Pull Request resolved: https://github.com/facebook/rocksdb/pull/11521 Reviewed By: cbi42 Differential Revision: D46577330 Pulled By: jowlyzhang fbshipit-source-id: be147923e763af32bbc18fd6bdf3aff8ebf08aee
-
由 Yu Zhang 提交于
Summary: Fix a use-after-move issue in block.cc and added some unit tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11505 Test Plan: ``` make all check ./block_test ``` Reviewed By: ltamasi Differential Revision: D46506188 Pulled By: jowlyzhang fbshipit-source-id: 316ed8ddd221c00b2bce2cf9fd47eea686cd74a5
-
- 08 6月, 2023 1 次提交
-
-
由 Peter Dillinger 提交于
Summary: Support was added in 8.1.0 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11517 Test Plan: comments only Reviewed By: anand1976 Differential Revision: D46489929 Pulled By: pdillinger fbshipit-source-id: 4fd30078389065c9ec225bf55b6773f1641f0646
-
- 07 6月, 2023 3 次提交
-
-
由 Hui Xiao 提交于
Summary: **Context:** [PR11406](https://github.com/facebook/rocksdb/pull/11406/) caused more frequent read during db open reading files with no `tail_size` in the manifest as part of the upgrade to 11406. This is due to that PR introduced - [smaller](https://github.com/facebook/rocksdb/pull/11406/files#diff-57ed8c49db2bdd4db7618646a177397674bbf25beacacecb104070071d30129fR833) prefetch tail buffer size compared to pre-11406 for small files (< 52 MB) when `tail_prefetch_stats` infers tail size to be 0 (usually happens when the stats does not have much historical data to infer early on) - more read (up to # of partitioned filter/index) when such small prefetch tail buffer does not contain all the partitioned filter/index needed in CacheDependencies() since the [fallback logic](https://github.com/facebook/rocksdb/pull/11406/files#diff-d98f1a83de24412ad7f3527725dae7e28851c7222622c3cdb832d3cdf24bbf9fR165-R179) that prefetches all partitions at once will be [skipped](url) when such a small prefetch tail buffer is passed in **Summary:** - Revert the fallback prefetch buffer size change to preserve existing behavior fully during upgrading in `BlockBasedTable::PrefetchTail()` - Use passed-in prefetch tail buffer in `CacheDependencies()` only if it has a smaller offset than the the offset of first partition filter/index, that is, at least as good as the existing prefetching behavior Pull Request resolved: https://github.com/facebook/rocksdb/pull/11516 Test Plan: - db bench Create db with small files prior to PR 11406 ``` ./db_bench -db=/tmp/testdb/ --partition_index_and_filters=1 --statistics=1 -benchmarks=fillseq -key_size=3200 -value_size=5 -num=1000000 -write_buffer_size=6550000 -disable_auto_compactions=true -compression_type=zstd` ``` Read db to see if post-pr has lower read qps (i.e, rocksdb.file.read.db.open.micros count) during db open. ``` ./db_bench -use_direct_reads=1 --file_opening_threads=1 --threads=1 --use_existing_db=1 --seed=1682546046158958 --partition_index_and_filters=1 --statistics=1 --db=/tmp/testdb/ --benchmarks=readrandom --key_size=3200 --value_size=5 --num=100 --disable_auto_compactions=true --compression_type=zstd ``` Pre-PR: ``` rocksdb.file.read.db.open.micros P50 : 3.399023 P95 : 5.924468 P99 : 12.408333 P100 : 29.000000 COUNT : 611 SUM : 2539 ``` Post-PR: ``` rocksdb.file.read.db.open.micros P50 : 593.736842 P95 : 861.605263 P99 : 1212.868421 P100 : 2663.000000 COUNT : 585 SUM : 345349 ``` _Note: To control the starting offset of the prefetch tail buffer easier, I manually override the following to eliminate the effect of alignment_ ``` class PosixRandomAccessFile : public FSRandomAccessFile { virtual size_t GetRequiredBufferAlignment() const override { - return logical_sector_size_; + return 1; } ``` - CI Reviewed By: pdillinger Differential Revision: D46472566 Pulled By: hx235 fbshipit-source-id: 2fe14ac8d489d15b0e08e6f8fe4f46d5f110978e
-
由 Changyu Bi 提交于
Summary: as reported in https://github.com/facebook/rocksdb/issues/11476, RocksDB currently does not execute compactions in two subcompactions even when they qualify. This PR fixes this issue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11501 Test Plan: * Add a new unit test. * Run crash test with max_subcompactions=2: `python3 tools/db_crashtest.py blackbox --simple --subcompactions=2 --target_file_size_base=1048576 --compaction_style=0` * saw logs showing compactions being executed as 2 subcompactions ``` 2023/06/01-17:28:44.028470 3147486 (Original Log Time 2023/06/01-17:28:44.025972) EVENT_LOG_v1 {"time_micros": 1685665724025939, "job": 6, "event": "compaction_finished", "compaction_time_micros": 34539, "compaction_time_cpu_micros": 26404, "output_level": 6, "num_output_files": 2, "total_output_size": 1109796, "num_input_records": 13188, "num_output_records": 13021, "num_subcompactions": 2, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 0, 0, 0, 0, 0, 13]} ``` Reviewed By: ajkr Differential Revision: D46411497 Pulled By: cbi42 fbshipit-source-id: 3ebfc02e19f78f782e114a9546dc3d481d496258
-
由 leipeng 提交于
Summary: This PR utilize padding space. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10633 Reviewed By: pdillinger Differential Revision: D46410978 Pulled By: ajkr fbshipit-source-id: 23ec757b1eea9221c1390971e39d341c6b7f2003
-
- 06 6月, 2023 3 次提交
-
-
由 Changyu Bi 提交于
Summary: Fix the test added in https://github.com/facebook/rocksdb/issues/11459 that is failing. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11512 Test Plan: `./db_range_del_test --gtest_filter="*NonBottommostCompactionDropRangetombstone"` Reviewed By: pdillinger Differential Revision: D46451450 Pulled By: cbi42 fbshipit-source-id: bcad20b8fd21c4f71924cec6cb045ee4b2038b90
-
由 Yu Zhang 提交于
Summary: Switch from std::unordered_map to RocksDB UnorderedMap for all the places that logging user-defined timestamp size in WAL used. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11507 Test Plan: ``` make all check ``` Reviewed By: ltamasi Differential Revision: D46448975 Pulled By: jowlyzhang fbshipit-source-id: bdb4d56a723b697a33daaf0f856a61d49a367a99
-
由 Changyu Bi 提交于
Summary: Similar to point tombstones, we can drop a range tombstone during compaction when we know its range does not exist in any higher level. This PR adds this optimization. Some existing test in db_range_del_test is fixed to work under this optimization. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11459 Test Plan: * Add unit test `DBRangeDelTest, NonBottommostCompactionDropRangetombstone`. * Ran crash test that issues range deletion for a few hours: `python3 tools/db_crashtest.py blackbox --simple --write_buffer_size=1048576 --delrangepercent=10 --writepercent=31 --readpercent=40` Reviewed By: ajkr Differential Revision: D46007904 Pulled By: cbi42 fbshipit-source-id: 3f37205b6778b7d55ed106369ca41b0632a6d0fd
-
- 03 6月, 2023 2 次提交
-
-
由 Andrew Kryczka 提交于
Summary: Follow a couple best practices: - Allowed Google benchmark to decide number of iterations. Previously we hardcoded a value, which circumvented benchmark's heuristic for iterating until the result is stable. - Made each iteration do similar work. Previously, an iteration could do different work depending if the key was found in the first, second, third, or no L0 file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11498 Test Plan: none as I am unable to prove it is better Reviewed By: hx235 Differential Revision: D46339050 Pulled By: ajkr fbshipit-source-id: fcfc6da4111c5b3ae86d79d908afc5f61f96675b
-
由 Peter Dillinger 提交于
Summary: * Add a "Performance Improvements" section * Add required copyright headers Pull Request resolved: https://github.com/facebook/rocksdb/pull/11504 Test Plan: manual Reviewed By: hx235 Differential Revision: D46405128 Pulled By: pdillinger fbshipit-source-id: 4f878dfd0170d381d3051a44c13479c860e812c0
-
- 02 6月, 2023 3 次提交
-
-
由 Changyu Bi 提交于
Summary: currently 0 is incorrectly logged as the compaction score for L0 when num_levels > 1. This PR fixes the issue to log the correct score. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11487 Test Plan: ``` ./db_bench --benchmarks=fillrandom --max_background_jobs=8 --num=1000000 --compaction_style=1 --stats_dump_period_sec=20 --num_levels=7 --write_buffer_size=1048576 grep "L0 " /tmp/rocksdbtest-543376/dbbench/LOG before: ** Compaction Stats [default] ** Priority Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB) --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- L0 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 9.9 0.42 0.33 9 0.046 0 0 0.0 0.0 L0 3/1 1.37 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.1 0.6 9.6 3.76 3.03 76 0.050 34K 140 0.0 0.0 L0 2/0 2.26 MB 0.0 0.0 0.0 0.0 0.1 0.1 0.0 1.6 3.2 8.2 12.59 11.17 163 0.077 619K 5499 0.0 0.0 after: compaction scores are non-zero L0 0/0 0.00 KB 0.8 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 9.6 0.43 0.34 9 0.048 0 0 0.0 0.0 L0 2/1 937.08 KB 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.1 0.6 9.3 3.85 3.07 75 0.051 34K 165 0.0 0.0 L0 2/2 1.82 MB 1.0 0.0 0.0 0.0 0.1 0.1 0.0 1.6 3.0 8.0 12.45 10.99 160 0.078 577K 5399 0.0 0.0 ``` Reviewed By: ajkr Differential Revision: D46293993 Pulled By: cbi42 fbshipit-source-id: 19753f7df68c5f54a84c4ed52794f83e510c9721
-
由 Changyu Bi 提交于
Summary: currently for leveled compaction, the max output level of a call to `CompactRange()` is pre-computed before compacting each level. This max output level is the max level whose key range overlaps with the manual compaction key range. However, during manual compaction, files in the max output level may be compacted down further by some background compaction. When this background compaction is a trivial move, there is a race condition and the manual compaction may not be able to compact all keys in the specified key range. This PR updates `CompactRange()` to always compact to the bottommost level to make this race condition more unlikely (it can still happen, see more in comment here: https://github.com/cbi42/rocksdb/blob/796f58f42ad1bdbf49e5fcf480763f11583b790e/db/db_impl/db_impl_compaction_flush.cc#L1180C29-L1184). This PR also changes the behavior of CompactRange() when `bottommost_level_compaction=kIfHaveCompactionFilter` (the default option). The old behavior is that, if a compaction filter is provided, CompactRange() always does an intra-level compaction at the final output level for all files in the manual compaction key range. The only exception when `first_overlapped_level = 0` and `max_overlapped_level = 0`. It’s awkward to maintain the same behavior after this PR since we do not compute max_overlapped_level anymore. So the new behavior is similar to kForceOptimized: always does intra-level compaction at the bottommost level, but not including new files generated during this manual compaction. Several unit tests are updated to work with this new manual compaction behavior. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11468 Test Plan: Add new unit tests `DBCompactionTest.ManualCompactionCompactAllKeysInRange*` Reviewed By: ajkr Differential Revision: D46079619 Pulled By: cbi42 fbshipit-source-id: 19d844ba4ec8dc1a0b8af5d2f36ff15820c6e76f
-
由 Yu Zhang 提交于
Summary: Add support to strip timestamp in block based table builder and pad timestamp in block based table reader. On the write path, use the per column family option `AdvancedColumnFamilyOptions.persist_user_defined_timestamps` to indicate whether user-defined timestamps should be stripped for all block based tables created for the column family. On the read path, added a per table `TableReadOption.user_defined_timestamps_persisted` to flag whether the user keys in the table contains user defined timestamps. This patch is mostly passing the related flags down to the block building/parsing level with the exception of handling the `first_internal_key` in `IndexValue`, which is included in the `IndexBuilder` level. The value part of range deletion entries should have a similar handling, I haven't decided where to best fit this piece of logic, I will do it in a follow up. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11495 Test Plan: Existing test `BlockBasedTableReaderTest` is parameterized to run with: 1) different UDT test modes: kNone, kNormal, kStripUserDefinedTimestamp 2) all four index types, when index type is `kTwoLevelIndexSearch`, also enables partitioned filters 3) parallel vs non-parallel compression 4) enable/disable compression dictionary. Also added tests for API `BlockBasedTableReader::NewIterator`. `PartitionedFilterBlockTest` is parameterized to run with different UDT test modes:kNone, kNormal, kStripUserDefinedTimestamp. ``` make all check ./block_based_table_reader_test ./partitioned_filter_block_test ``` Reviewed By: ltamasi Differential Revision: D46344577 Pulled By: jowlyzhang fbshipit-source-id: 93ac8542b19319d1298712b8bed908c8831ba675
-