1. 11 Oct 2022, 3 commits
  2. 08 Oct 2022, 4 commits
    • Add option `preserve_internal_time_seconds` to preserve the time info (#10747) · c401f285
      Jay Zhuang committed
      Summary:
      Add option `preserve_internal_time_seconds` to preserve the internal
      time information.
      It's mostly for the migration of the existing data to tiered storage (
      `preclude_last_level_data_seconds`). When the tiering feature is just
      enabled, the existing data won't have the time information to decide if
      it's hot or cold. Enabling this feature will start collecting and preserving
      the time information for the new data.
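      A minimal sketch of how a user might enable this ahead of a tiering migration; the option names come from this commit and the linked tiering feature, while the concrete values and the setup function are illustrative:
      ```
      // Hypothetical setup sketch: collect time info now so data written from
      // here on can later be classified hot/cold by the tiering feature.
      #include <rocksdb/options.h>

      rocksdb::Options MakeTieringMigrationOptions() {
        rocksdb::Options options;
        // Track and preserve write-time info for roughly the window the tiering
        // policy will later care about (3 days here is an arbitrary example).
        options.preserve_internal_time_seconds = 3 * 24 * 60 * 60;
        // Later, once enough time info has accumulated, tiering itself can be
        // enabled via preclude_last_level_data_seconds.
        return options;
      }
      ```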
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10747
      
      Reviewed By: siying
      
      Differential Revision: D39910141
      
      Pulled By: siying
      
      fbshipit-source-id: 25c21638e37b1a7c44006f636b7d714fe7242138
    • Blog post for asynchronous IO (#10789) · f366f90b
      anand76 committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10789
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D40198988
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 5db74f12dd8854f6288fbbf8775c8e759778c307
    • Exclude timestamp when checking compaction boundaries (#10787) · 11943e8b
      Yanqin Jin committed
      Summary:
      When checking if a range [start, end) overlaps with a compaction whose range is [start1, end1), always exclude the timestamp from start, end, start1, and end1; otherwise, some versions of one user key may be compacted to the bottommost level while others remain in the original level.
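      A self-contained sketch of the idea (not RocksDB source; the fixed-width timestamp suffix and helper names are illustrative):
      ```
      // Compare compaction ranges on the user key only, with the trailing
      // fixed-width timestamp stripped, so every version of a user key falls
      // on the same side of a compaction boundary.
      #include <cassert>
      #include <string>

      constexpr size_t kTsSize = 8;  // hypothetical fixed-width timestamp suffix

      std::string StripTimestamp(const std::string& key) {
        assert(key.size() >= kTsSize);
        return key.substr(0, key.size() - kTsSize);
      }

      bool RangesOverlap(const std::string& start, const std::string& end,
                         const std::string& start1, const std::string& end1) {
        // [start, end) vs [start1, end1), timestamps excluded on both sides.
        return StripTimestamp(start) < StripTimestamp(end1) &&
               StripTimestamp(start1) < StripTimestamp(end);
      }
      ```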
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10787
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D40187672
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 81226267fd3e33ffa79665c62abadf2ebec45496
    • Verify wide columns during prefix scan in stress tests (#10786) · 7af47c53
      Levi Tamasi committed
      Summary:
      The patch adds checks to the
      `{NonBatchedOps,BatchedOps,CfConsistency}StressTest::TestPrefixScan` methods
      to make sure the wide columns exposed by the iterators are as expected (based on
      the value base encoded into the iterator value). It also makes some code hygiene
      improvements in these methods.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10786
      
      Test Plan:
      Ran some simple blackbox tests in the various modes (non-batched, batched,
      CF consistency).
      
      Reviewed By: riversand963
      
      Differential Revision: D40163623
      
      Pulled By: riversand963
      
      fbshipit-source-id: 72f4c3b51063e48c15f974c4ec64d751d3ed0a83
  3. 07 Oct 2022, 4 commits
    • Expand stress test coverage for min_write_buffer_number_to_merge (#10785) · 943247b7
      Yanqin Jin committed
      Summary:
      As title.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10785
      
      Test Plan: CI
      
      Reviewed By: ltamasi
      
      Differential Revision: D40162583
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 4e01f9b682f397130e286cf5d82190b7973fa3c1
    • Use `sstableKeyCompare()` for compaction output boundary check (#10763) · 23fa5b77
      Jay Zhuang committed
      Summary:
      This makes it consistent with the compaction picker, which uses `sstableKeyCompare()` to pick the overlapping files. For example, without this change, it may cut L1 files like:
      ```
       L1: [2-21]  [22-30]
       L2: [1-10] [21-30]
      ```
      Because "21" on L1 is smaller than "21" on L2. But for compaction, these 2 files are overlapped.
      `sstableKeyCompare()` also take range delete into consideration which may cut file for the same key.
      It also makes the `max_compaction_bytes` calculation more accurate for cases like above, the overlapped bytes was under estimated. Also make sure the 2 keys won't be splitted to 2 files because of reaching `max_compaction_bytes`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10763
      
      Reviewed By: cbi42
      
      Differential Revision: D39971904
      
      Pulled By: cbi42
      
      fbshipit-source-id: bcc309e9c3dc61a8f50667a6f633e6132c0154a8
    • Verify columns in NonBatchedOpsStressTest::VerifyDb (#10783) · d6d8c007
      Levi Tamasi committed
      Summary:
      As the first step of covering the wide-column functionality of iterators
      in our stress tests, the patch adds verification logic to
      `NonBatchedOpsStressTest::VerifyDb` that checks whether the
      iterator's value and columns are in sync. Note: I plan to update the other
      types of stress tests and add similar verification for prefix scans etc.
      in separate PRs.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10783
      
      Test Plan: Ran some simple blackbox crash tests.
      
      Reviewed By: riversand963
      
      Differential Revision: D40152370
      
      Pulled By: riversand963
      
      fbshipit-source-id: 8f9d17d7af5da58ccf1bd2057cab53cc9645ac35
    • Fix bug in HyperClockCache ApplyToEntries; cleanup (#10768) · b205c6d0
      Peter Dillinger committed
      Summary:
      We have seen some rare crash test failures in HyperClockCache, and the cause may well be a bug fixed in this change, in ClockHandleTable::ConstApplyToEntriesRange: it wasn't properly accounting for the fact that incrementing the acquire counter could be ineffective due to parallel updates. (When incrementing the acquire counter is ineffective, it is incorrect to then decrement it.)
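      A self-contained model of the pitfall (not the actual HyperClockCache code; the bit layout and names are illustrative): when an optimistic reference acquisition on a packed atomic state word fails to take effect due to a parallel update, the matching decrement must be skipped rather than applied unconditionally.
      ```
      #include <atomic>
      #include <cstdint>

      constexpr uint64_t kVisibleBit = uint64_t{1} << 63;  // hypothetical layout
      constexpr uint64_t kAcquireIncrement = 1;

      std::atomic<uint64_t> meta{kVisibleBit};  // stand-in for one entry's state

      // Returns true only if the acquire-counter increment actually took
      // effect; only then may the caller later decrement it.
      bool TryAcquireRef() {
        uint64_t old = meta.load(std::memory_order_acquire);
        while (old & kVisibleBit) {
          if (meta.compare_exchange_weak(old, old + kAcquireIncrement,
                                         std::memory_order_acq_rel)) {
            return true;  // effective increment; pair with a release later
          }
          // CAS failed: `old` was reloaded; re-check visibility and retry.
        }
        return false;  // ineffective; decrementing here would be the bug
      }
      ```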
      
      This change includes some other minor clean-up in HyperClockCache, and adds stats_dump_period_sec with a much lower period to the crash test. This should be the primary caller of ApplyToEntries, in collecting cache entry stats.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10768
      
      Test Plan: haven't been able to reproduce the failure, but should be in a better state (bug fix and improved crash test)
      
      Reviewed By: anand1976
      
      Differential Revision: D40034747
      
      Pulled By: anand1976
      
      fbshipit-source-id: a06fcefe146e17ee35001984445cedcf3b63eb68
  4. 06 Oct 2022, 3 commits
  5. 05 Oct 2022, 5 commits
  6. 04 Oct 2022, 4 commits
    • Some clean-up of secondary cache (#10730) · 5f4391dd
      Peter Dillinger committed
      Summary:
      This is intended as a step toward possibly separating secondary cache integration from the
      Cache implementation as much as possible, to (hopefully) minimize code duplication in
      adding secondary cache support to HyperClockCache.
      * Major clarifications to API docs of secondary cache compatible parts of Cache. For example, previously the docs seemed to suggest that Wait() was not needed if IsReady()==true, and it wasn't clear what operations were actually supported on pending handles. (A usage sketch of the clarified contract follows this list.)
      * Add some assertions related to these requirements, such as that we don't Release() before Wait() (which would leak a secondary cache handle).
      * Fix a leaky abstraction with dummy handles, which are supposed to be internal to the Cache. Previously, these just used value=nullptr to indicate dummy handle, which meant that they could be confused with legitimate value=nullptr cases like cache reservations. Also fixed blob_source_test which was relying on this leaky abstraction.
      * Drop "incomplete" terminology, which was another name for "pending".
      * Split handle flags into "mutable" ones requiring mutex and "immutable" ones which do not. Because of single-threaded access to pending handles, the "Is Pending" flag can be in the "immutable" set. This allows removal of a TSAN work-around and removing a mutex acquire-release in IsReady().
      * Remove some unnecessary handling of charges on handles of failed lookups. Keeping total_charge=0 means no special handling needed. (Removed one unnecessary mutex acquire/release.)
      * Simplify handling of dummy handle in Lookup(). There is no need to explicitly Ref & Release w/Erase if we generally overwrite the dummy anyway. (Removed one mutex acquire/release, a call to Release().)
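      A usage sketch of the clarified pending-handle contract, assuming the async form of `Cache::Lookup` with a `wait` flag plus the `IsReady()`/`Wait()` pair (exact signatures vary by RocksDB version):
      ```
      #include <rocksdb/cache.h>

      using rocksdb::Cache;
      using rocksdb::Slice;

      // Returns a ready-to-use handle (or nullptr); the caller must Release() it.
      Cache::Handle* LookupBlocking(Cache* cache, const Slice& key,
                                    const Cache::CacheItemHelper* helper,
                                    const Cache::CreateCallback& create_cb) {
        Cache::Handle* h = cache->Lookup(key, helper, create_cb,
                                         Cache::Priority::LOW, /*wait=*/false);
        if (h != nullptr && !cache->IsReady(h)) {
          // Per the clarified contract: Wait() before using the handle;
          // Release()-ing a still-pending handle would leak the secondary
          // cache handle.
          cache->Wait(h);
        }
        return h;
      }
      ```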
      
      Intended follow-up:
      * Clarify APIs in secondary_cache.h
        * Doesn't SecondaryCacheResultHandle transfer ownership of the Value() on success (implementations should not release the value in destructor)?
        * Does Wait() need to be called if IsReady() == true? (This would be different from Cache.)
        * Do Value() and Size() have undefined behavior if IsReady() == false?
        * Why have a custom API for what is essentially a std::future<std::pair<void*, size_t>>?
      * Improve unit testing of standalone handle case
      * Apparent null `e` bug in `free_standalone_handle` case
      * Clean up secondary cache testing in lru_cache_test
        * Why does TestSecondaryCacheResultHandle hold on to a Cache::Handle?
        * Why does TestSecondaryCacheResultHandle::Wait() do nothing? Shouldn't it establish the post-condition IsReady() == true?
        * (Assuming that is sorted out...) Shouldn't TestSecondaryCache::WaitAll simply wait on each handle in order (no casting required)? How about making that the default implementation?
        * Why does TestSecondaryCacheResultHandle::Size() check Value() first? If the API is intended to be returning 0 before IsReady(), then that is weird but should at least be documented. Otherwise, if it's intended to be undefined behavior, we should assert IsReady().
      * Consider replacing "standalone" and "dummy" entries with a single kind of "weak" entry that deletes its value when it reaches zero refs. Suppose you are using compressed secondary cache and have two iterators at similar places. It will probably be common for one iterator to have standalone results pinned (out of cache) when the second iterator needs those same blocks and has to re-load them from secondary cache and duplicate the memory. Combining the dummy and the standalone should fix this.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10730
      
      Test Plan:
      existing tests (minor update), and crash test with sanitizers and secondary cache
      
      Performance test for any regressions in LRUCache (primary only):
      Create DB with
      ```
      TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=30000000 -disable_wal=1 -bloom_bits=16
      ```
      Test before & after (run at same time) with
      ```
      TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=readrandom[-X100] -readonly -num=30000000 -bloom_bits=16 -cache_index_and_filter_blocks=1 -cache_size=233000000 -duration 30 -threads=16
      ```
      Before: readrandom [AVG    100 runs] : 22234 (± 63) ops/sec;    1.6 (± 0.0) MB/sec
      After: readrandom [AVG    100 runs] : 22197 (± 64) ops/sec;    1.6 (± 0.0) MB/sec
      That's within 0.2%, which is not significant by the confidence intervals.
      
      Reviewed By: anand1976
      
      Differential Revision: D39826010
      
      Pulled By: anand1976
      
      fbshipit-source-id: 3202b4a91f673231c97648ae070e502ae16b0f44
    • Disable ingestion in stress tests when PutEntity is used (#10769) · 3ae00dec
      Levi Tamasi committed
      Summary:
      `SstFileWriter` currently does not support the `PutEntity` API, so in `TestIngestExternalFile` all key-values are written using regular `Put`s. This violates the assumption that whether a key corresponds to a plain old key-value or a wide-column entity can be determined solely by looking at the "value base" used when generating the value. The patch fixes this issue by disabling ingestion when `PutEntity` is enabled in the stress tests.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10769
      
      Test Plan: Ran a simple blackbox stress test.
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D40042132
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 93e75ff55545b7b69fa4ddef1d96093c961158a0
    • Add iterator refresh to stress test (#10766) · 8b430e01
      Changyu Bi committed
      Summary:
      Added calls to `Iterator::Refresh()` in `NonBatchedOpsStressTest::TestIterateAgainstExpected()`. The testing key range is locked in `TestIterateAgainstExpected`, so I do not expect this change to provide a thorough stress test of `Iterator::Refresh()`. However, it can still be helpful for catching bugs like https://github.com/facebook/rocksdb/issues/10739. Will add calls to refresh in `TestIterate` once we support iterator refresh with snapshots.
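      For reference, a minimal sketch of the `Iterator::Refresh()` API being exercised (the keys and setup are illustrative):
      ```
      #include <rocksdb/db.h>
      #include <cassert>
      #include <memory>

      void ScanThenRefresh(rocksdb::DB* db) {
        std::unique_ptr<rocksdb::Iterator> it(
            db->NewIterator(rocksdb::ReadOptions()));
        for (it->SeekToFirst(); it->Valid(); it->Next()) {
          // first pass over the current view
        }

        db->Put(rocksdb::WriteOptions(), "new_key", "new_value");

        // Bring the existing iterator up to date instead of creating a new one.
        assert(it->Refresh().ok());
        for (it->SeekToFirst(); it->Valid(); it->Next()) {
          // second pass; now sees "new_key"
        }
      }
      ```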
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10766
      
      Test Plan: `python3 tools/db_crashtest.py whitebox --simple --verify_iterator_with_expected_state_one_in=2`
      
      Reviewed By: ajkr
      
      Differential Revision: D40008320
      
      Pulled By: ajkr
      
      fbshipit-source-id: cec93b07f915ef6476d41c1fee9b23c115188085
    • Add new property in IOOptions to skip recursing through directories and list only files during GetChildren. (#10668) · ae0f9c33
      akankshamahajan committed
      
      Summary:
      Add new property "do_not_recurse" in  IOOptions for underlying file system to skip iteration of directories during DB::Open if there are no sub directories and list only files.
      By default this property is set to false. This property is set true currently in the code where RocksDB is sure only files are needed during DB::Open.
      
      Provided support in PosixFileSystem to use "do_not_recurse".
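      A sketch of how a `FileSystem` caller might use the new flag (the flag name is from this PR; the wrapper function is illustrative):
      ```
      #include <rocksdb/file_system.h>
      #include <string>
      #include <vector>

      rocksdb::IOStatus ListFilesOnly(rocksdb::FileSystem* fs,
                                      const std::string& dir,
                                      std::vector<std::string>* files) {
        rocksdb::IOOptions io_opts;
        // Hint that no subdirectories are expected, so the file system may
        // skip recursing and return plain files only.
        io_opts.do_not_recurse = true;
        return fs->GetChildren(dir, io_opts, files, /*dbg=*/nullptr);
      }
      ```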
      
      Test Plan:
      - Existing tests
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10668
      
      Reviewed By: anand1976
      
      Differential Revision: D39471683
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 90e32f0b86d5346d53bc2714d3a0e7002590527f
  7. 01 Oct 2022, 5 commits
    • User-defined timestamp support for `DeleteRange()` (#10661) · 9f2363f4
      Changyu Bi committed
      Summary:
      Add user-defined timestamp support for range deletion. The new API is `DeleteRange(opt, cf, begin_key, end_key, ts)`; a usage sketch follows the list below. Most of the change is updating the comparator to compare without the timestamp. Other than that, the major changes are
      - internal range tombstone data structures (`FragmentedRangeTombstoneList`, `RangeTombstone`, etc.) to store timestamps.
      - Garbage collection of range tombstones and range tombstone covered keys during compaction.
      - Get()/MultiGet() to return the timestamp of a range tombstone when needed.
      - Get/Iterator with range tombstones bounded by readoptions.timestamp.
      - timestamp crash test now issues DeleteRange by default.
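      A usage sketch of the new API (signature as stated above; the timestamp encoding shown is illustrative and must match the column family's configured comparator):
      ```
      #include <rocksdb/db.h>
      #include <string>

      rocksdb::Status DeleteRangeAtTs(rocksdb::DB* db,
                                      rocksdb::ColumnFamilyHandle* cf,
                                      const std::string& begin_key,
                                      const std::string& end_key,
                                      const std::string& ts) {
        // Writes a range tombstone covering [begin_key, end_key) at timestamp
        // ts; reads at or above ts see the deletion, older reads do not.
        return db->DeleteRange(rocksdb::WriteOptions(), cf, begin_key, end_key,
                               ts);
      }
      ```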
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10661
      
      Test Plan:
      - Added unit test: `make check`
      - Stress test: `python3 tools/db_crashtest.py --enable_ts whitebox --readpercent=57 --prefixpercent=4 --writepercent=25 -delpercent=5 --iterpercent=5 --delrangepercent=4`
      - Ran `db_bench` to measure regression when timestamp is not enabled. The tests are for write (with some range deletion) and iterate with the DB fitting in memory: `./db_bench --benchmarks=fillrandom,seekrandom --writes_per_range_tombstone=200 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=500000 --reads=500000 --seek_nexts=10 --disable_auto_compactions -disable_wal=true --max_num_range_tombstones=1000`. Did not see consistent regression in the no-timestamp case.
      
      | micros/op | fillrandom | seekrandom |
      | --- | --- | --- |
      |main| 2.58 |10.96|
      |PR 10661| 2.68 |10.63|
      
      Reviewed By: riversand963
      
      Differential Revision: D39441192
      
      Pulled By: cbi42
      
      fbshipit-source-id: f05aca3c41605caf110daf0ff405919f300ddec2
    • Add manual_wal_flush, FlushWAL() to stress/crash test (#10698) · 3b816491
      Hui Xiao committed
      Summary:
      **Context/Summary:**
      Introduce `manual_wal_flush_one_in` as titled.
      - When `manual_wal_flush_one_in > 0`, we also need tracing to correctly verify recovery, because WAL data can be lost in this case when `FlushWAL()` is not explicitly called by users of RocksDB (in our case, db_stress), and recovery from such potential WAL data loss is a prefix recovery that requires tracing to verify. As another consequence, we need to disable features that can't run under unsynced data loss together with `manual_wal_flush_one_in`. (A minimal sketch of the manual-flush behavior follows.)
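      A minimal sketch of the production-side behavior being stressed, using the existing `DBOptions::manual_wal_flush` and `DB::FlushWAL()` APIs (path and keys illustrative):
      ```
      #include <rocksdb/db.h>
      #include <cassert>
      #include <string>

      void WriteWithManualWalFlush(const std::string& path) {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.manual_wal_flush = true;  // WAL writes stay in an in-memory buffer

        rocksdb::DB* db = nullptr;
        assert(rocksdb::DB::Open(options, path, &db).ok());

        db->Put(rocksdb::WriteOptions(), "k", "v");  // buffered, not yet on disk
        // Without an explicit FlushWAL(), a crash here can lose "k" - exactly
        // the unsynced-data-loss scenario the stress test must trace and verify.
        assert(db->FlushWAL(/*sync=*/true).ok());

        delete db;
      }
      ```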
      
      Incompatibilities fixed along the way:
      ```
      db_stress: db/db_impl/db_impl_open.cc:2063: static rocksdb::Status rocksdb::DBImpl::Open(const rocksdb::DBOptions&, const string&, const std::vector<rocksdb::ColumnFamilyDescriptor>&, std::vector<rocksdb::ColumnFamilyHandle*>*, rocksdb::DB**, bool, bool): Assertion `impl->TEST_WALBufferIsEmpty()' failed.
      ```
       - It turns out that `Writer::AddCompressionTypeRecord` calls `EmitPhysicalRecord(kSetCompressionType, encode.data(), encode.size());` before this assertion but does not trigger a flush if `manual_wal_flush` is set. This leads to `impl->TEST_WALBufferIsEmpty()` being false.
          - As suggested, the assertion is removed and the violation case is handled by `FlushWAL(sync=true)`, along with refactoring `TEST_WALBufferIsEmpty()` to `WALBufferIsEmpty()` since it is now used in production code.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10698
      
      Test Plan:
      - Locally running `python3 tools/db_crashtest.py blackbox --manual_wal_flush_one_in=1 --manual_wal_flush=1 --sync_wal_one_in=100 --atomic_flush=1 --flush_one_in=100 --column_families=3`
      - Joined https://github.com/facebook/rocksdb/pull/10624 in auto CI testings with all RocksDB stress/crash test jobs
      
      Reviewed By: ajkr
      
      Differential Revision: D39593752
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3a2135bb792c52d2ffa60257d4fbc557fb04d2ce
    • Track expected state only if expected values dir is non-empty (#10764) · 793fd097
      anand76 committed
      Summary:
      If the `-expected_values_dir` argument to db_stress is empty, then verification against expected state is effectively disabled. But `RunStressTest` still calls `TrackExpectedState`, which returns `NotSupported`, causing the crash test to fail with a false alarm. Fix it by only calling `TrackExpectedState` if necessary.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10764
      
      Reviewed By: ajkr
      
      Differential Revision: D39980129
      
      Pulled By: anand1976
      
      fbshipit-source-id: d02651746fe3a297877a4b2b2fbcb7274860f49c
    • Add the PutEntity API to the stress/crash tests (#10760) · 9078fccc
      Levi Tamasi committed
      Summary:
      The patch adds the `PutEntity` API to the non-batched, batched, and
      CF consistency stress tests. Namely, when the new `db_stress` command
      line parameter `use_put_entity_one_in` is greater than zero, one in
      N writes on average is performed using `PutEntity` rather than `Put`.
      The wide-column entity written has the generated value in its default
      column; in addition, it contains up to three additional columns where
      the original generated value is divided up between the column name and the
      column value (with the column name containing the first k characters of
      the generated value, and the column value containing the rest). Whether
      `PutEntity` is used (and if so, how many columns the entity has) is completely
      determined by the "value base" used to generate the value (that is, there is
      no randomness involved). Assuming the same `use_put_entity_one_in` setting
      is used across `db_stress` invocations, this enables us to reconstruct and
      validate the entity during subsequent `db_stress` runs.
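      For reference, a minimal sketch of the `PutEntity` API the stress tests now exercise (column names and values are illustrative):
      ```
      #include <rocksdb/db.h>
      #include <rocksdb/wide_columns.h>
      #include <cassert>

      void WriteWideColumnEntity(rocksdb::DB* db) {
        // A wide-column entity: a default column plus named attribute columns.
        rocksdb::WideColumns columns{
            {rocksdb::kDefaultWideColumnName, "default_value"},
            {"attr1", "value1"},
            {"attr2", "value2"}};
        rocksdb::Status s = db->PutEntity(rocksdb::WriteOptions(),
                                          db->DefaultColumnFamily(), "key1",
                                          columns);
        assert(s.ok());
      }
      ```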
      
      Note that `PutEntity` is currently incompatible with `Merge`, transactions, and
      user-defined timestamps; these combinations are currently disabled/disallowed.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10760
      
      Test Plan: Ran some batched, non-batched, and CF consistency stress tests using the script.
      
      Reviewed By: riversand963
      
      Differential Revision: D39939032
      
      Pulled By: ltamasi
      
      fbshipit-source-id: eafdf124e95993fb7d73158e3b006d11819f7fa9
    • Use actual file size when checking max_compaction_size (#10728) · fd71a82f
      Changyu Bi committed
      Summary:
      Currently, there are places in compaction_picker where we add up the `compensated_file_size` of the files being compacted and limit the sum to be under `max_compaction_bytes`. `compensated_file_size` contains a boost for point tombstones and should be used only for determining a file's compaction priority. This PR replaces `compensated_file_size` with the actual file size in such places.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10728
      
      Test Plan: CI
      
      Reviewed By: ajkr
      
      Differential Revision: D39789427
      
      Pulled By: cbi42
      
      fbshipit-source-id: 1f89fb6c0159c53bf01d8dc783f465959f442c81
  8. 30 Sep 2022, 4 commits
    • Align compaction output file boundaries to the next level ones (#10655) · f3cc6663
      Jay Zhuang committed
      Summary:
      Try to align the compaction output file boundaries to the next level ones
      (grandparent level), to reduce the level compaction write-amplification.
      
      In level compaction, there is "wasted" data at the beginning and end of the
      output-level files. Aligning the file boundaries can avoid such "wasted" compaction.
      With this PR, it tries to align the non-bottommost-level file boundaries to those
      of the next level. It may cut a file when the file size is large enough (at least
      50% of target_file_size) and not too large (at most 2x target_file_size).
      
      db_bench shows about 12.56% compaction reduction:
      ```
      TEST_TMPDIR=/data/dbbench2 ./db_bench --benchmarks=fillrandom,readrandom -max_background_jobs=12 -num=400000000 -target_file_size_base=33554432
      
      # baseline:
      Flush(GB): cumulative 25.882, interval 7.216
      Cumulative compaction: 285.90 GB write, 162.36 MB/s write, 269.68 GB read, 153.15 MB/s read, 2926.7 seconds
      
      # with this change:
      Flush(GB): cumulative 25.882, interval 7.753
      Cumulative compaction: 249.97 GB write, 141.96 MB/s write, 233.74 GB read, 132.74 MB/s read, 2534.9 seconds
      ```
      
      The compaction simulator shows a similar result (14% with 100G random data).
      As a side effect of this PR, the SST file size can exceed
      target_file_size, but is capped at 2x target_file_size, and there will also be
      some smaller files. Here are file size statistics when loading 100GB with a
      target file size of 32MB:
      ```
                baseline      this_PR
      count  1.656000e+03  1.705000e+03
      mean   3.116062e+07  3.028076e+07
      std    7.145242e+06  8.046139e+06
      ```
      
      The feature is enabled by default; to revert to the old behavior, disable it
      with `AdvancedColumnFamilyOptions.level_compaction_dynamic_file_size = false`
      (see the sketch below).
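      A sketch of opting out, per the option named above (reaching it through `Options`, which inherits from `AdvancedColumnFamilyOptions`; the target file size is illustrative):
      ```
      #include <rocksdb/options.h>

      rocksdb::Options MakeFixedFileSizeOptions() {
        rocksdb::Options options;
        options.target_file_size_base = 32 << 20;  // 32MB, as in the stats above
        // Revert to the old behavior: do not cut output files early or late to
        // align with next-level (grandparent) file boundaries.
        options.level_compaction_dynamic_file_size = false;
        return options;
      }
      ```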
      
      Also includes https://github.com/facebook/rocksdb/issues/1963 to cut a file
      before a skippable grandparent file. This is for use cases like a user adding
      two or more non-overlapping data ranges at the same time; it can reduce the
      overlap between the two datasets in the lower levels.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10655
      
      Reviewed By: cbi42
      
      Differential Revision: D39552321
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 640d15f159ab0cd973f2426cfc3af266fc8bdde2
    • add SetCapacity and GetCapacity for secondary cache (#10712) · 47b57a37
      gitbw95 committed
      Summary:
      To support tuning secondary cache dynamically, add `SetCapacity()` and `GetCapacity()` for CompressedSecondaryCache.
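      A sketch of the intended dynamic tuning; `NewCompressedSecondaryCache` is the existing factory, while the `SetCapacity`/`GetCapacity` signatures shown are assumptions based on the commit title:
      ```
      #include <rocksdb/cache.h>
      #include <cassert>
      #include <memory>

      void TuneCompressedSecondaryCache() {
        rocksdb::CompressedSecondaryCacheOptions opts;
        opts.capacity = 256 << 20;  // start with 256MB
        std::shared_ptr<rocksdb::SecondaryCache> sec_cache =
            rocksdb::NewCompressedSecondaryCache(opts);

        // Grow the cache at runtime, then read the capacity back (assumed
        // signatures: Status SetCapacity(size_t), Status GetCapacity(size_t&)).
        assert(sec_cache->SetCapacity(512 << 20).ok());
        size_t capacity = 0;
        assert(sec_cache->GetCapacity(capacity).ok());
        assert(capacity == 512 << 20);
      }
      ```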
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10712
      
      Test Plan: Unit Tests
      
      Reviewed By: anand1976
      
      Differential Revision: D39685212
      
      Pulled By: gitbw95
      
      fbshipit-source-id: 19573c67237011927320207732b5de083cb87240
    • Remove and recreate expected values dir in white-box testing 2nd half (#10743) · aa714644
      Hui Xiao committed
      Summary:
      **Context:**
      https://github.com/facebook/rocksdb/pull/10732#pullrequestreview-1121076205
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10743
      
      Test Plan:
      - Locally run `python3 ./tools/db_crashtest.py whitebox --simple -max_key=1000000 -value_size_mult=33 -write_buffer_size=524288 -target_file_size_base=524288 -max_bytes_for_level_base=2097152 --duration=120 --interval=10 --ops_per_thread=1000 --random_kill_odd=887`
      - CI jobs testing
      
      Reviewed By: ajkr
      
      Differential Revision: D39838733
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9e819b66b0293dfc7a31a908a9d42c6baca4aeaa
    • cmake : Add ALL plugin LIBS to THIRD_PARTYLIBS (#10727) · 5f4b7364
      Joel Andres Granados committed
      Summary:
      Bringing in multiple libraries failed because they were not treated as separate arguments. In this commit we make sure to add *all* the libraries to THIRD_PARTYLIBS. Additionally, we add more informative status messages for when the plugins get added.
      Signed-off-by: Joel Granados <joel.granados@gmail.com>
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10727
      
      Reviewed By: riversand963
      
      Differential Revision: D39778566
      
      Pulled By: ajkr
      
      fbshipit-source-id: 34306b26ab4c726d17353ddd765f368967a1b59f
  9. 29 Sep 2022, 2 commits
    • db_stress TestIngestExternalFile avoid empty files (#10754) · dc9f4996
      Andrew Kryczka committed
      Summary:
      If all the keys in range [key_base, shared->GetMaxKey()) are non-overwritable `TestIngestExternalFile()` would attempt to ingest a file with zero keys, leading to the following error: "Cannot create sst file with no entries". This PR changes `TestIngestExternalFile()` to return early in that case instead of going through with the ingestion attempt.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10754
      
      Reviewed By: hx235
      
      Differential Revision: D39909195
      
      Pulled By: ajkr
      
      fbshipit-source-id: e06e6b9cc24826fbd450e5130885e6f07164badd
    • db_stress print TestMultiGet error value in hex (#10753) · b0d8ccbb
      Andrew Kryczka committed
      Summary:
      Without this fix, db_crashtest.py could fail with useless output such as: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 267: invalid start byte`
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10753
      
      Reviewed By: hx235
      
      Differential Revision: D39905809
      
      Pulled By: ajkr
      
      fbshipit-source-id: 50ba2cf20d206eeb168309cec137e827a34c8f0b
  10. 28 Sep 2022, 3 commits
  11. 27 Sep 2022, 3 commits
    • Fix segfault in Iterator::Refresh() (#10739) · df492791
      Changyu Bi committed
      Summary:
      When a new internal iterator is constructed during iterator refresh, the pointer to the previous memtable range tombstone iterator was not cleared. This could cause a segfault for future `Refresh()` calls when they try to free the memtable range tombstones. This PR fixes the issue.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10739
      
      Test Plan: added a unit test in db_range_del_test.cc to reproduce this issue.
      
      Reviewed By: ajkr, riversand963
      
      Differential Revision: D39825283
      
      Pulled By: cbi42
      
      fbshipit-source-id: 3b59a2b73865aed39e28cdd5c1b57eed7991b94c
    • Support WriteCommit policy with sync_fault_injection=1 (#10624) · aed30ddf
      Hui Xiao committed
      Summary:
      **Context:**
      Prior to this PR, correctness testing with unsynced data loss [disabled](https://github.com/facebook/rocksdb/pull/10605) transactions (`use_txn=1`) and thus all of the `txn_write_policy` options. This PR improves that by adding support for one policy - WriteCommit (`txn_write_policy=0`); a minimal sketch of that flow follows.
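      For context, a minimal sketch of the WriteCommit two-phase flow now covered, using standard pessimistic-transaction APIs (`txn_write_policy=0` corresponds to `WRITE_COMMITTED`; path and keys are illustrative):
      ```
      #include <rocksdb/utilities/transaction_db.h>
      #include <cassert>
      #include <string>

      void TwoPhaseWriteCommit(const std::string& path) {
        rocksdb::Options options;
        options.create_if_missing = true;
        rocksdb::TransactionDBOptions txn_db_options;
        txn_db_options.write_policy = rocksdb::TxnDBWritePolicy::WRITE_COMMITTED;

        rocksdb::TransactionDB* db = nullptr;
        assert(
            rocksdb::TransactionDB::Open(options, txn_db_options, path, &db).ok());

        rocksdb::Transaction* txn = db->BeginTransaction(rocksdb::WriteOptions());
        assert(txn->SetName("xid1").ok());  // a name is required for Prepare()
        assert(txn->Put("key", "value").ok());
        assert(txn->Prepare().ok());  // the trace sees MarkBeginPrepare/MarkEndPrepare
        assert(txn->Commit().ok());   // the trace sees MarkCommit("xid1")
        delete txn;
        delete db;
      }
      ```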
      
      **Summary:**
      The key to this support is (a) handling Mark{Begin, End}Prepare/MarkCommit/MarkRollback correctly when constructing ExpectedState under the WriteCommit policy and (b) monitoring CI jobs and solving any test incompatibility issues until the jobs are stable. (b) will be part of the test plan.
      
      For (a)
      - During prepare (i.e, between `MarkBeginPrepare()` and `MarkEndPrepare(xid)`), `ExpectedStateTraceRecordHandler` will buffer all writes by adding all writes to an internal `WriteBatch`.
      - On `MarkEndPrepare()`, that `WriteBatch` will be associated with the transaction's `xid`.
      - During the commit (i.e, on `MarkCommit(xid)`), `ExpectedStateTraceRecordHandler` will retrieve and iterate the internal `WriteBatch` and finally apply those writes to `ExpectedState`
      - During the rollback (i.e, on `MarkRollback(xid)`), `ExpectedStateTraceRecordHandler` will erase the internal `WriteBatch` from the map.
      
      For (b) - one major issue described below:
      - TransactionDB in db_stress recovers prepared-but-not-committed txns from the previous crashed run by randomly committing or rolling them back at the start of the current run; see a historical [PR](https://github.com/facebook/rocksdb/commit/6d06be22c083ccf185fd38dba49fde73b644b4c1) that predates correctness testing.
      - And we verify those processed keys in a recovered db against their expected state.
      - However, since we now turn on `sync_fault_injection=1`, the expected state is constructed from the trace instead of using the LATEST.state from the previous run. The expected state used to verify those processed keys therefore won't contain UNKNOWN_SENTINEL as it should - see test 1 for a failed case.
      - Therefore, we decided to manually update its expected state to be UNKNOWN_SENTINEL as part of the processing.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10624
      
      Test Plan:
      1. A test that exposed the major issue described above. It fails without setting UNKNOWN_SENTINEL in the expected state during processing and passes with it:
      ```
      db=/dev/shm/rocksdb_crashtest_blackbox
      exp=/dev/shm/rocksdb_crashtest_expected
      dbt=$db.tmp
      expt=$exp.tmp
      
      rm -rf $db $exp
      mkdir -p $exp
      
      echo "RUN 1"
      ./db_stress \
      --clear_column_family_one_in=0 --column_families=1 --db=$db --delpercent=10 --delrangepercent=0 --destroy_db_initially=0 --expected_values_dir=$exp --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=1000000 --max_key_len=3 --prefixpercent=0 --readpercent=0 --reopen=0 --ops_per_thread=100000000 --test_batches_snapshots=0 --value_size_mult=32 --writepercent=90 \
      --use_txn=1 --txn_write_policy=0 --sync_fault_injection=1 &
      pid=$!
      sleep 0.2
      sleep 20
      kill $pid
      sleep 0.2
      
      echo "RUN 2"
      ./db_stress \
      --clear_column_family_one_in=0 --column_families=1 --db=$db --delpercent=10 --delrangepercent=0 --destroy_db_initially=0 --expected_values_dir=$exp --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=1000000 --max_key_len=3 --prefixpercent=0 --readpercent=0 --reopen=0 --ops_per_thread=100000000 --test_batches_snapshots=0 --value_size_mult=32 --writepercent=90 \
      --use_txn=1 --txn_write_policy=0 --sync_fault_injection=1 &
      pid=$!
      sleep 0.2
      sleep 20
      kill $pid
      sleep 0.2
      
      echo "RUN 3"
      ./db_stress \
      --clear_column_family_one_in=0 --column_families=1 --db=$db --delpercent=10 --delrangepercent=0 --destroy_db_initially=0 --expected_values_dir=$exp --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=1000000 --max_key_len=3 --prefixpercent=0 --readpercent=0 --reopen=0 --ops_per_thread=100000000 --test_batches_snapshots=0 --value_size_mult=32 --writepercent=90 \
      --use_txn=1 --txn_write_policy=0 --sync_fault_injection=1
      ```
      
      2. Manual testing to ensure ExpectedState is constructed correctly during recovery by verifying it against previously crashed TransactionDB's WAL.
          - Run the following command to crash a TransactionDB with WriteCommit policy, then run `./ldb dump_wal` on its WAL file:
      ```
      db=/dev/shm/rocksdb_crashtest_blackbox
      exp=/dev/shm/rocksdb_crashtest_expected
      rm -rf $db $exp
      mkdir -p $exp
      
      ./db_stress \
      	--clear_column_family_one_in=0 --column_families=1 --db=$db --delpercent=10 --delrangepercent=0 --destroy_db_initially=0 --expected_values_dir=$exp --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=1000000 --max_key_len=3 --prefixpercent=0 --readpercent=0 --reopen=0 --ops_per_thread=100000000 --test_batches_snapshots=0 --value_size_mult=32 --writepercent=90 \
      	--use_txn=1 --txn_write_policy=0 --sync_fault_injection=1 &
      pid=$!
      sleep 30
      kill $pid
      sleep 1
      ```
       - Run the following command to verify recovery of the crashed db under a debugger. Compare the step-wise result with the WAL records (e.g., WriteBatch content, xid, prepare/commit/rollback markers):
      ```
         ./db_stress \
      	--clear_column_family_one_in=0 --column_families=1 --db=$db --delpercent=10 --delrangepercent=0 --destroy_db_initially=0 --expected_values_dir=$exp --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=1000000 --max_key_len=3 --prefixpercent=0 --readpercent=0 --reopen=0 --ops_per_thread=100000000 --test_batches_snapshots=0 --value_size_mult=32 --writepercent=90 \
      	--use_txn=1 --txn_write_policy=0 --sync_fault_injection=1
      ```
      3. Automatic testing by triggering all RocksDB stress/crash test jobs for 3 rounds with no failure.
      
      Reviewed By: ajkr, riversand963
      
      Differential Revision: D39199373
      
      Pulled By: hx235
      
      fbshipit-source-id: 7a1dec0e3e2ee6ea86ddf5dd19ceb5543a3d6f0c
    • Add OpenSSL to docker image (#10741) · 5d7cf311
      anand76 committed
      Summary:
      Update the docker image with OpenSSL, required by the folly build.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10741
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D39831081
      
      Pulled By: anand1976
      
      fbshipit-source-id: 900154f70a456d1b6f9e384b8bdbcc227af4adbc