1. 19 Nov 2021 — 1 commit
    • Account Bloom/Ribbon filter construction memory in global memory limit (#9073) · 74544d58
      Authored by Hui Xiao
      Summary:
      Note: This PR is the 4th part of a bigger PR stack (https://github.com/facebook/rocksdb/pull/9073) and will rebase/merge only after the first three PRs (https://github.com/facebook/rocksdb/pull/9070, https://github.com/facebook/rocksdb/pull/9071, https://github.com/facebook/rocksdb/pull/9130) merge.
      
      **Context:**
      Similar to https://github.com/facebook/rocksdb/pull/8428, this PR tracks memory usage during (new) Bloom Filter (i.e., FastLocalBloom) and Ribbon Filter (i.e., Ribbon128) construction, moving toward the goal of a [single global memory limit using block cache capacity](https://github.com/facebook/rocksdb/wiki/Projects-Being-Developed#improving-memory-efficiency). It also constrains the size of the banding portion of the Ribbon Filter during construction by falling back to the Bloom Filter if that banding is, at some point, larger than the available space in the cache under `LRUCacheOptions::strict_capacity_limit=true`.
      
      The option to turn on this feature is `BlockBasedTableOptions::reserve_table_builder_memory = true`, which is set to `false` by default. We [decided](https://github.com/facebook/rocksdb/pull/9073#discussion_r741548409) not to have a separate option for each memory user in table building; therefore their memory accounting is all bundled under one general option.
      
      **Summary:**
      - Reserved/released cache for creation/destruction of three main memory users with the passed-in `FilterBuildingContext::cache_res_mgr` during filter construction:
         - hash entries (i.e., `hash_entries`.size(); we bucket-charge hash entries during insertion for performance),
         - banding (Ribbon Filter only, `bytes_coeff_rows` + `bytes_result_rows` + `bytes_backtrack`),
         - final filter (i.e., `mutable_buf`'s size).
            - Implementation details: in order to use `CacheReservationManager::CacheReservationHandle` to account for the final filter's memory, we have to store the `CacheReservationManager` object and `CacheReservationHandle` for the final filter in `XXPH3BitsFilterBuilder`, as well as explicitly delete the filter bits builder when done with the final filter in the block-based table.
      - Added an option to run `filter_bench` with this memory reservation feature
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9073
      
      Test Plan:
      - Added new tests in `db_bloom_filter_test` to verify filter construction peak cache reservation under combination of  `BlockBasedTable::Rep::FilterType` (e.g, `kFullFilter`, `kPartitionedFilter`), `BloomFilterPolicy::Mode`(e.g, `kFastLocalBloom`, `kStandard128Ribbon`, `kDeprecatedBlock`) and `BlockBasedTableOptions::reserve_table_builder_memory`
        - To address the concern for slow tests: tests with memory reservation under `kFullFilter` + `kStandard128Ribbon` and `kPartitionedFilter` take around **3000 - 6000 ms** and others take around **1500 - 2000 ms**, in total adding **20000 - 25000 ms** to the test suite run locally
      - Added new test in `bloom_test` to verify Ribbon Filter fallback on large banding in FullFilter
      - Added test in `filter_bench` to verify that this feature does not significantly slow down Bloom/Ribbon Filter construction speed. Local results averaged over **20** runs as below:
         - FastLocalBloom
            - baseline `./filter_bench -impl=2 -quick -runs 20 | grep 'Build avg'`:
               - **Build avg ns/key: 29.56295** (DEBUG_LEVEL=1), **29.98153** (DEBUG_LEVEL=0)
            - new feature (expected to be similar as above)`./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`:
               - **Build avg ns/key: 30.99046** (DEBUG_LEVEL=1), **30.48867** (DEBUG_LEVEL=0)
            - new feature of RibbonFilter with fallback  (expected to be similar as above) `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` :
               - **Build avg ns/key: 31.146975** (DEBUG_LEVEL=1), **30.08165** (DEBUG_LEVEL=0)
      
          - Ribbon128
             - baseline `./filter_bench -impl=3 -quick -runs 20 | grep 'Build avg'`:
                 - **Build avg ns/key: 129.17585** (DEBUG_LEVEL=1), **130.5225** (DEBUG_LEVEL=0)
             - new feature  (expected to be similar as above) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg' `:
                 - **Build avg ns/key: 131.61645** (DEBUG_LEVEL=1), **132.98075** (DEBUG_LEVEL=0)
             - new feature of RibbonFilter with fallback (expected to be a lot faster than above due to fallback) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` :
                - **Build avg ns/key: 52.032965** (DEBUG_LEVEL=1), **52.597825** (DEBUG_LEVEL=0)
                - And the warning message of `"Cache reservation for Ribbon filter banding failed due to cache full"` is indeed logged to console.
      
      Reviewed By: pdillinger
      
      Differential Revision: D31991348
      
      Pulled By: hx235
      
      fbshipit-source-id: 9336b2c60f44d530063da518ceaf56dac5f9df8e
  2. 18 Nov 2021 — 2 commits
    • Don't allow parallel crash_test in Makefile (#9180) · 4f678b52
      Authored by Peter Dillinger
      Summary:
      Using deps for running blackbox and whitebox allows them to be
      parallelized, which doesn't seem to be working well.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9180
      
      Test Plan: make -j24 crash_test
      
      Reviewed By: siying
      
      Differential Revision: D32500851
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 364288c8d023b93e7ca2724ea40edae2f4eb0407
    • Fix integer overflow in TraceOptions (#9157) · c9539ede
      Authored by Davide Angelocola
      Summary:
      Hello from a happy user of rocksdb java :-)
      
      The default constructor of TraceOptions is supposed to initialize the size to 64GB, but the expression contains an integer overflow.
      
      Simple test case with JShell:
      ```
      jshell> 64 * 1024 * 1024 * 1024
      $1 ==> 0
      
      jshell> 64L * 1024 * 1024 * 1024
      $2 ==> 68719476736
      ```
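      The same pitfall exists in C++, where the fix is likewise to force 64-bit arithmetic in the first operand. A minimal sketch (the variable name is illustrative, not the actual TraceOptions field):

      ```cpp
      #include <cassert>
      #include <cstdint>

      int main() {
        // In 32-bit int arithmetic, 64 * 1024 * 1024 * 1024 wraps to 0 in Java
        // (and is undefined behavior in C++). Making the first operand 64-bit
        // forces the whole expression to be evaluated in 64-bit arithmetic:
        uint64_t max_trace_file_size = 64ULL * 1024 * 1024 * 1024;
        assert(max_trace_file_size == 68719476736ULL);  // 64GB, matching $2 above
        return 0;
      }
      ```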
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9157
      
      Reviewed By: pdillinger, zhichao-cao
      
      Differential Revision: D32369273
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 6a0c95fff7a91f27ff15d65b662c6b101756b450
  3. 17 Nov 2021 — 8 commits
  4. 16 Nov 2021 — 1 commit
    • Update TransactionUtil::CheckKeyForConflict to also use timestamps (#9162) · 20357988
      Authored by Yanqin Jin
      Summary:
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9162
      
      Existing TransactionUtil::CheckKeyForConflict() performs only seq-based
      conflict checking. If user-defined timestamp is enabled, it should perform
      conflict checking based on timestamps too.
      
      Update the TransactionUtil::CheckKey-related methods to verify that the timestamp of the
      latest version of a key is smaller than the read timestamp. Note that
      CheckKeysForConflict() is not updated since it's used only by optimistic
      transactions, and we do not plan to update it in this upcoming batch of diffs.
      
      Existing GetLatestSequenceForKey() returns the sequence of the latest
      version of a specific user key. Since we support user-defined timestamp, we
      need to update this method to also return the timestamp (if enabled) of the
      latest version of the key. This will be needed for snapshot validation.
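      The timestamp side of the check can be sketched as follows (an illustrative toy model, not TransactionUtil's actual signature):

      ```cpp
      #include <cassert>
      #include <cstdint>

      // A write validated against read timestamp `read_ts` passes only if the
      // latest committed version of the key carries a timestamp smaller than
      // `read_ts`; otherwise a conflicting write happened at or after our read
      // timestamp and validation must fail.
      bool PassesTimestampCheck(uint64_t latest_version_ts, uint64_t read_ts) {
        return latest_version_ts < read_ts;
      }

      int main() {
        assert(PassesTimestampCheck(5, 10));    // latest version predates read ts
        assert(!PassesTimestampCheck(10, 10));  // write at the read ts: conflict
        return 0;
      }
      ```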
      
      Reviewed By: ltamasi
      
      Differential Revision: D31567960
      
      fbshipit-source-id: 2e4a14aed267435a9aa91bc632d2411c01946d44
  5. 13 Nov 2021 — 2 commits
    • Use system-wide thread ID in info log lines (#9164) · 9bb13c56
      Authored by Andrew Kryczka
      Summary:
      This makes it easier to debug with tools like `ps`. The change only
      applies to builds with glibc 2.30+ and _GNU_SOURCE extensions enabled.
      We could adopt it in more cases by using the syscall but this is enough
      for our build.
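      As a sketch of what "system-wide thread ID" means here (Linux-only; this uses the raw syscall fallback mentioned above rather than the gettid() wrapper glibc 2.30 added):

      ```cpp
      #include <cassert>
      #include <unistd.h>       // getpid(), syscall()
      #include <sys/syscall.h>  // SYS_gettid

      int main() {
        // The kernel (system-wide) thread ID is what `ps -L` shows, unlike
        // pthread_self(), whose value is opaque to external tools.
        pid_t tid = static_cast<pid_t>(syscall(SYS_gettid));
        assert(tid > 0);
        // In a single-threaded process, the main thread's ID equals the PID.
        assert(tid == getpid());
        return 0;
      }
      ```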
      
      Replaces https://github.com/facebook/rocksdb/issues/2973.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9164
      
      Test Plan:
      - ran some benchmarks and correlated logged thread IDs with those shown by `ps -L`.
      - verified no noticeable regression in throughput for log heavy (more than 700k log lines and over 5k / second) scenario.
      
      Benchmark command:
      
      ```
      $ TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom -compression_type=none -max_bytes_for_level_multiplier=2 -write_buffer_size=262144 -num_levels=7 -max_bytes_for_level_base=2097152 -target_file_size_base=524288 -level_compaction_dynamic_level_bytes=true -max_background_jobs=12 -num=20000000
      ```
      
      Results before: 15.9MB/s, 15.8MB/s, 16.0MB/s
      Results after: 16.3MB/s, 16.3MB/s, 15.8MB/s
      
      - Rely on CI to test the fallback behavior
      
      Reviewed By: riversand963
      
      Differential Revision: D32399660
      
      Pulled By: ajkr
      
      fbshipit-source-id: c24d44fdf7782faa616ef0a0964eaca3539d9c24
    • Clarify max_write_buffer_size_to_maintain (#9154) · 3295e9f6
      Authored by Andrew Kryczka
      Summary:
      I was unable to figure out the behavior by reading the old doc so attempted to
      write it differently.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9154
      
      Reviewed By: riversand963
      
      Differential Revision: D32338843
      
      Pulled By: ajkr
      
      fbshipit-source-id: e1e67720cd92572b195583e5ea2c592180d4fefd
  6. 12 Nov 2021 — 1 commit
  7. 11 Nov 2021 — 3 commits
    • Reuse internal auto readahead_size at each level (except L0) for iterations (#9056) · 17ce1ca4
      Authored by Akanksha Mahajan
      Summary:
      RocksDB does auto-readahead for iterators upon noticing more than two sequential reads for a table file if the user doesn't provide a readahead_size. The readahead starts at 8KB and doubles on every additional read, up to max_auto_readahead_size. However, at each level, if the iterator moves over to the next file, readahead_size starts again from 8KB.
      
      This PR introduces a new ReadOption "adaptive_readahead" which, when set to true, will maintain readahead_size at each level. So when the iterator moves from one file to another, the new file's readahead_size will continue from the previous file's readahead_size instead of starting from scratch. However, if reads are not sequential, it will fall back to 8KB (default) with no prefetching for that block.
      
      1. If a block is found in the cache but was eligible for prefetching (the block wasn't in RocksDB's prefetch buffer), readahead_size will decrease by 8KB.
      2. It maintains readahead_size for levels L1 - Ln.
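      The policy above can be modeled with a toy sketch (names are illustrative, not RocksDB's; `kMax` stands in for max_auto_readahead_size):

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>

      // Readahead starts at 8KB and doubles per sequential read up to a max;
      // non-sequential reads fall back to the initial size with no prefetch.
      // With adaptive_readahead=true the size is carried over when the
      // iterator moves to the next file instead of being reset to 8KB.
      struct AdaptiveReadahead {
        static constexpr std::size_t kInit = 8 * 1024;
        static constexpr std::size_t kMax = 256 * 1024;
        std::size_t size = kInit;

        void OnSequentialRead() { size = std::min(size * 2, kMax); }
        void OnNonSequentialRead() { size = kInit; }  // fall back, no prefetch
      };

      int main() {
        AdaptiveReadahead ra;
        ra.OnSequentialRead();
        ra.OnSequentialRead();
        assert(ra.size == 32 * 1024);  // 8KB doubled twice
        ra.OnNonSequentialRead();      // reads stopped being sequential
        assert(ra.size == AdaptiveReadahead::kInit);
        return 0;
      }
      ```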
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9056
      
      Test Plan:
      Added new unit tests
      Ran db_bench for "readseq, seekrandom, seekrandomwhilewriting, readrandom" with --adaptive_readahead=true, and there was no regression when the new feature is enabled.
      
      Reviewed By: anand1976
      
      Differential Revision: D31773640
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 7332d16258b846ae5cea773009195a5af58f8f98
    • Fix typo in env_win.h (#9138) · afcd3253
      Authored by Ikko Ashimine
      Summary:
      overide -> override
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9138
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D32245235
      
      Pulled By: mrambacher
      
      fbshipit-source-id: bed62b843925bed806c06ca3485d33bb45a56dc7
    • Track per-SST user-defined timestamp information in MANIFEST (#9092) · 937fbcbd
      Authored by slk
      Summary:
      Track per-SST user-defined timestamp information in MANIFEST https://github.com/facebook/rocksdb/issues/8957
      
      RocksDB supports the user-defined timestamp feature: an application can specify a timestamp
      when writing each key-value pair. When data is flushed from memory to disk files (SST files), the file
      creation activity is committed to the MANIFEST. This commit tracks timestamp info in the
      MANIFEST for each file. The changes involved are as follows:
      1) Track the max/min timestamp in FileMetaData, and fix the involved code.
      2) Add NewFileCustomTag::kMinTimestamp and NewFileCustomTag::kMaxTimestamp in
          NewFileCustomTag (in the kNewFile4 part), and support the involved code such as
          VersionEdit Encode and Decode etc.
      3) Add unit test code for VersionEdit EncodeDecodeNewFile4, and fix the involved test code.
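      The min/max tracking in item 1 can be sketched as follows (a simplified stand-in, not RocksDB's actual FileMetaData): maintain a running range while keys are added to one SST, then persist it via the kNewFile4 custom tags in item 2.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>

      // Running min/max of user-defined timestamps seen in one SST file.
      struct FileTimestampRange {
        uint64_t min_ts = UINT64_MAX;
        uint64_t max_ts = 0;
        void Add(uint64_t ts) {
          min_ts = std::min(min_ts, ts);
          max_ts = std::max(max_ts, ts);
        }
      };

      int main() {
        FileTimestampRange range;
        for (uint64_t ts : {42u, 7u, 100u}) range.Add(ts);
        assert(range.min_ts == 7 && range.max_ts == 100);
        return 0;
      }
      ```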
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9092
      
      Reviewed By: ajkr, akankshamahajan15
      
      Differential Revision: D32252323
      
      Pulled By: riversand963
      
      fbshipit-source-id: d2642898d6e3ad1fef0eb866b98045408bd4e162
  8. 10 Nov 2021 — 5 commits
    • Remove invalid RocksJava native entry (#9147) · 1a8eec46
      Authored by Adam Retter
      Summary:
      It seems that an incorrect native source file entry was introduced in https://github.com/facebook/rocksdb/pull/8999. For some reason it appears that CI was not run against that PR, and so the problem was not detected.
      
      This PR fixes the problem by removing the invalid entry, allowing RocksJava to build correctly again.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9147
      
      Reviewed By: pdillinger
      
      Differential Revision: D32300976
      
      fbshipit-source-id: dbd763b806bacf0fc08f4deaf07c63d0a266c4cf
    • Fix an error on GCC 4.8.5 where -Werror=unused-parameter fails (#9144) · fc93553e
      Authored by Adam Retter
      Summary:
      Before this fix compilation with GCC 4.8.5 20150623 (Red Hat 4.8.5-36) would fail with the following error:
      ```
      CC       jls/db/db_impl/db_impl.o
      In file included from ./env/file_system_tracer.h:8:0,
                       from ./file/random_access_file_reader.h:15,
                       from ./file/file_prefetch_buffer.h:15,
                       from ./table/format.h:13,
                       from ./table/internal_iterator.h:14,
                       from ./db/pinned_iterators_manager.h:12,
                       from ./db/range_tombstone_fragmenter.h:15,
                       from ./db/memtable.h:22,
                       from ./db/memtable_list.h:16,
                       from ./db/column_family.h:17,
                       from ./db/db_impl/db_impl.h:22,
                       from db/db_impl/db_impl.cc:9:
      ./include/rocksdb/file_system.h:108:8: error: unused parameter 'opts'
      [-Werror=unused-parameter]
       struct FileOptions : EnvOptions {
              ^
      db/db_impl/db_impl.cc: In member function 'virtual rocksdb::Status
      rocksdb::DBImpl::SetDBOptions(const
      std::unordered_map<std::basic_string<char>, std::basic_string<char>
      >&)':
      db/db_impl/db_impl.cc:1230:36: note: synthesized method
      'rocksdb::FileOptions& rocksdb::FileOptions::operator=(const
      rocksdb::FileOptions&)' first required here
             file_options_for_compaction_ = FileOptions(new_db_options);
                                          ^
        CC       jls/db/db_impl/db_impl_compaction_flush.o
      cc1plus: all warnings being treated as errors
      make[1]: *** [jls/db/db_impl/db_impl.o] Error 1
      make[1]: *** Waiting for unfinished jobs....
      make[1]: Leaving directory `/rocksdb-local-build'
      make: *** [rocksdbjavastatic] Error 2
      Makefile:2222: recipe for target 'rocksdbjavastaticdockerarm64v8' failed
      make: *** [rocksdbjavastaticdockerarm64v8] Error 2
      ```
      
      This was detected on both ppc64le and arm64v8, however it does not seem to appear in the same GCC 4.8 version we use for x64 in CircleCI - https://app.circleci.com/pipelines/github/facebook/rocksdb/9691/workflows/c2a94367-14f3-4039-be95-325c34643d41/jobs/227906
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9144
      
      Reviewed By: riversand963
      
      Differential Revision: D32290770
      
      fbshipit-source-id: c90a54ba2a618e1ff3660fff3f3368ab36c3c527
    • Fix a bug in timestamp-related GC (#9116) · a113cecf
      Authored by Yanqin Jin
      Summary:
      For multiple versions (ts + seq) of the same user key, if they cross the boundary of `full_history_ts_low_`,
      we should retain the version that is visible to the `full_history_ts_low_`. Namely, we keep the internal key
      with the largest timestamp smaller than `full_history_ts_low`.
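      The retention rule can be sketched with a toy model (illustrative, not RocksDB code): given the versions of one user key ordered by descending timestamp, the first version whose timestamp is smaller than full_history_ts_low must be retained, since it is the version visible to readers at that boundary.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // Returns the index of the version that must be retained, or -1 if
      // every version is at or above the boundary (none qualifies).
      int IndexOfVersionToRetain(const std::vector<uint64_t>& ts_desc,
                                 uint64_t full_history_ts_low) {
        for (std::size_t i = 0; i < ts_desc.size(); ++i) {
          if (ts_desc[i] < full_history_ts_low) return static_cast<int>(i);
        }
        return -1;
      }

      int main() {
        // Versions with timestamps 50, 30, 10 and boundary 35: ts=30 is the
        // largest timestamp smaller than the boundary, so index 1 is retained.
        assert(IndexOfVersionToRetain({50, 30, 10}, 35) == 1);
        assert(IndexOfVersionToRetain({50, 40}, 35) == -1);
        return 0;
      }
      ```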
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9116
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D32261514
      
      Pulled By: riversand963
      
      fbshipit-source-id: e10f47c254c04c05261440051e4f50cb7d95474e
    • RAII support for per cache reservation through handle (#9130) · 2fbe32b0
      Authored by Hui Xiao
      Summary:
      Note: This PR is the 3rd PR of a bigger PR stack (https://github.com/facebook/rocksdb/issues/9073) and depends on the second PR (https://github.com/facebook/rocksdb/pull/9071). **See changes from this PR only https://github.com/facebook/rocksdb/pull/9130/commits/00447324d082136b0e777d3ab6a3df3a8452c633**
      
      Context:
      pdillinger brought up a good [point](https://github.com/facebook/rocksdb/pull/9073#discussion_r741478309) about lacking RAII support for per cache reservation in `CacheReservationManager`  when reviewing https://github.com/facebook/rocksdb/pull/9073.
      
      To summarize the discussion, the current API `CacheReservationManager::UpdateCacheReservation()` requires callers to explicitly calculate and pass in a correct `new_mem_used` to release a cache reservation (if they don't want to rely on the clean-up during `CacheReservationManager`'s destruction - such as when they want to release it earlier).
      
      While this implementation is convenient in some use-cases such as `WriteBufferManager`, where [reservation](https://github.com/facebook/rocksdb/blob/main/memtable/write_buffer_manager.cc#L69-L91) and [release](https://github.com/facebook/rocksdb/blob/main/memtable/write_buffer_manager.cc#L109-L129) amounts do not necessarily correspond symmetrically and thus a flexible `new_mem_used` input is needed, it is prone to callers' calculation errors and leads to a mass of cache-releasing code in other use-cases such as filter construction, where reservation and release amounts do correspond symmetrically and many code paths require a cache release, as [pointed out](https://github.com/facebook/rocksdb/pull/9073#discussion_r741478309) by pdillinger.
      
      Therefore we decided to provide a new API in `CacheReservationManager` to update reservations with better RAII support for per cache reservation, using a handle to manage the lifetime of that particular cache reservation.
      - Added a new class `CacheReservationHandle`
      - Added a new API `CacheReservationManager::MakeCacheReservation()` that outputs a `CacheReservationHandle` for managing the reservation
      - Updated class comments to clarify two different cache reservation methods
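      The RAII pattern described above can be sketched with simplified stand-ins for CacheReservationManager and CacheReservationHandle (not the real RocksDB classes):

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <memory>

      class Manager;  // forward declaration

      // Releases its cache reservation automatically on destruction (RAII),
      // so callers never compute a release amount by hand.
      class Handle {
       public:
        Handle(Manager* mgr, std::size_t bytes);
        ~Handle();
       private:
        Manager* mgr_;
        std::size_t bytes_;
      };

      class Manager {
       public:
        std::unique_ptr<Handle> MakeCacheReservation(std::size_t bytes) {
          return std::make_unique<Handle>(this, bytes);
        }
        std::size_t reserved() const { return reserved_; }
       private:
        friend class Handle;
        std::size_t reserved_ = 0;
      };

      Handle::Handle(Manager* mgr, std::size_t bytes) : mgr_(mgr), bytes_(bytes) {
        mgr_->reserved_ += bytes_;
      }
      Handle::~Handle() { mgr_->reserved_ -= bytes_; }

      int main() {
        Manager mgr;
        {
          auto handle = mgr.MakeCacheReservation(4096);
          assert(mgr.reserved() == 4096);
        }  // handle destroyed here: the reservation is released without any
           // explicit bookkeeping by the caller
        assert(mgr.reserved() == 0);
        return 0;
      }
      ```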
      
      Tests:
      - Passing new tests
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9130
      
      Reviewed By: pdillinger
      
      Differential Revision: D32199446
      
      Pulled By: hx235
      
      fbshipit-source-id: 1cba7c636e5ecfb55b0c1e0c2d218cc9b5b30b4e
    • Add new API CacheReservationManager::GetTotalMemoryUsage() (#9071) · ffd6085e
      Authored by Hui Xiao
      Summary:
      Note: This PR is the 2nd PR of a bigger PR stack (https://github.com/facebook/rocksdb/pull/9073).
      
      Context:
      `CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used)` accepts an accumulated total memory used (e.g., used 10MB so far) instead of a usage change (e.g., increase by 5MB, decrease by 5MB). It has benefits including consolidating the API for increase and decrease, as described in https://github.com/facebook/rocksdb/pull/8506.
      
      However, not every `CacheReservationManager` user keeps track of this accumulated total memory usage. For example, Bloom/Ribbon Filter construction (e.g., [here](https://github.com/facebook/rocksdb/blob/822d729fcd9f7af9f371ca7168e52dbdab898e41/table/block_based/filter_policy.cc#L587) in https://github.com/facebook/rocksdb/pull/9073) does not, while WriteBufferManager and compression dictionary buffering do.
      
      Considering that future users might or might not keep track of this counter, and that implementing it within `CacheReservationManager` is easy given the `std::size_t new_memory_used` passed to `CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used)`, we propose adding a new API `CacheReservationManager::GetTotalMemoryUsage()`.
      
      As noted in the API comments, since `CacheReservationManager` is NOT thread-safe, external synchronization is
      needed around calls to `UpdateCacheReservation()` if you want `GetTotalMemoryUsed()` to return the truly latest memory used.
      - Added and updated a private counter `memory_used_` every time `CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used)` is called, regardless of whether the call returns a non-okay status
      - Added `CacheReservationManager::GetTotalMemoryUsage()` to return `memory_used_`
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9071
      
      Test Plan:
      - Passing new tests
      - Passing existing tests
      
      Reviewed By: ajkr
      
      Differential Revision: D31887813
      
      Pulled By: hx235
      
      fbshipit-source-id: 9a09f0c8683822673260362894c878b61ee60ceb
  9. 09 Nov 2021 — 6 commits
  10. 07 Nov 2021 — 2 commits
  11. 06 Nov 2021 — 5 commits
    • Deflake DBBasicTestWithTimestampCompressionSettings.PutAndGetWithComp… (#9136) · 5aad38f2
      Authored by Jay Zhuang
      Summary:
      …action
      ```
      db/db_with_timestamp_basic_test.cc:2643: Failure
      db_->CompactFiles(compact_opt, handles_[cf], collector->GetFlushedFiles(), static_cast<int>(kNumTimestamps - i))
      Invalid argument: A compaction must contain at least one file.
      ```
      Reproducible by running multiple tests in parallel.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9136
      
      Test Plan:
      ```
      gtest-parallel ./db_with_timestamp_basic_test --gtest_filter=Timestamp/DBBasicTestWithTimestampCompressionSettings.PutAndGetWithCompaction/12 -r 100 -w 100
      ```
      
      Reviewed By: riversand963
      
      Differential Revision: D32197734
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: aeb0d6e9b37312f577e203ca81bb7a0f14d4e7ce
    • Refactor and unify blob file saving and the logic that finds the oldest live blob file (#9122) · 3fbddb1d
      Authored by Levi Tamasi
      Summary:
      The patch refactors and unifies the logic in `VersionBuilder::SaveBlobFilesTo`
      and `VersionBuilder::GetMinOldestBlobFileNumber` by introducing a generic
      helper that can "merge" the list of `BlobFileMetaData` in the base version with
      the list of `MutableBlobFileMetaData` representing the updated state after
      applying a sequence of `VersionEdit`s. This serves as groundwork for subsequent
      changes that will enable us to determine whether a blob file is live after applying
      a sequence of edits without calling `VersionBuilder::SaveTo`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9122
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D32151472
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 11622b475866de823334b8bc21b0e99d913af97e
    • Minor improvement to CacheReservationManager/WriteBufferManager/CompressionDictBuilding (#9139) · 3018a3e2
      Authored by Hui Xiao
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9139
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D32211415
      
      Pulled By: hx235
      
      fbshipit-source-id: 39ce036ba34e1fb4a1992a33ac6904a4a943301d
    • Fix assertion error during compaction with write-prepared txn enabled (#9105) · 5237b39d
      Authored by Yanqin Jin
      Summary:
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9105
      
      The user contract of SingleDelete is that a SingleDelete can only be issued for
      a key that exists and has NOT been updated. For example, an application can insert
      one key `key` and use a SingleDelete to delete it in the future. The `key`
      cannot be updated or removed using Delete.
      In reality, especially when write-prepared transactions are being used, things
      can get tricky. For example, a prepared transaction already writes `key` to the
      memtable after a successful Prepare(). Afterwards, should the transaction
      roll back, it will insert a Delete into the memtable to cancel out the prior
      Put. Consider the following sequence of operations.
      
      ```
      // operation sequence 1
      Begin txn
      Put(key)
      Prepare()
      Flush()
      
      Rollback txn
      Flush()
      ```
      
      There will be two SSTs resulting from the above. One of them contains a Put, while
      the second one contains a Delete. It is also known that releasing a snapshot
      can lead to an L0 containing only an SD for a particular key. Consider the
      following operations after the block above.
      
      ```
      // operation sequence 2
      db->Put(key)
      db->SingleDelete(key)
      Flush()
      ```
      
      The operation sequence 2 can result in an L0 with only the SD.
      
      Should there be a snapshot for conflict checking created before operation
      sequence 1, then an attempt to compact the db may hit the assertion failure
      below, because ikey_.type is Delete (from a rollback).
      
      ```
      else if (clear_and_output_next_key_) {
        assert(ikey_.type == kTypeValue || ikey_.type == kTypeBlobIndex);
      }
      ```
      
      To fix the assertion failure, we can skip the SingleDelete if we detect an
      earlier Delete in the same snapshot interval.
      
      Reviewed By: ltamasi
      
      Differential Revision: D32056848
      
      fbshipit-source-id: 23620a91e28562d91c45cf7e95f414b54b729748
    • Fixing more loosely formatted issue (#9140) · 7d74d347
      Authored by Jack Feng
      Summary:
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9140
      
      Upon checking, some jobs still have loosely formatted issues.
      Also, some started failing due to my previous buggy change.
      
      https://fburl.com/scuba/sandcastle/e9mhejsc
      https://fburl.com/scuba/duplo_events/dbxx4215
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D32213703
      
      fbshipit-source-id: 806f872b820afe275cf62459988c9d6f668af35c
  12. 05 Nov 2021 — 4 commits