1. 02 Sep, 2021 1 commit
  2. 01 Sep, 2021 2 commits
    • Implement superior user & mid IO priority level in GenericRateLimiter (#8595) · 240c4126
      Hui Xiao committed
      Summary:
      Context:
      An extra IO_USER priority in the rate limiter lets users optionally charge WAL writes / SST reads to the rate limiter at this priority level, which has higher priority than IO_HIGH and IO_LOW. The extra priority lets users better specify the relative urgency/importance of different requests in the rate limiter. As a consequence, IO resource management can better prioritize and limit resources based on users' needs.
      
      IO_USER is implemented as a superior priority in GenericRateLimiter, in the sense that its request queue is always iterated first without being constrained by fairness. The reason is that the notion of fairness is only meaningful in helping lower priorities in background IO (i.e., IO_HIGH/MID/LOW) gain a fair chance to run so that it does not block foreground IO (i.e., the requests charged at the IO_USER level). As we can see, the ultimate goal here is to avoid blocking foreground IO at the IO_USER level, which justifies the superiority of IO_USER.
      
      Similar benefits exist for IO_MID priority.
      - Rewrote the logic of deciding the order of iterating request queues of high/low priorities to include the extra user/mid priority w/o affecting the existing behavior (see PR's [comment](https://github.com/facebook/rocksdb/pull/8595/files#r678749331))
      - Included the request queue of user-pri/mid-pri in the code path of next-leader-candidate signaling and GenericRateLimiter's destructor
      - Included the extra user/mid-pri in bookkeeping data structures: total_bytes_through_ and total_requests_
      - Rewrote the previous implementation, which explicitly iterated over priorities, as a loop from Env::IO_LOW to Env::IO_TOTAL
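
      The queue iteration order described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the enum, the function name `PriorityIterationOrder`, and the two boolean flags (standing in for the fairness coin flips, assumed to be drawn by the caller) are hypothetical.

      ```cpp
      #include <array>

      // Hypothetical stand-in for the priority levels named in the commit message.
      enum IOPriority { IO_LOW = 0, IO_MID = 1, IO_HIGH = 2, IO_USER = 3, IO_TOTAL = 4 };

      // Returns the order in which the request queues are visited.
      // `high_after_mid_low` and `mid_after_low` model the fairness decisions
      // (assumption: the caller draws them at random, e.g. true once in a while).
      inline std::array<IOPriority, IO_TOTAL> PriorityIterationOrder(
          bool high_after_mid_low, bool mid_after_low) {
        std::array<IOPriority, IO_TOTAL> order{};
        // IO_USER is always served first and is never subject to fairness.
        order[0] = IO_USER;
        const IOPriority first_of_mid_low = mid_after_low ? IO_LOW : IO_MID;
        const IOPriority second_of_mid_low = mid_after_low ? IO_MID : IO_LOW;
        if (high_after_mid_low) {
          order[1] = first_of_mid_low;
          order[2] = second_of_mid_low;
          order[3] = IO_HIGH;
        } else {
          order[1] = IO_HIGH;
          order[2] = first_of_mid_low;
          order[3] = second_of_mid_low;
        }
        return order;
      }
      ```

      With both flags false the order is user, high, mid, low, which is the strict-priority case; the flags only ever reorder the three background levels.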
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8595
      
      Test Plan:
      - passed existing rate_limiter_test.cc
      - passed added unit tests in rate_limiter_test.cc
      - ran a performance test to verify that performance with only high/low requests is not affected by this change
         - Set-up command:
         `TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillrandom --duration=5 --compression_type=none --num=100000000 --disable_auto_compactions=true --write_buffer_size=1048576 --writable_file_max_buffer_size=65536 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --level0_slowdown_writes_trigger=$(((1 << 31) - 1)) --level0_stop_writes_trigger=$(((1 << 31) - 1))`
      
         - Test command:
         `TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=overwrite --use_existing_db=true --disable_wal=true --duration=30 --compression_type=none --num=100000000 --write_buffer_size=1048576 --writable_file_max_buffer_size=65536 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --level0_slowdown_writes_trigger=$(((1 << 31) - 1)) --level0_stop_writes_trigger=$(((1 << 31) - 1)) --statistics=true --rate_limiter_bytes_per_sec=1048576 --rate_limiter_refill_period_us=1000  --threads=32 |& grep -E '(flush|compact)\.write\.bytes'`
      
         - Before (on branch upstream/master):
         `rocksdb.compact.write.bytes COUNT : 4014162`
         `rocksdb.flush.write.bytes COUNT : 26715832`
         rocksdb.flush.write.bytes/rocksdb.compact.write.bytes ~= 6.66

         - After (on branch rate_limiter_user_pri):
         `rocksdb.compact.write.bytes COUNT : 3807822`
         `rocksdb.flush.write.bytes COUNT : 26098659`
         rocksdb.flush.write.bytes/rocksdb.compact.write.bytes ~= 6.85
      
      Reviewed By: ajkr
      
      Differential Revision: D30577783
      
      Pulled By: hx235
      
      fbshipit-source-id: 0881f2705ffd13ecd331256bde7e8ec874a353f4
    • Replace `std::shared_ptr<SystemClock>` by `SystemClock*` in `TraceExecutionHandler` (#8729) · 7b555546
      Qizhong Mao committed
      Summary:
      All/most trace related APIs directly use `SystemClock*` (https://github.com/facebook/rocksdb/pull/8033). Do the same in `TraceExecutionHandler`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8729
      
      Test Plan: None
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30672159
      
      Pulled By: autopear
      
      fbshipit-source-id: 017db4912c6ac1cfede842b8b122cf569a394f25
  3. 31 Aug, 2021 2 commits
    • Fix a race in LRUCacheShard::Promote (#8717) · ec9f52ec
      anand76 committed
      Summary:
      In `LRUCacheShard::Promote`, a reference is released outside the LRU mutex. Fix the race condition.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8717
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30649206
      
      Pulled By: anand1976
      
      fbshipit-source-id: 09c0af05b2294a7fe2c02876a61b0bad6e3ada61
    • Built-in support for generating unique IDs, bug fix (#8708) · 13ded694
      Peter Dillinger committed
      Summary:
      Env::GenerateUniqueId() works fine on Windows and on POSIX
      where /proc/sys/kernel/random/uuid exists. Our other implementation is
      flawed and easily produces collisions in a new multi-threaded test.
      As we rely more heavily on DB session ID uniqueness, this becomes a
      serious issue.
      
      This change combines several individually suitable entropy sources
      for reliable generation of random unique IDs, with the goal of uniqueness
      and portability, not cryptographic strength nor maximum speed.
      
      Specifically:
      * Moves code for getting UUIDs from the OS to port::GenerateRfcUuid
      rather than in Env implementation details. Callers are now told whether
      the operation fails or succeeds.
      * Adds an internal API GenerateRawUniqueId for generating high-quality
      128-bit unique identifiers, by combining entropy from three "tracks":
        * Lots of info from default Env like time, process id, and hostname.
        * std::random_device
        * port::GenerateRfcUuid (when working)
      * Built-in implementations of Env::GenerateUniqueId() will now always
      produce an RFC 4122 UUID string, either from platform-specific API or
      by converting the output of GenerateRawUniqueId.
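
      The idea of combining independent entropy "tracks" can be sketched as below. This is only an illustration under stated assumptions: `RawId128`, `Mix64`, and `GenerateRawIdSketch` are hypothetical names, only two tracks are modeled (clock readings and `std::random_device`), and the OS UUID track is omitted. XOR-ing the tracks means the result stays unpredictable as long as at least one track is.

      ```cpp
      #include <chrono>
      #include <cstdint>
      #include <random>

      // Hypothetical 128-bit identifier type for this sketch.
      struct RawId128 {
        uint64_t hi;
        uint64_t lo;
      };

      // splitmix64-style finalizer, used here only to spread clock bits around.
      inline uint64_t Mix64(uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
      }

      inline RawId128 GenerateRawIdSketch() {
        // Track 1 (assumed subset of "environment info"): two clock readings.
        const uint64_t wall = static_cast<uint64_t>(
            std::chrono::system_clock::now().time_since_epoch().count());
        const uint64_t mono = static_cast<uint64_t>(
            std::chrono::steady_clock::now().time_since_epoch().count());
        // Track 2: std::random_device, 32 bits per call.
        std::random_device rd;
        const uint64_t r_hi = (static_cast<uint64_t>(rd()) << 32) ^ rd();
        const uint64_t r_lo = (static_cast<uint64_t>(rd()) << 32) ^ rd();
        // Combine the tracks with XOR.
        return RawId128{Mix64(wall) ^ r_hi, Mix64(mono) ^ r_lo};
      }
      ```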
      
      DB session IDs now use GenerateRawUniqueId while DB IDs (not as
      critical) try to use port::GenerateRfcUuid but fall back on
      GenerateRawUniqueId with conversion to an RFC 4122 UUID.
      
      GenerateRawUniqueId is declared and defined under env/ rather than util/
      or even port/ because of the Env dependency.
      
      Likely follow-up: enhance GenerateRawUniqueId to be faster after the
      first call and to guarantee uniqueness within the lifetime of a single
      process (imparting the same property onto DB session IDs).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8708
      
      Test Plan:
      A new mini-stress test in env_test checks the various public
      and internal APIs for uniqueness, including each track of
      GenerateRawUniqueId individually. We can't hope to verify anywhere close
      to 128 bits of entropy, but it can at least detect flaws as bad as the
      old code. Serial execution of the new tests takes about 350 ms on
      my machine.
      
      Reviewed By: zhichao-cao, mrambacher
      
      Differential Revision: D30563780
      
      Pulled By: pdillinger
      
      fbshipit-source-id: de4c9ff4b2f581cf784fcedb5f39f16e5185c364
  4. 28 Aug, 2021 3 commits
  5. 27 Aug, 2021 3 commits
  6. 26 Aug, 2021 2 commits
  7. 25 Aug, 2021 6 commits
    • Temporarily disable block-based filter when stress testing timestamp (#8703) · d8eb8243
      Yanqin Jin committed
      Summary:
      The current implementation does not support user-defined timestamps when the
      block-based filter is used. We will implement the support in the future, or
      wait to see if the block-based filter can be deprecated and removed.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8703
      
      Test Plan: make whitebox_crash_test_with_ts
      
      Reviewed By: pdillinger
      
      Differential Revision: D30528931
      
      Pulled By: riversand963
      
      fbshipit-source-id: 60dd74ee0a6194e69072069d8c4bd876f249f38d
    • Fix a bug of secondary instance sequence going backward (#8653) · f235f4b0
      Yanqin Jin committed
      Summary:
      A recent refactor of `ReactiveVersionSet::ReadAndApply()` uses
      `ManifestTailer`, whose `Iterate()` method can cause the db's
      `last_sequence_` to go backward. Consequently, read requests can see
      outdated data. For example, latest changes to the primary will not be
      seen on the secondary even after a `TryCatchUpWithPrimary()` if no new
      write batches are read from the WALs and no new MANIFEST entries are
      read from the MANIFEST.
      
      Fix the bug so that `VersionEditHandler::CheckIterationResult` will
      never decrease `last_sequence_`, `last_allocated_sequence_` and
      `last_published_sequence_`.
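
      The "never decrease" rule can be illustrated with a small helper. This is a generic sketch of a monotonic update, not the code in `VersionEditHandler::CheckIterationResult`; the name `AdvanceTo` is hypothetical.

      ```cpp
      #include <atomic>
      #include <cstdint>

      // Raises `target` to `candidate` only when `candidate` is larger.
      // A smaller or equal candidate leaves `target` untouched, so the value
      // can never move backward.
      inline void AdvanceTo(std::atomic<uint64_t>& target, uint64_t candidate) {
        uint64_t cur = target.load(std::memory_order_relaxed);
        while (cur < candidate &&
               !target.compare_exchange_weak(cur, candidate,
                                             std::memory_order_relaxed)) {
          // On failure, `cur` holds the latest value and the condition is rechecked.
        }
      }
      ```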
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8653
      
      Test Plan: make check
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D30272084
      
      Pulled By: riversand963
      
      fbshipit-source-id: c6a49c534b2509b93ef62d8936ed0acd5b860eaa
    • Simplify `TraceAnalyzer` (#8697) · 785faf2d
      Merlin Mao committed
      Summary:
      Handler functions now use a common output function to output to stdout/files.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8697
      
      Test Plan: `trace_analyzer_test` passes.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30527696
      
      Pulled By: autopear
      
      fbshipit-source-id: c626cf4d53a39665a9c4bcf0cb019c448434abe4
    • Add port::GetProcessID() (#8693) · 318fe694
      Peter Dillinger committed
      Summary:
      Useful in some places for object uniqueness across processes.
      Currently used for generating a host-wide identifier of Cache objects
      but expected to be used soon in some unique id generation code.
      
      `int64_t` is chosen for return type because POSIX uses signed integer type,
      usually `int`, for `pid_t` and Windows uses `DWORD`, which is `uint32_t`.
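
      A minimal sketch of why `int64_t` covers both platforms, assuming the standard `getpid()` and `GetCurrentProcessId()` calls; the function name `GetProcessIdSketch` is hypothetical and this is not the code added under port/.

      ```cpp
      #include <cstdint>
      #if defined(_WIN32)
      #include <windows.h>
      #else
      #include <unistd.h>
      #endif

      // int64_t can represent every value of a signed 32-bit pid_t as well as
      // every value of an unsigned 32-bit DWORD without loss.
      inline int64_t GetProcessIdSketch() {
      #if defined(_WIN32)
        return static_cast<int64_t>(GetCurrentProcessId());
      #else
        return static_cast<int64_t>(getpid());
      #endif
      }
      ```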
      
      Future work: avoid copy-pasted declarations in port_*.h, perhaps with
      port_common.h always included from port.h
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8693
      
      Test Plan: manual for now
      
      Reviewed By: ajkr, anand1976
      
      Differential Revision: D30492876
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 39fc2788623cc9f4787866bdb67a4d183dde7eef
    • Allow iterate refresh for secondary instance (#8700) · 229350ef
      Yanqin Jin committed
      Summary:
      Test Plan: make check
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8700
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30523907
      
      Pulled By: riversand963
      
      fbshipit-source-id: 68928ab4dafb64ce80ab7bc69d83727a4713ab91
    • Refactor WriteBufferManager::CacheRep into CacheReservationManager (#8506) · 74cfe7db
      Hui Xiao committed
      Summary:
      Context:
      To help cap various kinds of memory usage by a single limit, the block cache capacity, we charge memory usage by inserting/releasing dummy entries in the block cache. CacheReservationManager is a (non-thread-safe) class responsible for inserting/removing dummy entries to reserve cache space for memory used by the class user.
      
      - Refactored the inner private class CacheRep of WriteBufferManager into public CacheReservationManager class for reusability such as for https://github.com/facebook/rocksdb/pull/8428
      
      - Encapsulated implementation details of cache key generation and dummy entries insertion/release in cache reservation as discussed in https://github.com/facebook/rocksdb/pull/8506#discussion_r666550838
      
      - Consolidated increase/decrease cache reservation into one API - UpdateCacheReservation.
      
      - Adjusted the previous dummy entry release algorithm in decreasing cache reservation to be loop-releasing dummy entries to stay symmetric to dummy entry insertion algorithm
      
      - Made the previous dummy entry release algorithm in delayed decrease mode more aggressive for better decreasing cache reservation when memory used is less likely to increase back.
      
        Previously, the algorithm released only one dummy entry when new_mem_used < 3/4 * cache_allocated_size_ and cache_allocated_size_ - kSizeDummyEntry > new_mem_used.
      Now, the algorithm loop-releases as many dummy entries as possible when new_mem_used < 3/4 * cache_allocated_size_.
      
      - Updated WriteBufferManager's test cases to adapt to changes on the release algorithm mentioned above and left comment for some test cases for clarity
      
      - Replaced the previous cache key prefix generation (utilizing object address related to the cache client) with one that utilizes Cache->NewID() to prevent cache-key collision among dummy entry clients sharing the same cache.
      
        The specific collision we are preventing happens when an object address is reused for a new cache-key prefix while an old cache key using that same object address in its prefix still exists in the cache. This can happen because, under the LRU cache policy, there may be a delay in releasing a cache entry after the cache client object owning that entry is deallocated. In that case, the address of the deallocated cache client object can be reused by another client object to generate a new cache-key prefix.
      
        This prefix generation can be made obsolete after Peter's unification of all the code generating cache key, mentioned in https://github.com/facebook/rocksdb/pull/8506#discussion_r667265255
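
      The dummy-entry insertion and loop-release behavior described in the list above can be sketched without a real cache. This is a simplified model, not the CacheReservationManager implementation: the class name `ReservationSketch`, the counter standing in for cache entries, and the 256 KiB dummy entry size are assumptions made for illustration.

      ```cpp
      #include <cstddef>

      class ReservationSketch {
       public:
        // Assumed dummy entry size for this sketch.
        static constexpr std::size_t kSizeDummyEntry = 256 * 1024;

        explicit ReservationSketch(bool delayed_decrease)
            : delayed_decrease_(delayed_decrease) {}

        // Single entry point covering both increase and decrease.
        void Update(std::size_t new_mem_used) {
          if (new_mem_used > allocated_) {
            // Insert dummy entries until the reservation covers the usage.
            while (new_mem_used > allocated_) {
              allocated_ += kSizeDummyEntry;
              ++dummy_entries_;
            }
            return;
          }
          // In delayed-decrease mode, do nothing until usage drops below 3/4
          // of the reservation. `allocated_` is a multiple of kSizeDummyEntry,
          // so the division by 4 is exact.
          if (delayed_decrease_ && new_mem_used >= allocated_ / 4 * 3) {
            return;
          }
          // Loop-release every dummy entry that is no longer needed.
          while (allocated_ >= kSizeDummyEntry &&
                 new_mem_used < allocated_ - kSizeDummyEntry) {
            allocated_ -= kSizeDummyEntry;
            --dummy_entries_;
          }
        }

        std::size_t allocated() const { return allocated_; }
        std::size_t dummy_entries() const { return dummy_entries_; }

       private:
        bool delayed_decrease_;
        std::size_t allocated_ = 0;
        std::size_t dummy_entries_ = 0;
      };
      ```

      For example, in delayed-decrease mode a reservation of four entries stays at four when usage drops to 900 KiB (still above 3/4 of 1 MiB) and shrinks to two entries in one call when usage drops to 300 KiB.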
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8506
      
      Test Plan:
      - Passing the added unit tests cache_reservation_manager_test.cc
      - Passing existing and adjusted write_buffer_manager_test.cc
      
      Reviewed By: ajkr
      
      Differential Revision: D29644135
      
      Pulled By: hx235
      
      fbshipit-source-id: 0fc93fbfe4a40bb41be85c314f8f2bafa8b741f7
  8. 24 Aug, 2021 5 commits
    • Deflake write-prepared and write-unprepared tests (#8696) · c521f22a
      Andrew Kryczka committed
      Summary:
      The `JobContext::job_snapshot` referenced DB state but could
      have been deleted by a BG thread after the signal/unlock allowing
      shutdown to proceed. Then we would see an error like this (valgrind):
      
      ```
      ==354104== Thread 2:
      ==354104== Invalid read of size 8
      ==354104==    at 0x694C4D: rocksdb::ManagedSnapshot::~ManagedSnapshot() (snapshot_impl.cc:20)
      ==354104==    by 0x58F5BA: operator() (unique_ptr.h:81)
      ==354104==    by 0x58F5BA: operator() (unique_ptr.h:75)
      ==354104==    by 0x58F5BA: ~unique_ptr (unique_ptr.h:292)
      ==354104==    by 0x58F5BA: rocksdb::JobContext::~JobContext() (job_context.h:221)
      ==354104==    by 0x5F155E: rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) (db_impl_compaction_flush.cc:2696)
      ==354104==    by 0x5F1BC2: rocksdb::DBImpl::BGWorkCompaction(void*) (db_impl_compaction_flush.cc:2468)
      ==354104==    by 0x83707A: operator() (std_function.h:688)
      ==354104==    by 0x83707A: rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long) (threadpool_imp.cc:266)
      ==354104==    by 0x8373ED: rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*) (threadpool_imp.cc:307)
      ==354104==    by 0x492A800: execute_native_thread_routine (in /usr/local/fbcode/platform009/lib/libstdc++.so.6.0.28)
      ==354104==    by 0x4A5020B: start_thread (in /usr/local/fbcode/platform009/lib/libpthread-2.30.so)
      ==354104==    by 0x4CF281E: clone (in /usr/local/fbcode/platform009/lib/libc-2.30.so)
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8696
      
      Test Plan: unable to repro
      
      Reviewed By: pdillinger
      
      Differential Revision: D30505277
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5a99f34137cd14d06b0f624add6d37a70a61135d
    • Refactor TraceAnalyzer to use `TraceRecord::Handler` to avoid casting. (#8678) · f6437ea4
      Merlin Mao committed
      Summary:
      `TraceAnalyzer` privately inherits `TraceRecord::Handler` and `WriteBatch::Handler`.
      
      `trace_analyzer_test` can pass with this change.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8678
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30459814
      
      Pulled By: autopear
      
      fbshipit-source-id: a27f59ac4600f7c3682830c9b1d9dc79e53425be
    • Add extra information to RemoteCompaction APIs (#8680) · 249b1078
      Jay Zhuang committed
      Summary:
      Currently, we only provide `job_id` in RemoteCompaction APIs. The
      main problem with `job_id` is that it cannot uniquely identify a compaction job
      across DB instances or across sessions.
      This change also provides the DB id and session id to the user, which makes
      building a cross-DB compaction service easier.
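
      One way a compaction service might use the extra fields is as a composite key. The struct below is a hypothetical illustration of that idea, not a type from the RemoteCompaction API; field names are assumptions.

      ```cpp
      #include <cstdint>
      #include <string>
      #include <tuple>

      // Hypothetical composite key: job_id alone can repeat across DB instances
      // and across sessions, so all three fields take part in identity.
      struct CompactionJobKey {
        std::string db_id;
        std::string db_session_id;
        uint64_t job_id;
      };

      inline bool operator==(const CompactionJobKey& a, const CompactionJobKey& b) {
        return std::tie(a.db_id, a.db_session_id, a.job_id) ==
               std::tie(b.db_id, b.db_session_id, b.job_id);
      }

      // Ordering so the key can be used in std::map or std::set.
      inline bool operator<(const CompactionJobKey& a, const CompactionJobKey& b) {
        return std::tie(a.db_id, a.db_session_id, a.job_id) <
               std::tie(b.db_id, b.db_session_id, b.job_id);
      }
      ```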
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8680
      
      Test Plan: unittest
      
      Reviewed By: ajkr
      
      Differential Revision: D30444859
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: fdf107f4286564049637f154193c6d94c3c59448
    • Allow intentionally swallowed errors in BlockBasedFilterBlockReader (#8695) · 1a5eb33d
      Peter Dillinger committed
      Summary:
      Avoids getting "Didn't get expected error from Get" from the
      crash test, which resulted from enabling the block-based filter in the crash test in https://github.com/facebook/rocksdb/issues/8679.
      Basically, this applies the pattern of IGNORE_STATUS_IF_ERROR in
      full_filter_block.cc to block_based_filter_block.cc
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8695
      
      Test Plan: watch for resolution of crash test runs
      
      Reviewed By: ltamasi
      
      Differential Revision: D30496748
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f7808fcf14c0e787fe81da03fa8303244590d273
    • Fix typo in 6.24.0 HISTORY.md (#8694) · 0637c8d3
      Peter Dillinger committed
      Summary:
      fix typo
      
      Also, clarified change of C API signatures.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8694
      
      Test Plan: visual
      
      Reviewed By: ltamasi
      
      Differential Revision: D30492882
      
      Pulled By: pdillinger
      
      fbshipit-source-id: ac6dc3dcefa01c91fd87fc7f50279ea5e13fa41d
  9. 23 Aug, 2021 1 commit
  10. 21 Aug, 2021 8 commits
    • Update version.h and HISTORY.md for the 6.24 release (#8688) · 8c9e6897
      Levi Tamasi committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8688
      
      Reviewed By: ajkr, riversand963
      
      Differential Revision: D30467746
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 0fce0d42fe2fe3cb56d7a89607154b3b957f09b6
    • Embed original file number in SST table properties (#8686) · 04db7648
      Peter Dillinger committed
      Summary:
      I very recently realized that with https://github.com/facebook/rocksdb/issues/8669 we cannot later add
      file numbers to external SST files (so that more can share db session
      ids for better uniqueness properties), because of forward compatibility.
      We would have a version of RocksDB that assumes session IDs are unique
      on external SST files and therefore can't really break that invariant in
      future files.
      
      This change adds a table property for "orig_file_number" which is
      populated by normal SST files and also external SST files generated by
      SstFileWriter. SstFileWriter now keeps a db_session_id for life of the
      object and increments its own file numbers for embedding in table
      properties. (They are arguably "fake" file numbers because these numbers
      are not embedded in the file name.)
      
      While updating block_based_table_builder, I removed several unnecessary
      fields from Rep, because following the pattern would have created
      another unnecessary field.
      
      This change also updates block_based_table_reader to use this new
      property when available, which means that for newer SST files, we can
      determine the stable/original <db_session_id,file_number> unique
      identifier using just the file contents, not the file name. (It's a bit
      complicated; detailed comments in block_based_table_reader.)
      
      Also added DB host id to properties listing by sst_dump, which could be
      useful in debugging.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8686
      
      Test Plan: majorly overhauled StableCacheKeys test for this change
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30457742
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 2e5ae7dddeb94fb9d8eac8a928486aed8b8cd445
    • Upgrade xxhash, add Hash128 (#8634) · 22161b75
      Peter Dillinger committed
      Summary:
      With expected use for a 128-bit hash, xxhash library is
      upgraded to current dev (2c611a76f914828bed675f0f342d6c4199ffee1e)
      as of Aug 6 so that we can use production version of XXH3_128bits
      as new Hash128 function (added in hash128.h).
      
      To make this work, however, we have to carve out the "preview" version
      of XXH3 that is used in new SST Bloom and Ribbon filters, since that
      will not get maintenance in xxhash releases. I have consolidated all the
      relevant code into xxph3.h and made it "inline only" (no .cc file). The
      working name for this hash function is changed from XXH3p to XXPH3
      (XX Preview Hash) because the latter is easier to get working with no
      symbol name conflicts between the headers.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8634
      
      Test Plan:
      no expected change in existing functionality. For Hash128,
      added some unit tests based on those for Hash64 to ensure some basic
      properties and that the values do not change accidentally.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30173490
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 06aa542a7a28b353bc2c865b9b2f8bdfe44158e4
    • Add Bloom/Ribbon hybrid API support (#8679) · 2a383f21
      Peter Dillinger committed
      Summary:
      This is essentially resurrection and fixing of the part of
      https://github.com/facebook/rocksdb/issues/8198 that was reverted in https://github.com/facebook/rocksdb/issues/8212, using data added in https://github.com/facebook/rocksdb/issues/8246. Basically,
      when configuring Ribbon filter, you can specify an LSM level before which
      Bloom will be used instead of Ribbon. But Bloom is only considered for
      Leveled and Universal compaction styles and files going into a known LSM
      level. This way, SST file writer, FIFO compaction, etc. use Ribbon filter as
      you would expect with NewRibbonFilterPolicy.
      
      So that this can be controlled with a single int value and so that flushes
      can be distinguished from intra-L0, we consider flush to go to level -1 for
      the purposes of this option. (Explained in API comment.)
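
      The level-threshold decision described above can be sketched as a small pure function. This models only the selection rule from this summary; `FilterKind`, `kFlushLevel`, and `ChooseFilter` are hypothetical names, not the LevelThresholdFilterPolicy code.

      ```cpp
      // Hypothetical names for this sketch.
      enum class FilterKind { kBloom, kRibbon };

      // Flush is treated as going to level -1 for the purposes of the threshold.
      constexpr int kFlushLevel = -1;

      // `level_known` is false for cases such as SST file writer or FIFO
      // compaction, where Bloom is not considered and Ribbon is always used.
      inline FilterKind ChooseFilter(bool level_known, int level,
                                     int bloom_before_level) {
        if (!level_known) {
          return FilterKind::kRibbon;
        }
        return level < bloom_before_level ? FilterKind::kBloom
                                          : FilterKind::kRibbon;
      }
      ```

      With a threshold of 0, only flushes (level -1) get Bloom and every real LSM level gets Ribbon, which corresponds to the "Bloom for flush, Ribbon otherwise" configuration.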
      
      I also expect the most common and recommended Ribbon configuration to
      use Bloom during flush, to minimize slowing down writes and because according
      to my estimates, Ribbon only pays off if the structure lives in memory for
      more than an hour. Thus, I have changed the default for NewRibbonFilterPolicy
      to be this mild hybrid configuration. I don't really want to add something like
      NewHybridFilterPolicy because at least the mild hybrid configuration (Bloom for
      flush, Ribbon otherwise) should be considered a natural choice.
      
      C APIs also updated, but because they don't support overloading,
      rocksdb_filterpolicy_create_ribbon is kept pure ribbon for clarity and
      rocksdb_filterpolicy_create_ribbon_hybrid must be called for a hybrid
      configuration. While touching C API, I changed bits per key options from
      int to double.
      
      BuiltinFilterPolicy is needed so that LevelThresholdFilterPolicy doesn't inherit
      unused fields from BloomFilterPolicy.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8679
      
      Test Plan: new + updated tests, including crash test
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D30445797
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 6f5aeddfd6d79f7e55493b563c2d1d2d568892e1
    • Add `IteratorTraceExecutionResult` for iterator related trace records. (#8687) · baf22b4e
      Merlin Mao committed
      Summary:
      - Allow getting `Valid()`, `status()`, `key()` and `value()` of an iterator from `IteratorTraceExecutionResult`.
      - Move lower bound and upper bound from `IteratorSeekQueryTraceRecord` to `IteratorQueryTraceRecord`.
      
      Added test in `DBTest2.TraceAndReplay`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8687
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30457630
      
      Pulled By: autopear
      
      fbshipit-source-id: be433099a25895b3aa6f0c00f95ad7b1d7489c1d
    • Add a PerfContext counter for secondary cache hits (#8685) · f35042ca
      anand76 committed
      Summary:
      Add a PerfContext counter.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8685
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30453957
      
      Pulled By: anand1976
      
      fbshipit-source-id: 42888a3ced240e1c44446d52d3b04adfb01f5665
    • Update the block_read_count/block_read_byte counters in MultiGet (#8676) · 22f2936b
      anand76 committed
      Summary:
      MultiGet in block based table reader doesn't use BlockFetcher. As a result, the block_read_count and block_read_byte PerfContext counters were not being updated. This fixes that by updating them in MultiRead.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8676
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30428680
      
      Pulled By: anand1976
      
      fbshipit-source-id: 21846efe92588fc17123665dd06733693a40126d
    • Fix blob callback in compaction and atomic flush (#8681) · 5efec84c
      Akanksha Mahajan committed
      Summary:
      Pass BlobFileCompletionCallback in the atomic flush and
      compaction job cases, where it is currently nullptr (the default parameter).
      BlobFileCompletionCallback is used in case of IntegratedBlobDB to report new blob files to
      SstFileManager.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8681
      
      Test Plan: CircleCI jobs
      
      Reviewed By: ltamasi
      
      Differential Revision: D30445998
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: ba48093843864faec57f1f365cce7b5a569c4021
  11. 20 Aug, 2021 2 commits
    • Add iterator's lower and upper bounds to `TraceRecord` (#8677) · ff895338
      Merlin Mao committed
      Summary:
      Trace file V2 added lower/upper bounds to `Iterator::Seek()` and `Iterator::SeekForPrev()`. They were not used anywhere during the execution of a `TraceRecord`. Now they are added to be used by `ReadOptions` during `Iterator::Seek()` and `Iterator::SeekForPrev()` if they are set.
      
      Added test cases in `DBTest2.TraceAndManualReplay`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8677
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30438255
      
      Pulled By: autopear
      
      fbshipit-source-id: 82563006be0b69155990e506a74951c18af8d288
    • Fix some minor issues in the Customizable infrastructure (#8566) · 9eb002fc
      mrambacher committed
      Summary:
      - Fix issue with OptionType::Vector when the nested item is a Customizable with no names
      - Fix issue with OptionType::Vector to appropriately wrap the elements in a Vector
      - Fix an issue with nested Customizable object with a null immutable object still appearing in the mutable options
      - Fix/Add tests for null/empty customizable objects
      - Move the RegisterTestObjects from customizable_test into testutil.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8566
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30303724
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 33fa8ea2a3b663210cb356da05e64aab7585b1b5
  12. 19 Aug, 2021 3 commits
    • Add condition on NotifyOnFlushComplete that FlushJob was not mempurge. Add... · c625b8d0
      Baptiste Lemaire committed
      Add condition on NotifyOnFlushComplete that FlushJob was not mempurge. Add event listeners to mempurge tests. (#8672)
      
      Summary:
      Previously, when a `FlushJob` was redirected to a MemPurge, the function `DBImpl::NotifyOnFlushComplete` was called, which created a series of issues because the JobInfo was not correctly collected from the memtables.
      This diff aims at correcting these two issues (`FlushJobInfo` collection in `FlushJob::MemPurge` , no call to `DBImpl::NotifyOnFlushComplete` after successful mempurge).
      Event listeners were added to the unit tests to handle these situations.
      Surprisingly, none of the crash tests caught this issue; I will try to add event listeners to the crash tests in the future.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8672
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D30383109
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 35a8d4295886923ee4049a6447f00022cb221c73
    • Allow Replayer to report the results of TraceRecords. (#8657) · d10801e9
      Merlin Mao committed
      Summary:
      `Replayer::Execute()` can directly return the result (e.g., request latency, DB::Get() return code, returned value, etc.).
      `Replayer::Replay()` reports the results via a callback function.
      
      New interface:
      `TraceRecordResult` in "rocksdb/trace_record_result.h".
      
      `DBTest2.TraceAndReplay` and `DBTest2.TraceAndManualReplay` are updated accordingly.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8657
      
      Reviewed By: ajkr
      
      Differential Revision: D30290216
      
      Pulled By: autopear
      
      fbshipit-source-id: 3c8d4e6b180ec743de1a9d9dcaee86064c74f0d6
    • Stable cache keys on ingested SST files (#8669) · b6269b07
      Peter Dillinger committed
      Summary:
      Extends https://github.com/facebook/rocksdb/issues/8659 to work for ingested external SST files, even
      the same file ingested into different DBs sharing a block cache.
      
      Note: These new cache keys are currently only enabled when FileSystem
      does not provide GetUniqueId. For now, they are typically larger,
      so slightly less efficient.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8669
      
      Test Plan: Extended unit test
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30398532
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1f13e2af4b8bfff5741953a69466e9589fbc23c7
  13. 18 Aug, 2021 2 commits
    • Fix bug caused by releasing snapshot(s) during compaction (#8608) · 2b367fa8
      Yanqin Jin committed
      Summary:
      In debug mode, we are seeing an assertion failure as follows:
      
      ```
      db/compaction/compaction_iterator.cc:980: void rocksdb::CompactionIterator::PrepareOutput(): \
      Assertion `ikey_.type != kTypeDeletion && ikey_.type != kTypeSingleDeletion' failed.
      ```
      
      It is caused by releasing earliest snapshot during compaction between the execution of
      `NextFromInput()` and `PrepareOutput()`.
      
      In one case, as demonstrated in unit test `WritePreparedTransaction.ReleaseEarliestSnapshotDuringCompaction_WithSD2`,
      incorrect result may be returned by a following range scan if we disable assertion, as in opt compilation
      level: the SingleDelete marker's sequence number is zeroed out, but the preceding PUT is also
      outputted to the SST file after compaction. Due to the logic of DBIter, the PUT will not be
      skipped and will be returned by iterator in range scan. https://github.com/facebook/rocksdb/issues/8661 illustrates what happened.
      
      Fix by taking a more conservative approach: make compaction zero out sequence number only
      if key is in the earliest snapshot when the compaction starts.
      
      Another assertion failure is
      ```
      Assertion `current_user_key_snapshot_ == last_snapshot' failed.
      ```
      
      It's caused by releasing the snapshot between the PUT and SingleDelete during compaction.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8608
      
      Test Plan: make check
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D30145645
      
      Pulled By: riversand963
      
      fbshipit-source-id: 699f58e66faf70732ad53810ccef43935d3bbe81
    • Add statistics support to integrated BlobDB (#8667) · 6878cedc
      Levi Tamasi committed
      Summary:
      The patch adds statistics support to the integrated BlobDB implementation,
      namely the tickers `BLOB_DB_BLOB_FILE_BYTES_READ` and
      `BLOB_DB_GC_{NUM_KEYS,BYTES}_RELOCATED`, and the histograms
      `BLOB_DB_(DE)COMPRESSION_MICROS`. (Some other statistics, like
      `BLOB_DB_BLOB_FILE_BYTES_WRITTEN`, `BLOB_DB_BLOB_FILE_SYNCED`,
      `BLOB_DB_BLOB_FILE_{READ,WRITE,SYNC}_MICROS` were already supported.)
      Note that the vast majority of the old BlobDB's tickers/histograms are not
      really applicable to the new implementation, since they e.g. pertain to calling
      dedicated BlobDB APIs (which the integrated BlobDB does not have) or are
      tied to the legacy BlobDB's design of writing blob files synchronously when
      a write API is called. Such statistics are marked "legacy BlobDB only" in
      `statistics.h`.
      
      Fixes https://github.com/facebook/rocksdb/issues/8645 .
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8667
      
      Test Plan: Ran `make check` and tested the new statistics using `db_bench`.
      
      Reviewed By: riversand963
      
      Differential Revision: D30356884
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 5f8a833faee60401c5643c2f0a6c0415488190a4