1. 08 Sep 2021 (1 commit)
  2. 05 Sep 2021 (1 commit)
  3. 04 Sep 2021 (2 commits)
  4. 03 Sep 2021 (1 commit)
  5. 02 Sep 2021 (13 commits)
  6. 01 Sep 2021 (2 commits)
    • H
      Implement superior user & mid IO priority level in GenericRateLimiter (#8595) · 240c4126
      Committed by Hui Xiao
      Summary:
      Context:
      An extra IO_USER priority in the rate limiter allows users to optionally charge WAL writes / SST reads to the rate limiter at this priority level, which then has higher priority than IO_HIGH and IO_LOW. The extra IO_USER priority lets users better specify the relative urgency/importance among different requests in the rate limiter. As a consequence, IO resource management can better prioritize and limit resources based on users' needs.
      
      IO_USER is implemented as a superior priority in GenericRateLimiter, in the sense that its request queue is always iterated first, without being constrained by fairness. The reason is that the notion of fairness is only meaningful for helping lower priorities in background IO (i.e., IO_HIGH/MID/LOW) gain a fair chance to run so that they do not block foreground IO (i.e., requests charged at the IO_USER level). The ultimate goal here is not to block foreground IO at the IO_USER level, which justifies the superiority of IO_USER.
      
      Similar benefits exist for IO_MID priority.
      - Rewrote the logic of deciding the order of iterating request queues of high/low priorities to include the extra user/mid priority w/o affecting the existing behavior (see PR's [comment](https://github.com/facebook/rocksdb/pull/8595/files#r678749331))
      - Included the request queue of user-pri/mid-pri in the code path of next-leader-candidate signaling and GenericRateLimiter's destructor
      - Included the extra user/mid-pri in bookkeeping data structures: total_bytes_through_ and total_requests_
      - Rewrote the previous implementation of explicitly iterating priorities into a loop from Env::IO_LOW to Env::IO_TOTAL
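
      The queue-iteration idea above can be sketched as follows. This is a simplified, hypothetical illustration, not RocksDB's actual GenericRateLimiter code: the enum values, `kFairness` constant, and `IterationOrder` function are all assumptions. The point it shows is that IO_USER is always served first, while the lower background priorities get only a probabilistic chance to move ahead of one another for fairness.

      ```cpp
      #include <random>
      #include <utility>
      #include <vector>

      enum Priority { IO_LOW = 0, IO_MID = 1, IO_HIGH = 2, IO_USER = 3, IO_TOTAL = 4 };

      std::vector<Priority> IterationOrder(std::mt19937& rng) {
        // Start from highest to lowest: USER, HIGH, MID, LOW.
        std::vector<Priority> order = {IO_USER, IO_HIGH, IO_MID, IO_LOW};
        // With probability 1/kFairness, promote a lower background priority one
        // slot ahead -- but never ahead of IO_USER (index 0), which is superior.
        const int kFairness = 10;
        std::uniform_int_distribution<int> dist(0, kFairness - 1);
        for (size_t i = order.size() - 1; i >= 2; --i) {
          if (dist(rng) == 0) {
            std::swap(order[i], order[i - 1]);
          }
        }
        return order;
      }
      ```

      Whatever the fairness dice roll, the front of the order is always IO_USER, so foreground requests are never starved by background IO.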
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8595
      
      Test Plan:
      - passed existing rate_limiter_test.cc
      - passed added unit tests in rate_limiter_test.cc
      - ran a performance test to verify that performance with only high/low requests is not affected by this change
         - Set-up command:
         `TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillrandom --duration=5 --compression_type=none --num=100000000 --disable_auto_compactions=true --write_buffer_size=1048576 --writable_file_max_buffer_size=65536 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --level0_slowdown_writes_trigger=$(((1 << 31) - 1)) --level0_stop_writes_trigger=$(((1 << 31) - 1))`
      
          - Test command:
         `TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=overwrite --use_existing_db=true --disable_wal=true --duration=30 --compression_type=none --num=100000000 --write_buffer_size=1048576 --writable_file_max_buffer_size=65536 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --level0_slowdown_writes_trigger=$(((1 << 31) - 1)) --level0_stop_writes_trigger=$(((1 << 31) - 1)) --statistics=true --rate_limiter_bytes_per_sec=1048576 --rate_limiter_refill_period_us=1000  --threads=32 |& grep -E '(flush|compact)\.write\.bytes'`
      
         - Before (on branch upstream/master):
         `rocksdb.compact.write.bytes COUNT : 4014162`
         `rocksdb.flush.write.bytes COUNT : 26715832`
          rocksdb.flush.write.bytes/rocksdb.compact.write.bytes ~= 6.66
      
         - After (on branch rate_limiter_user_pri):
        `rocksdb.compact.write.bytes COUNT : 3807822`
        `rocksdb.flush.write.bytes COUNT : 26098659`
         rocksdb.flush.write.bytes/rocksdb.compact.write.bytes ~= 6.85
      
      Reviewed By: ajkr
      
      Differential Revision: D30577783
      
      Pulled By: hx235
      
      fbshipit-source-id: 0881f2705ffd13ecd331256bde7e8ec874a353f4
      240c4126
    • Q
      Replace `std::shared_ptr<SystemClock>` by `SystemClock*` in `TraceExecutionHandler` (#8729) · 7b555546
      Committed by Qizhong Mao
      Summary:
      All/most trace-related APIs directly use `SystemClock*` (https://github.com/facebook/rocksdb/pull/8033). Do the same in `TraceExecutionHandler`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8729
      
      Test Plan: None
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30672159
      
      Pulled By: autopear
      
      fbshipit-source-id: 017db4912c6ac1cfede842b8b122cf569a394f25
      7b555546
  7. 31 Aug 2021 (2 commits)
    • A
      Fix a race in LRUCacheShard::Promote (#8717) · ec9f52ec
      Committed by anand76
      Summary:
      In ```LRUCacheShard::Promote```, a reference was released outside the LRU mutex. This change fixes the resulting race condition.
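
      An illustrative sketch of this class of race follows; the types and methods are hypothetical stand-ins, not RocksDB's actual LRUCacheShard. Releasing the last reference outside the shard mutex would let another thread mutate the entry's LRU state concurrently; the fixed pattern performs the release while still holding the mutex.

      ```cpp
      #include <mutex>

      struct Handle {
        int refs = 1;
        bool in_lru = true;
      };

      class ShardSketch {
       public:
        // Fixed pattern: take the mutex once, then update LRU state and
        // release the reference inside the same critical section.
        void Promote(Handle* h) {
          std::lock_guard<std::mutex> guard(mu_);
          // ... re-insert h into the LRU list here ...
          Unref(h);  // safe: still holding mu_
        }

       private:
        void Unref(Handle* h) {
          if (--h->refs == 0) {
            h->in_lru = false;  // last reference gone; entry leaves the LRU
          }
        }
        std::mutex mu_;
      };
      ```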
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8717
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30649206
      
      Pulled By: anand1976
      
      fbshipit-source-id: 09c0af05b2294a7fe2c02876a61b0bad6e3ada61
      ec9f52ec
    • P
      Built-in support for generating unique IDs, bug fix (#8708) · 13ded694
      Committed by Peter Dillinger
      Summary:
      Env::GenerateUniqueId() works fine on Windows and on POSIX
      where /proc/sys/kernel/random/uuid exists. Our other implementation is
      flawed and easily produces collisions in a new multi-threaded test.
      As we rely more heavily on DB session ID uniqueness, this becomes a
      serious issue.
      
      This change combines several individually suitable entropy sources
      for reliable generation of random unique IDs, with the goals of uniqueness
      and portability, not cryptographic strength or maximum speed.
      
      Specifically:
      * Moves code for getting UUIDs from the OS to port::GenerateRfcUuid
      rather than in Env implementation details. Callers are now told whether
      the operation fails or succeeds.
      * Adds an internal API GenerateRawUniqueId for generating high-quality
      128-bit unique identifiers, by combining entropy from three "tracks":
        * Lots of info from default Env like time, process id, and hostname.
        * std::random_device
        * port::GenerateRfcUuid (when working)
      * Built-in implementations of Env::GenerateUniqueId() will now always
      produce an RFC 4122 UUID string, either from platform-specific API or
      by converting the output of GenerateRawUniqueId.
      
      DB session IDs now use GenerateRawUniqueId while DB IDs (not as
      critical) try to use port::GenerateRfcUuid but fall back on
      GenerateRawUniqueId with conversion to an RFC 4122 UUID.
      
      GenerateRawUniqueId is declared and defined under env/ rather than util/
      or even port/ because of the Env dependency.
      
      Likely follow-up: enhance GenerateRawUniqueId to be faster after the
      first call and to guarantee uniqueness within the lifetime of a single
      process (imparting the same property onto DB session IDs).
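
      As a rough sketch of the three-track combination described above (an illustrative assumption, not the actual GenerateRawUniqueId implementation; the mixing constant and hashing choices here are arbitrary):

      ```cpp
      #include <chrono>
      #include <cstdint>
      #include <functional>
      #include <random>
      #include <string>

      struct RawUniqueId {
        uint64_t hi = 0;
        uint64_t lo = 0;
      };

      RawUniqueId GenerateRawUniqueIdSketch() {
        // Track 1: environment info such as the current time (a real
        // implementation would also mix in process id, hostname, etc.).
        uint64_t t = static_cast<uint64_t>(
            std::chrono::steady_clock::now().time_since_epoch().count());
        // Track 2: std::random_device.
        std::random_device rd;
        uint64_t r = (static_cast<uint64_t>(rd()) << 32) | rd();
        // Track 3 (port::GenerateRfcUuid) would be XOR-ed in when available.
        RawUniqueId id;
        id.hi = t * 0x9E3779B97F4A7C15ULL ^ r;  // arbitrary mixing constant
        id.lo = std::hash<std::string>{}(std::to_string(t) + ":" +
                                         std::to_string(r));
        return id;
      }
      ```

      Because each track contributes independently, a flaw in any single source (a coarse clock, a weak random_device) does not by itself produce collisions.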
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8708
      
      Test Plan:
      A new mini-stress test in env_test checks the various public
      and internal APIs for uniqueness, including each track of
      GenerateRawUniqueId individually. We can't hope to verify anywhere close
      to 128 bits of entropy, but it can at least detect flaws as bad as the
      old code. Serial execution of the new tests takes about 350 ms on
      my machine.
      
      Reviewed By: zhichao-cao, mrambacher
      
      Differential Revision: D30563780
      
      Pulled By: pdillinger
      
      fbshipit-source-id: de4c9ff4b2f581cf784fcedb5f39f16e5185c364
      13ded694
  8. 28 Aug 2021 (3 commits)
  9. 27 Aug 2021 (3 commits)
  10. 26 Aug 2021 (2 commits)
  11. 25 Aug 2021 (6 commits)
    • Y
      Temporarily disable block-based filter when stress testing timestamp (#8703) · d8eb8243
      Committed by Yanqin Jin
      Summary:
      The current implementation does not support user-defined timestamps when
      the block-based filter is used. We will implement the support in the future, or
      wait to see whether the block-based filter can be deprecated and removed.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8703
      
      Test Plan: make whitebox_crash_test_with_ts
      
      Reviewed By: pdillinger
      
      Differential Revision: D30528931
      
      Pulled By: riversand963
      
      fbshipit-source-id: 60dd74ee0a6194e69072069d8c4bd876f249f38d
      d8eb8243
    • Y
      Fix a bug of secondary instance sequence going backward (#8653) · f235f4b0
      Committed by Yanqin Jin
      Summary:
      Recent refactor of `ReactiveVersionSet::ReadAndApply()` uses
      `ManifestTailer` whose `Iterate()` method can cause the db's
      `last_sequence_` to go backward. Consequently, read requests can see
      outdated data. For example, the latest changes to the primary will not be
      seen on the secondary even after a `TryCatchUpWithPrimary()` if no new
      write batches are read from the WALs and no new MANIFEST entries are
      read from the MANIFEST.
      
      Fix the bug so that `VersionEditHandler::CheckIterationResult` will
      never decrease `last_sequence_`, `last_allocated_sequence_` and
      `last_published_sequence_`.
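
      The invariant can be captured in a minimal sketch (the names below are simplified stand-ins, not RocksDB's actual VersionEditHandler): sequence numbers observed while tailing the MANIFEST may only move the tracked value forward.

      ```cpp
      #include <algorithm>
      #include <cstdint>

      class SequenceTrackerSketch {
       public:
        void OnIterationResult(uint64_t seq_seen) {
          // Take the max instead of assigning blindly, so a stale or empty
          // iteration result can never move the published sequence backward.
          last_sequence_ = std::max(last_sequence_, seq_seen);
        }
        uint64_t last_sequence() const { return last_sequence_; }

       private:
        uint64_t last_sequence_ = 0;
      };
      ```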
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8653
      
      Test Plan: make check
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D30272084
      
      Pulled By: riversand963
      
      fbshipit-source-id: c6a49c534b2509b93ef62d8936ed0acd5b860eaa
      f235f4b0
    • M
      Simplify `TraceAnalyzer` (#8697) · 785faf2d
      Committed by Merlin Mao
      Summary:
      Handler functions now use a common output function to output to stdout/files.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8697
      
      Test Plan: `trace_analyzer_test` can pass.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30527696
      
      Pulled By: autopear
      
      fbshipit-source-id: c626cf4d53a39665a9c4bcf0cb019c448434abe4
      785faf2d
    • P
      Add port::GetProcessID() (#8693) · 318fe694
      Committed by Peter Dillinger
      Summary:
      Useful in some places for object uniqueness across processes.
      Currently used for generating a host-wide identifier of Cache objects,
      but expected to be used soon in some unique-id generation code.
      
      `int64_t` is chosen as the return type because POSIX uses a signed integer type,
      usually `int`, for `pid_t`, and Windows uses `DWORD`, which is `uint32_t`.
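
      A sketch of a portable process-id getter along these lines (the actual RocksDB code may differ): `int64_t` comfortably holds both POSIX's signed `pid_t` and Windows' `uint32_t` `DWORD`.

      ```cpp
      #include <cstdint>
      #ifdef _WIN32
      #include <windows.h>
      #else
      #include <unistd.h>
      #endif

      int64_t GetProcessIDSketch() {
      #ifdef _WIN32
        // DWORD is uint32_t; widening to int64_t is lossless.
        return static_cast<int64_t>(::GetCurrentProcessId());
      #else
        // pid_t is a signed integer type; int64_t covers any plausible width.
        return static_cast<int64_t>(::getpid());
      #endif
      }
      ```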
      
      Future work: avoid copy-pasted declarations in port_*.h, perhaps with
      port_common.h always included from port.h
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8693
      
      Test Plan: manual for now
      
      Reviewed By: ajkr, anand1976
      
      Differential Revision: D30492876
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 39fc2788623cc9f4787866bdb67a4d183dde7eef
      318fe694
    • Y
      Allow iterate refresh for secondary instance (#8700) · 229350ef
      Committed by Yanqin Jin
      Summary:
      Test plan
      make check
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8700
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30523907
      
      Pulled By: riversand963
      
      fbshipit-source-id: 68928ab4dafb64ce80ab7bc69d83727a4713ab91
      229350ef
    • H
      Refactor WriteBufferManager::CacheRep into CacheReservationManager (#8506) · 74cfe7db
      Committed by Hui Xiao
      Summary:
      Context:
      To help cap various memory usages under a single limit of the block cache capacity, we charge the memory usage by inserting/releasing dummy entries in the block cache. CacheReservationManager is such a class (not thread-safe), responsible for inserting/removing dummy entries to reserve cache space for the memory used by the class user.
      
      - Refactored the inner private class CacheRep of WriteBufferManager into public CacheReservationManager class for reusability such as for https://github.com/facebook/rocksdb/pull/8428
      
      - Encapsulated implementation details of cache key generation and dummy entries insertion/release in cache reservation as discussed in https://github.com/facebook/rocksdb/pull/8506#discussion_r666550838
      
      - Consolidated increase/decrease cache reservation into one API - UpdateCacheReservation.
      
      - Adjusted the previous dummy-entry release algorithm for decreasing the cache reservation so that it loop-releases dummy entries, staying symmetric with the dummy-entry insertion algorithm
      
      - Made the previous dummy-entry release algorithm in delayed-decrease mode more aggressive, so the cache reservation decreases more effectively when the memory used is unlikely to increase back.
      
        Previously, the algorithm released only one dummy entry when new_mem_used < 3/4 * cache_allocated_size_ and cache_allocated_size_ - kSizeDummyEntry > new_mem_used.
      Now, the algorithm loop-releases as many dummy entries as possible when new_mem_used < 3/4 * cache_allocated_size_.
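
      The before/after release policies can be contrasted with a small sketch (the constant and function signatures are simplified assumptions, not the actual CacheReservationManager code); each function returns how many dummy entries would be released for a given memory usage and current reservation:

      ```cpp
      #include <cstddef>

      constexpr size_t kSizeDummyEntry = 256 * 1024;  // assumed entry size

      // Old behavior: release at most one dummy entry per call.
      size_t ReleaseOld(size_t new_mem_used, size_t cache_allocated_size) {
        if (new_mem_used < cache_allocated_size / 4 * 3 &&
            cache_allocated_size - kSizeDummyEntry > new_mem_used) {
          return 1;
        }
        return 0;
      }

      // New behavior: loop-release as many dummy entries as possible while
      // memory used stays under 3/4 of the remaining reservation.
      size_t ReleaseNew(size_t new_mem_used, size_t cache_allocated_size) {
        size_t released = 0;
        while (new_mem_used < cache_allocated_size / 4 * 3 &&
               cache_allocated_size >= kSizeDummyEntry) {
          cache_allocated_size -= kSizeDummyEntry;
          ++released;
        }
        return released;
      }
      ```

      With memory usage near zero and four entries reserved, the old policy frees one entry per call while the new policy frees all four at once.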
      
      - Updated WriteBufferManager's test cases to adapt to the changes to the release algorithm mentioned above, and left comments on some test cases for clarity
      
      - Replaced the previous cache-key prefix generation (utilizing an object address related to the cache client) with one that utilizes Cache->NewID(), to prevent cache-key collisions among dummy-entry clients sharing the same cache.
      
        The specific collision being prevented happens when an object address is reused for a new cache-key prefix while an old cache key using that same object address in its prefix still exists in the cache. This can happen because, under the LRU cache policy, releasing a cache entry may be delayed after the cache client object owning it is deallocated; in that case, the address of the deallocated client object can be reused by another client object to generate a new cache-key prefix.
      
        This prefix generation can be made obsolete after Peter's unification of all the code generating cache key, mentioned in https://github.com/facebook/rocksdb/pull/8506#discussion_r667265255
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8506
      
      Test Plan:
      - Passed the added unit tests in cache_reservation_manager_test.cc
      - Passed the existing and adjusted write_buffer_manager_test.cc
      
      Reviewed By: ajkr
      
      Differential Revision: D29644135
      
      Pulled By: hx235
      
      fbshipit-source-id: 0fc93fbfe4a40bb41be85c314f8f2bafa8b741f7
      74cfe7db
  12. 24 Aug 2021 (4 commits)
    • A
      Deflake write-prepared and write-unprepared tests (#8696) · c521f22a
      Committed by Andrew Kryczka
      Summary:
      `JobContext::job_snapshot` referenced DB state that could
      have been deleted by a BG thread after the signal/unlock allowed
      shutdown to proceed. Then we would see an error like this (valgrind):
      
      ```
      ==354104== Thread 2:
      ==354104== Invalid read of size 8
      ==354104==    at 0x694C4D: rocksdb::ManagedSnapshot::~ManagedSnapshot() (snapshot_impl.cc:20)
      ==354104==    by 0x58F5BA: operator() (unique_ptr.h:81)
      ==354104==    by 0x58F5BA: operator() (unique_ptr.h:75)
      ==354104==    by 0x58F5BA: ~unique_ptr (unique_ptr.h:292)
      ==354104==    by 0x58F5BA: rocksdb::JobContext::~JobContext() (job_context.h:221)
      ==354104==    by 0x5F155E: rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) (db_impl_compaction_flush.cc:2696)
      ==354104==    by 0x5F1BC2: rocksdb::DBImpl::BGWorkCompaction(void*) (db_impl_compaction_flush.cc:2468)
      ==354104==    by 0x83707A: operator() (std_function.h:688)
      ==354104==    by 0x83707A: rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long) (threadpool_imp.cc:266)
      ==354104==    by 0x8373ED: rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*) (threadpool_imp.cc:307)
      ==354104==    by 0x492A800: execute_native_thread_routine (in /usr/local/fbcode/platform009/lib/libstdc++.so.6.0.28)
      ==354104==    by 0x4A5020B: start_thread (in /usr/local/fbcode/platform009/lib/libpthread-2.30.so)
      ==354104==    by 0x4CF281E: clone (in /usr/local/fbcode/platform009/lib/libc-2.30.so)
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8696
      
      Test Plan: unable to repro
      
      Reviewed By: pdillinger
      
      Differential Revision: D30505277
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5a99f34137cd14d06b0f624add6d37a70a61135d
      c521f22a
    • M
      Refactor TraceAnalyzer to use `TraceRecord::Handler` to avoid casting. (#8678) · f6437ea4
      Committed by Merlin Mao
      Summary:
      `TraceAnalyzer` privately inherits `TraceRecord::Handler` and `WriteBatch::Handler`.
      
      `trace_analyzer_test` can pass with this change.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8678
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30459814
      
      Pulled By: autopear
      
      fbshipit-source-id: a27f59ac4600f7c3682830c9b1d9dc79e53425be
      f6437ea4
    • J
      Add extra information to RemoteCompaction APIs (#8680) · 249b1078
      Committed by Jay Zhuang
      Summary:
      Currently, we only provide `job_id` in the RemoteCompaction APIs. The
      main problem with `job_id` is that it cannot uniquely identify a compaction job
      across DB instances or sessions.
      Providing the DB id and session id to the user will make building a cross-DB
      compaction service easier.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8680
      
      Test Plan: unittest
      
      Reviewed By: ajkr
      
      Differential Revision: D30444859
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: fdf107f4286564049637f154193c6d94c3c59448
      249b1078
    • P
      Allow intentionally swallowed errors in BlockBasedFilterBlockReader (#8695) · 1a5eb33d
      Committed by Peter Dillinger
      Summary:
      This avoids getting "Didn't get expected error from Get" from the
      crash test when block-based filter is enabled there (https://github.com/facebook/rocksdb/issues/8679).
      Basically, this applies the IGNORE_STATUS_IF_ERROR pattern from
      full_filter_block.cc to block_based_filter_block.cc.
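
      The pattern can be illustrated with a stripped-down sketch; the Status type and KeyMayMatch signature below are simplified stand-ins, not RocksDB's actual classes. The idea: mark a read error as deliberately swallowed, and have the filter conservatively answer "may match" instead of surfacing the error to Get().

      ```cpp
      class Status {
       public:
        static Status OK() { return Status(true); }
        static Status Corruption() { return Status(false); }
        bool ok() const { return ok_; }
        void PermitUncheckedError() const {}  // no-op marker in this sketch

       private:
        explicit Status(bool ok) : ok_(ok) {}
        bool ok_;
      };

      #define IGNORE_STATUS_IF_ERROR(s) \
        do {                            \
          if (!(s).ok()) {              \
            (s).PermitUncheckedError(); \
          }                             \
        } while (0)

      // A filter probe that treats read errors as "may match": on error the
      // status is intentionally swallowed and the key is conservatively
      // reported as possibly present.
      bool KeyMayMatch(bool filter_says_match, const Status& read_status) {
        if (!read_status.ok()) {
          IGNORE_STATUS_IF_ERROR(read_status);
          return true;  // conservative fallback on a swallowed error
        }
        return filter_says_match;
      }
      ```

      Answering "may match" on error is safe because a filter's false positives only cost an extra lookup, whereas a false negative would hide data.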
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8695
      
      Test Plan: watch for resolution of crash test runs
      
      Reviewed By: ltamasi
      
      Differential Revision: D30496748
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f7808fcf14c0e787fe81da03fa8303244590d273
      1a5eb33d