1. 22 Jun 2022: 1 commit
  2. 21 Jun 2022: 6 commits
  3. 20 Jun 2022: 2 commits
  4. 19 Jun 2022: 1 commit
  5. 18 Jun 2022: 5 commits
    • Fix race condition with WAL tracking and `FlushWAL(true /* sync */)` (#10185) · d5d8920f
      Andrew Kryczka committed
      Summary:
      `FlushWAL(true /* sync */)` is used internally and for manual WAL sync. It had a bug when used together with `track_and_verify_wals_in_manifest` where the synced size tracked in MANIFEST was larger than the number of bytes actually synced.
      
      The bug could be repro'd almost immediately with the following crash test command: `python3 tools/db_crashtest.py blackbox --simple --write_buffer_size=524288 --max_bytes_for_level_base=2097152 --target_file_size_base=524288 --duration=3600 --interval=10 --sync_fault_injection=1 --disable_wal=0 --checkpoint_one_in=1000 --max_key=10000 --value_size_mult=33`.
      
      An example error message produced by the above command is shown below. The error sometimes arose from the checkpoint and other times arose from the main stress test DB.
      
      ```
      Corruption: Size mismatch: WAL (log number: 119) in MANIFEST is 27938 bytes , but actually is 27859 bytes on disk.
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10185
      
      Test Plan:
      - repro unit test
      - the above crash test command no longer finds the error. It does find a different error after a while longer such as "Corruption: WAL file 481 required by manifest but not in directory list"
      
      Reviewed By: riversand963
      
      Differential Revision: D37200993
      
      Pulled By: ajkr
      
      fbshipit-source-id: 98e0071c1a89f4d009888512ed89f9219779ae5f
    • Add rate-limiting support to batched MultiGet() (#10159) · a5d773e0
      Hui Xiao committed
      Summary:
      **Context/Summary:**
      https://github.com/facebook/rocksdb/pull/9424 added rate-limiting support for user reads, which does not include batched `MultiGet()`s that call `RandomAccessFileReader::MultiRead()`. The reason is that it's harder (compared with `RandomAccessFileReader::Read()`) to implement the ideal rate limiting, where we first call `RateLimiter::RequestToken()` for the allowed bytes to multi-read and then consume those bytes by satisfying as many requests in `MultiRead()` as possible. For example, it can be tricky to decide whether we want partially fulfilled requests within one `MultiRead()` or not.
      
      However, due to a recent urgent user request, we decided to pursue an elementary (but conditionally ineffective) solution, where we accumulate enough rate limiter requests to cover the total bytes needed by one `MultiRead()` before doing that `MultiRead()`. This is not ideal when the total byte count is huge, as we actually consume a large amount of rate-limiter bandwidth at once, causing a burst on disk. This is not what we ultimately want from the rate limiter, so follow-up work is noted in TODO comments.
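      The elementary approach can be sketched roughly as follows (a toy model; `ToyRateLimiter` and `AcquireForMultiRead` are hypothetical stand-ins, not RocksDB APIs): keep requesting tokens until the total bytes of the batched read are covered, then issue the `MultiRead()` in one burst.

      ```cpp
      #include <algorithm>
      #include <cstddef>

      // Hypothetical rate limiter: grants at most `max_grant_` bytes per request.
      class ToyRateLimiter {
       public:
        explicit ToyRateLimiter(size_t max_grant) : max_grant_(max_grant) {}
        // Conceptually blocks until `bytes` are available; returns granted bytes.
        size_t Request(size_t bytes) { return std::min(bytes, max_grant_); }
       private:
        size_t max_grant_;
      };

      // Accumulate grants until the full size of the batched read is covered,
      // then the caller performs the multi-read as one burst. Accounting is
      // correct, but the I/O is concentrated, which is the burst problem
      // described above.
      size_t AcquireForMultiRead(ToyRateLimiter& limiter, size_t total_bytes) {
        size_t acquired = 0;
        while (acquired < total_bytes) {
          acquired += limiter.Request(total_bytes - acquired);
        }
        return acquired;  // caller now performs the actual MultiRead()
      }
      ```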
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10159
      
      Test Plan:
      - Modified existing unit test `DBRateLimiterOnReadTest/DBRateLimiterOnReadTest.NewMultiGet`
      - Traced the underlying system calls `io_uring_enter` and verified they are 10 seconds apart from each other correctly under the setting of  `strace -ftt -e trace=io_uring_enter ./db_bench -benchmarks=multireadrandom -db=/dev/shm/testdb2 -readonly -num=50 -threads=1 -multiread_batched=1 -batch_size=100 -duration=10 -rate_limiter_bytes_per_sec=200 -rate_limiter_refill_period_us=1000000 -rate_limit_bg_reads=1 -disable_auto_compactions=1 -rate_limit_user_ops=1` where each `MultiRead()` read about 2000 bytes (inspected by debugger) and the rate limiter grants 200 bytes per seconds.
      - Stress test:
         - Verified `./db_stress (-test_cf_consistency=1/test_batches_snapshots=1) -use_multiget=1 -cache_size=1048576 -rate_limiter_bytes_per_sec=10241024 -rate_limit_bg_reads=1 -rate_limit_user_ops=1` works
      
      Reviewed By: ajkr, anand1976
      
      Differential Revision: D37135172
      
      Pulled By: hx235
      
      fbshipit-source-id: 73b8e8f14761e5d4b77235dfe5d41f4eea968bcd
    • Read blob from blob cache if exists when GetBlob() (#10178) · c965c9ef
      Gang Liao committed
      Summary:
      There is currently no caching mechanism for blobs, which is not ideal especially when the database resides on remote storage (where we cannot rely on the OS page cache). As part of this task, we would like to make it possible for the application to configure a blob cache.
      In this task, we added a new abstraction layer `BlobSource` to retrieve blobs from either blob cache or raw blob file. Note: For simplicity, the current PR only includes `GetBlob()`.  `MultiGetBlob()` will be included in the next PR.
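      The read-through pattern such a `BlobSource` layer implements can be sketched like this (a standalone toy; `ToyBlobSource` and `BlobFileReader` here are stand-ins, not the real classes): check the cache first, fall back to the blob file on a miss, and populate the cache with the blob just read.

      ```cpp
      #include <cstdint>
      #include <map>
      #include <optional>
      #include <string>
      #include <utility>

      // Stand-in for a raw blob file keyed by offset.
      struct BlobFileReader {
        std::map<uint64_t, std::string> offsets_to_blobs;
        std::optional<std::string> Read(uint64_t offset) const {
          auto it = offsets_to_blobs.find(offset);
          if (it == offsets_to_blobs.end()) return std::nullopt;
          return it->second;
        }
      };

      class ToyBlobSource {
       public:
        explicit ToyBlobSource(BlobFileReader reader) : file_(std::move(reader)) {}
        std::optional<std::string> GetBlob(uint64_t offset) {
          auto hit = cache_.find(offset);
          if (hit != cache_.end()) return hit->second;  // served from blob cache
          auto blob = file_.Read(offset);               // fall back to blob file
          if (blob) cache_.emplace(offset, *blob);      // warm the cache
          return blob;
        }
        size_t CacheSize() const { return cache_.size(); }
       private:
        BlobFileReader file_;
        std::map<uint64_t, std::string> cache_;  // stand-in for the blob cache
      };
      ```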
      
      This PR is a part of https://github.com/facebook/rocksdb/issues/10156
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10178
      
      Reviewed By: ltamasi
      
      Differential Revision: D37250507
      
      Pulled By: gangliao
      
      fbshipit-source-id: 3fc4a55a0cea955a3147bdc7dba06430e377259b
    • Use optimized folly DistributedMutex in LRUCache when available (#10179) · 1aac8145
      Peter Dillinger committed
      Summary:
      folly DistributedMutex is faster than standard mutexes though
      imposes some static obligations on usage. See
      https://github.com/facebook/folly/blob/main/folly/synchronization/DistributedMutex.h
      for details. Here we use this alternative for our Cache implementations
      (especially LRUCache) for better locking performance, when RocksDB is
      compiled with folly.
      
      Also added information about which distributed mutex implementation is
      being used to cache_bench output and to DB LOG.
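      A minimal sketch of the scoped-usage shape this implies (the `DMutex`/`WithLock` wrapper below is illustrative, not the actual port code; only the `std::mutex` path is compiled here, and a folly build holding `folly::DistributedMutex` instead is an assumption):

      ```cpp
      #include <mutex>

      // Keep all locking strictly scoped so a distributed-mutex backend,
      // which disallows non-scoped lock/unlock, can be swapped in.
      class DMutex {
       public:
        template <typename Func>
        auto WithLock(Func&& func) {
          std::lock_guard<std::mutex> guard(mu_);
          return func();
        }
       private:
        std::mutex mu_;  // stand-in; a folly build would hold the DistributedMutex
      };

      // Example critical section: a hit counter guarded by the mutex.
      class HitCounter {
       public:
        long RecordHit() {
          return mu_.WithLock([this] { return ++hits_; });
        }
       private:
        DMutex mu_;
        long hits_ = 0;
      };
      ```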
      
      Intended follow-up:
      * Use DMutex in more places, perhaps improving API to support non-scoped
      locking
      * Fix linking with fbcode compiler (needs ROCKSDB_NO_FBCODE=1 currently)
      
      Credit: Thanks Siying for reminding me about this line of work that was previously
      left unfinished.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10179
      
      Test Plan:
      for correctness, existing tests. CircleCI config updated.
      Also Meta-internal buck build updated.
      
      For performance, ran simultaneous before & after cache_bench. Out of three
      comparison runs, the middle improvement to ops/sec was +21%:
      
      Baseline: USE_CLANG=1 DEBUG_LEVEL=0 make -j24 cache_bench (fbcode
      compiler)
      
      ```
      Complete in 20.201 s; Rough parallel ops/sec = 1584062
      Thread ops/sec = 107176
      
      Operation latency (ns):
      Count: 32000000 Average: 9257.9421  StdDev: 122412.04
      Min: 134  Median: 3623.0493  Max: 56918500
      Percentiles: P50: 3623.05 P75: 10288.02 P99: 30219.35 P99.9: 683522.04 P99.99: 7302791.63
      ```
      
      New: (add USE_FOLLY=1)
      
      ```
      Complete in 16.674 s; Rough parallel ops/sec = 1919135  (+21%)
      Thread ops/sec = 135487
      
      Operation latency (ns):
      Count: 32000000 Average: 7304.9294  StdDev: 108530.28
      Min: 132  Median: 3777.6012  Max: 91030902
      Percentiles: P50: 3777.60 P75: 10169.89 P99: 24504.51 P99.9: 59721.59 P99.99: 1861151.83
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D37182983
      
      Pulled By: pdillinger
      
      fbshipit-source-id: a17eb05f25b832b6a2c1356f5c657e831a5af8d1
    • Fix overflow in ribbon_bench after #10184 (#10195) · f87adcfb
      Peter Dillinger committed
      Summary:
      Ribbon micro-bench needs updating after re-numbering
      `BloomLikeFilterPolicy::GetAllFixedImpls()` entries. (CircleCI nightly
      failure.)
      
      Also fixed memory leaks while using ASAN to validate my fix. (I assume
      the leaks weren't intentional for some performance characteristic.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10195
      
      Test Plan: run with ASAN
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D37244459
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 5a363e10de3c4c9c88099c937e3dc3b4cf24fd30
  6. 17 Jun 2022: 6 commits
    • Add WriteOptions::protection_bytes_per_key (#10037) · 5d6005c7
      Andrew Kryczka committed
      Summary:
      Added an option, `WriteOptions::protection_bytes_per_key`, that controls how many bytes per key we use for integrity protection in `WriteBatch`. It takes effect when `WriteBatch::GetProtectionBytesPerKey() == 0`.
      
      Currently the only supported value is eight. Invoking a user API with it set to any other nonzero value will result in `Status::NotSupported` returned to the user.
      
      There is also a bug fix for integrity protection with `inplace_callback`, where we forgot to take into account the possible change in varint length when calculating KV checksum for the final encoded buffer.
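      The varint pitfall can be illustrated with a toy model (FNV-1a stands in for the real per-key checksum scheme, and the encoding below is a simplified varint length prefix, not the actual `WriteBatch` format): if an in-place update changes the value so its length crosses a varint boundary, the encoded bytes change, so the checksum must be recomputed over the final encoded buffer.

      ```cpp
      #include <cstdint>
      #include <string>

      // Hypothetical 64-bit per-key checksum (8 "protection bytes per key"),
      // computed over the encoded entry. FNV-1a stands in for the real scheme.
      uint64_t KvChecksum(const std::string& encoded_entry) {
        uint64_t h = 1469598103934665603ull;
        for (unsigned char c : encoded_entry) {
          h ^= c;
          h *= 1099511628211ull;
        }
        return h;
      }

      // Toy varint-length-prefixed encoding of a value. When the value length
      // crosses a varint boundary (e.g. 127 -> 128 bytes), the length prefix
      // itself grows, so the encoded buffer (and hence the checksum) changes.
      std::string EncodeValue(const std::string& value) {
        std::string out;
        uint64_t len = value.size();
        while (len >= 0x80) {  // varint-style length prefix
          out.push_back(static_cast<char>((len & 0x7f) | 0x80));
          len >>= 7;
        }
        out.push_back(static_cast<char>(len));
        out += value;
        return out;
      }
      ```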
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10037
      
      Test Plan:
      - Manual
        - Set default value of `WriteOptions::protection_bytes_per_key` to eight and ran `make check -j24`
        - Enabled in MyShadow for 1+ week
      - Automated
        - Unit tests have a `WriteMode` that enables the integrity protection via `WriteOptions`
        - Crash test - in most cases, use `WriteOptions::protection_bytes_per_key` to enable integrity protection
      
      Reviewed By: cbi42
      
      Differential Revision: D36614569
      
      Pulled By: ajkr
      
      fbshipit-source-id: 8650087ceac9b61b560f1e5fafe5e1baf9c725fb
    • Fix a false negative merge conflict (#10192) · f62c1e1e
      Peter Dillinger committed
      Summary:
      .. between https://github.com/facebook/rocksdb/issues/10184 and https://github.com/facebook/rocksdb/issues/10122 not detected by source control,
      leading to non-compiling code.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10192
      
      Test Plan: updated test
      
      Reviewed By: hx235
      
      Differential Revision: D37231921
      
      Pulled By: pdillinger
      
      fbshipit-source-id: fa21488716f4c006b111b8c4127d71c757c935c3
    • Update HISTORY.md for #10114 (#10189) · 8cf86258
      Changyu Bi committed
      Summary:
      Update HISTORY.md for https://github.com/facebook/rocksdb/issues/10114: write batch checksum verification before writing to WAL.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10189
      
      Reviewed By: ajkr
      
      Differential Revision: D37226366
      
      Pulled By: cbi42
      
      fbshipit-source-id: cd2f076961abc35f35783e0f2cc3beda68cdb446
    • More testing w/prefix extractor, small refactor (#10122) · fff302d9
      Peter Dillinger committed
      Summary:
      There was an interesting code path not covered by testing that
      is difficult to replicate in a unit test, which is now covered using a
      sync point. Specifically, the case of table_prefix_extractor == null and
      !need_upper_bound_check in `BlockBasedTable::PrefixMayMatch`, which
      can happen if table reader is open before extractor is registered with global
      object registry, but is later registered and re-set with SetOptions. (We
      don't have sufficient testing control over object registry to set that up
      repeatedly.)
      
      Also, this function has been renamed to `PrefixRangeMayMatch` for clarity
      vs. other functions that are not the same.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10122
      
      Test Plan: unit tests expanded
      
      Reviewed By: siying
      
      Differential Revision: D36944834
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 9e52d9da1929a3e42bbc230fcdc3599949de7bdb
    • Remove deprecated block-based filter (#10184) · 126c2237
      Peter Dillinger committed
      Summary:
      In https://github.com/facebook/rocksdb/issues/9535, release 7.0, we hid the old block-based filter from being created using
      the public API, because of its inefficiency. Although we normally maintain read compatibility
      on old DBs forever, filters are not required for reading a DB, only for optimizing read
      performance. Thus, it should be acceptable to remove this code and the substantial
      maintenance burden it carries as useful features are developed and validated (such
      as user timestamp).
      
      This change completely removes the code for reading and writing the old block-based
      filters, net removing about 1370 lines of code no longer needed. Options removed from
      testing / benchmarking tools. The prior existence is only evident in a couple of places:
      * `CacheEntryRole::kDeprecatedFilterBlock` - We can update this public API enum in
      a major release to minimize source code incompatibilities.
      * A warning is logged when an old table file is opened that used the old block-based
      filter. This is provided as a courtesy, and would be a pain to unit test, so manual testing
      should suffice. Unfortunately, sst_dump does not tell you whether a file uses
      block-based filter, and the structure of the code makes it very difficult to fix.
      * To detect that case, `kObsoleteFilterBlockPrefix` (renamed from `kFilterBlockPrefix`)
      for metaindex is maintained (for now).
      
      Other notes:
      * In some cases where numbers are associated with filter configurations, we have had to
      update the assigned numbers so that they all correspond to something that exists.
      * Fixed potential stat counting bug by assuming `filter_checked = false` for cases
      like `filter == nullptr` rather than assuming `filter_checked = true`
      * Removed obsolete `block_offset` and `prefix_extractor` parameters from several
      functions.
      * Removed some unnecessary checks `if (!table_prefix_extractor() && !prefix_extractor)`
      because the caller guarantees the prefix extractor exists and is compatible
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10184
      
      Test Plan:
      tests updated, manually test new warning in LOG using base version to
      generate a DB
      
      Reviewed By: riversand963
      
      Differential Revision: D37212647
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 06ee020d8de3b81260ffc36ad0c1202cbf463a80
    • Update stats to help users estimate MultiGet async IO impact (#10182) · a6691d0f
      anand76 committed
      Summary:
      Add a couple of stats to help users estimate the impact of potential MultiGet perf improvements -
      1. NUM_LEVEL_READ_PER_MULTIGET - A histogram stat for number of levels that required MultiGet to read from a file
      2. MULTIGET_COROUTINE_COUNT - A ticker stat to count the number of times the coroutine version of MultiGetFromSST was used
      
      The NUM_DATA_BLOCKS_READ_PER_LEVEL stat is obsoleted as it doesn't provide useful information for MultiGet optimization.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10182
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D37213296
      
      Pulled By: anand1976
      
      fbshipit-source-id: 5d2b7708017c0e278578ae4bffac3926f6530efb
  7. 16 Jun 2022: 9 commits
    • Abort in dbg mode after logging (#10183) · 4d31d3c2
      Yanqin Jin committed
      Summary:
      In CompactionIterator code, there are multiple places where the process
      will abort in dbg mode before logging the error message describing the
      cause. This PR changes only the logging behavior for compaction iterator so
      that error message is written to LOG before the process aborts in debug
      mode.
      
      Also updated the triggering condition for an assertion for single delete with
      user-defined timestamp.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10183
      
      Test Plan: make check
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D37190218
      
      Pulled By: riversand963
      
      fbshipit-source-id: 741bb007067be7cfbe94ac9e530ad4b2b339c009
    • Add a few optimizations in async_io for short scans (#10140) · 8353ae8b
      Akanksha Mahajan committed
      Summary:
      This PR adds few optimizations for async_io for shorter scans.
      1. If async_io is enabled, a seek creates a FilePrefetchBuffer object to fetch the data asynchronously. However, `FilePrefetchBuffer::num_file_reads_` wasn't taken into consideration when `Next()` is called after `Seek()`, so it would go ahead and prefetch. This PR fixes that: `Next()` now prefetches only if `FilePrefetchBuffer::num_file_reads_` is greater than 2 and the blocks are sequential. This scenario applies only to implicit auto readahead.
      2. For a seek, polling through `TryReadFromCacheAsync` also issued a further async call, because the `TryReadFromCacheAsync` flow wasn't changed. It is updated to return after the poll instead of prefetching any further data.
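      The gating condition in point 1 can be modeled roughly like this (names are illustrative stand-ins, not the actual `FilePrefetchBuffer` members):

      ```cpp
      #include <cstdint>

      // Toy model: prefetch on Next() only after more than
      // kMinFileReadsForPrefetch reads from the file AND only when the
      // requested block directly follows the previous one (sequential).
      constexpr int kMinFileReadsForPrefetch = 2;

      class ToyPrefetchGate {
       public:
        // Returns true if a prefetch should be issued for a read at `offset`.
        bool OnRead(uint64_t offset, uint64_t len) {
          ++num_file_reads_;
          bool sequential = (offset == next_expected_offset_);
          next_expected_offset_ = offset + len;
          return sequential && num_file_reads_ > kMinFileReadsForPrefetch;
        }
       private:
        int num_file_reads_ = 0;
        uint64_t next_expected_offset_ = 0;
      };
      ```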
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10140
      
      Test Plan:
      1. Added a unit test
      2. Ran crash_test with async_io = 1 to make sure nothing crashes.
      
      Reviewed By: anand1976
      
      Differential Revision: D37042242
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: b8e6b7cb2ee0886f37a8f53951948b9084e8ffda
    • Fix handling of accidental truncation of IDENTITY file (#10173) · 3d358a7e
      Peter Dillinger committed
      Summary:
      A consequence of https://github.com/facebook/rocksdb/issues/9990 was requiring a non-empty DB ID to generate
      new SST files. But if the DB ID is not tracked in the manifest and the IDENTITY file
      is somehow truncated to 0 bytes, then an empty DB ID would be assigned, leading
      to crash. This change ensures a non-empty DB ID is assigned and set in the
      IDENTITY file.
      
      Also,
      * Some light refactoring to clean up the logic
      * (I/O efficiency) If the ID is tracked in the manifest and already matches the
      IDENTITY file, don't needlessly overwrite the file.
      * (Debugging) Log the DB ID to info log on open, because sometimes IDENTITY
      can change if DB is moved around (though it would be unusual for info log to
      be copied/moved without IDENTITY file)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10173
      
      Test Plan: unit tests expanded/updated
      
      Reviewed By: ajkr
      
      Differential Revision: D37176545
      
      Pulled By: pdillinger
      
      fbshipit-source-id: a9b414cd35bfa33de48af322a36c24538d50bef1
    • Use only ASCII in source files (#10164) · 94329ae4
      Peter Dillinger committed
      Summary:
      Fix existing usage of non-ASCII and add a check to prevent
      future use. Added `-n` option to greps to provide line numbers.
      
      Alternative to https://github.com/facebook/rocksdb/issues/10147
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10164
      
      Test Plan:
      used new checker to find & fix cases, manually check
      db_bench output is preserved
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D37148792
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 68c8b57e7ab829369540d532590bf756938855c7
    • Verify write batch checksum before WAL (#10114) · 9882652b
      Changyu Bi committed
      Summary:
      Context: WriteBatch can have key-value checksums when it was created with `protection_bytes_per_key > 0`.
      This PR added checksum verification for write batches before they are written to WAL.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10114
      
      Test Plan:
      - Added new unit tests to db_kv_checksum_test.cc: `make check -j32`
      - benchmark on performance regression: `./db_bench --benchmarks=fillrandom[-X20] -db=/dev/shm/test_rocksdb -write_batch_protection_bytes_per_key=8`
        - Pre-PR:
          `fillrandom [AVG    20 runs] : 198875 (± 3006) ops/sec;   22.0 (± 0.3) MB/sec`
        - Post-PR:
          `fillrandom [AVG    20 runs] : 196487 (± 2279) ops/sec;   21.7 (± 0.3) MB/sec`

        The mean regressed about 1% (198875 -> 196487 ops/sec).
      
      Reviewed By: ajkr
      
      Differential Revision: D36917464
      
      Pulled By: cbi42
      
      fbshipit-source-id: 29beb74edf65f04b1a890b4f650d873dc7ed790d
    • Change the instruction used for a pause on arm64 (#10118) · 2e5a323d
      Ali Saidi committed
      Summary:
      While the yield instruction conceptually sounds correct on most platforms, it is
      a simple nop that doesn't delay execution anywhere close to what an x86
      pause instruction does. In other projects with spin-wait loops, an isb has been
      observed to be much closer to the x86 behavior.
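      A portable sketch of such a spin-wait pause (the arm64 branch uses `isb` per this change; the x86 branch is the classic `pause`; the fallback is just a compiler barrier):

      ```cpp
      #include <atomic>

      // Architecture-specific pause hint for spin-wait loops.
      inline void SpinPause() {
      #if defined(__aarch64__)
        asm volatile("isb" ::: "memory");   // closer to x86 pause than yield
      #elif defined(__x86_64__) || defined(__i386__)
        asm volatile("pause" ::: "memory");
      #else
        asm volatile("" ::: "memory");      // plain compiler barrier fallback
      #endif
      }

      // Typical usage inside a spin-wait loop on an atomic flag.
      inline void SpinUntilSet(const std::atomic<bool>& flag) {
        while (!flag.load(std::memory_order_acquire)) {
          SpinPause();
        }
      }
      ```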
      
      On a Graviton3 system the following test improves on average by 2x with this
      change averaged over 20 runs:
      
      ```
      ./db_bench  -benchmarks=fillrandom -threads=64 -batch_size=1
      -memtablerep=skip_list -value_size=100 --num=100000
      level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999
      -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8
      --disable_wal --write_buffer_size=160000000 --block_size=16384
      --allow_concurrent_memtable_write -compression_type none
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10118
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D37120578
      
      fbshipit-source-id: c20bde4298222edfab7ff7cb6d42497e7012400d
    • Use madvise() for mmaped file advise (#10170) · 69a32eec
      sdong committed
      Summary:
      A recent PR https://github.com/facebook/rocksdb/pull/10142 enabled fadvise for mmaped files. However, we were told that it might not take effect and that madvise() should be used instead.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10170
      
      Test Plan:
      Ran existing tests.
      Ran a benchmark using mmap with advise random and verified that the I/O size is indeed small.
      
      Reviewed By: anand1976
      
      Differential Revision: D37158582
      
      fbshipit-source-id: 8b3a74f0e89d2e16aac78ee4124c05841d4135c3
    • Allow db_bench and db_stress to set `allow_data_in_errors` (#10171) · ce419c0f
      Yanqin Jin committed
      Summary:
      There is `Options::allow_data_in_errors` that controls whether RocksDB
      is allowed to log data, e.g. key, value, etc in LOG files. It is false
      by default. However, in db_bench and db_stress, it is often ok to log
      data because there is no concern about privacy.
      
      This PR allows db_stress and db_bench to set this option on the command
      line, while it remains false by default. Furthermore, the crash/recovery
      test driven by db_crashtest.py now opts in.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10171
      
      Test Plan: Stress test and db_bench
      
      Reviewed By: hx235
      
      Differential Revision: D37163787
      
      Pulled By: riversand963
      
      fbshipit-source-id: 0242f24d292ba15b6faf8ff903963b85d3e011f8
    • fix cancel argument for latest liburing (#10168) · 19345de6
      Akanksha Mahajan committed
      Summary:
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10168
      
      The argument changed to `__u64`.
      
      Reviewed By: ajkr
      
      Differential Revision: D37155407
      
      fbshipit-source-id: 464eab2806675f148fce075a6fea369fa3d7a9bb
  8. 15 Jun 2022: 8 commits
    • Fix C4702 on windows (#10146) · 40dfa260
      iseki committed
      Summary:
      This code is unreachable when `ROCKSDB_LITE` is not defined, and it causes a build failure in my environment, VS2019 16.11.15.
      ```
      -- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19044.
      -- The CXX compiler identification is MSVC 19.29.30145.0
      -- The C compiler identification is MSVC 19.29.30145.0
      -- The ASM compiler identification is MSVC
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10146
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D37112916
      
      Pulled By: ajkr
      
      fbshipit-source-id: e0b2bf3055d6fac1b3fb40b9f02c4cbae3f82757
    • Fix potential leak when reusing PinnableSlice instances. (#10166) · 77f47995
      mpoeter committed
      Summary:
      `PinnableSlice` may hold a handle to a cache value which must be released to correctly decrement the ref-counter. However, when `PinnableSlice` variables are reused, e.g. like this:
      ```
      PinnableSlice pin_slice;
      db.Get("foo", &pin_slice);
      db.Get("foo", &pin_slice);
      ```
      then the second `Get` simply overwrites the old value in `pin_slice` and the handle returned by the first `Get` is _not_ released.
      
      This PR adds `Reset` calls to the `Get`/`MultiGet` calls that accept `PinnableSlice` arguments to ensure proper cleanup of old values.
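      The fix's effect can be modeled with a toy slice that holds a `std::shared_ptr` as its "cache handle" so the reference count is observable (`ToyPinnableSlice` and `ToyGet` are stand-ins, not the real API):

      ```cpp
      #include <memory>
      #include <string>
      #include <utility>

      // Toy stand-in for a cache-handle-owning slice.
      class ToyPinnableSlice {
       public:
        void PinTo(std::shared_ptr<std::string> handle) {
          Reset();  // release any previously pinned handle first (the fix)
          handle_ = std::move(handle);
        }
        void Reset() { handle_.reset(); }
        const std::string* data() const { return handle_.get(); }
       private:
        std::shared_ptr<std::string> handle_;  // plays the role of a cache handle
      };

      // A Get()-like call that pins a cached value into the output slice.
      // With the Reset() inside PinTo(), back-to-back calls on the same slice
      // no longer leak the first handle's reference.
      void ToyGet(const std::shared_ptr<std::string>& cached_value,
                  ToyPinnableSlice* out) {
        out->PinTo(cached_value);
      }
      ```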
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10166
      
      Reviewed By: hx235
      
      Differential Revision: D37151632
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9dd3c3288300f560531b843f67db11aeb569a9ff
    • Modify the instructions emitted for PREFETCH on arm64 (#10117) · b550fc0b
      Ali Saidi committed
      Summary:
      __builtin_prefetch(...., 1) prefetches into the L2 cache on x86, while the same
      call emits a pldl3keep instruction on arm64, which doesn't seem to be close enough.
      
      Testing on a Graviton3 and an M1 system with memtablerep_bench fillrandom and
      skiplist, throughput increased as follows when adjusting the 1 to 2 or 3:
      ```
                 1 -> 2     1 -> 3
      ----------------------------
      Graviton3   +10%        +15%
      M1          +10%        +10%
      ```
      
      Given that prefetching into the L1 cache seems to help, I chose that conversion.
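      For reference, the builtin takes an address, a read/write flag, and a locality hint, where 3 asks to keep the data as close to the core as possible. A hedged usage sketch (the 16-element prefetch distance is an arbitrary illustration, not a tuned value):

      ```cpp
      #include <cstddef>
      #include <numeric>
      #include <vector>

      // Sum an array while prefetching a few elements ahead with high
      // temporal locality (__builtin_prefetch(addr, rw, locality); locality 3
      // corresponds to "keep in the closest cache levels").
      long SumWithPrefetch(const std::vector<long>& v) {
        constexpr size_t kAhead = 16;  // prefetch distance, in elements
        long sum = 0;
        for (size_t i = 0; i < v.size(); ++i) {
          if (i + kAhead < v.size()) {
            __builtin_prefetch(&v[i + kAhead], /*rw=*/0, /*locality=*/3);
          }
          sum += v[i];
        }
        return sum;
      }
      ```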
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10117
      
      Reviewed By: pdillinger
      
      Differential Revision: D37120475
      
      fbshipit-source-id: db1ef43f941445019c68316500a2250acc643d5e
    • mingw: remove no-asynchronous-unwind-tables (#9963) · 751d1a3e
      James Tucker committed
      Summary:
      This default is generally incompatible with other parts of mingw, and
      can be applied by outside users as-needed.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9963
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D36302813
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9456b41a96bde302bacbc39e092ccecfcb42f34f
    • Add blob cache option in the column family options (#10155) · cba398df
      Gang Liao committed
      Summary:
      There is currently no caching mechanism for blobs, which is not ideal especially when the database resides on remote storage (where we cannot rely on the OS page cache). As part of this task, we would like to make it possible for the application to configure a blob cache.
      This PR is a part of https://github.com/facebook/rocksdb/issues/10156
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10155
      
      Reviewed By: ltamasi
      
      Differential Revision: D37150819
      
      Pulled By: gangliao
      
      fbshipit-source-id: b807c7916ea5d411588128f8e22a49f171388fe2
    • fix a false positive case of parsing table factory from options file (#10094) · 1d2950b8
      tabokie committed
      Summary:
      During options file parsing, reset table factory before attempting to parse it
      from string. This avoids mistakenly treating the default table factory as a
      newly created one.
      Signed-off-by: tabokie <xy.tao@outlook.com>
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10094
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D36945378
      
      Pulled By: ajkr
      
      fbshipit-source-id: 94b2604e5e87682063b4b78f6370f3e8f101dc44
    • Account memory of FileMetaData in global memory limit (#9924) · d665afdb
      Hui Xiao committed
      Summary:
      **Context/Summary:**
      As revealed by heap profiling, allocation of `FileMetaData` for [newly created file added to a Version](https://github.com/facebook/rocksdb/pull/9924/files#diff-a6aa385940793f95a2c5b39cc670bd440c4547fa54fd44622f756382d5e47e43R774) can consume significant heap memory. This PR is to account that toward our global memory limit based on block cache capacity.
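      The charging scheme can be sketched as a simple budget manager (illustrative only; `ToyReservationManager` is hypothetical, and the real accounting presumably goes through RocksDB's cache reservation machinery rather than a class like this):

      ```cpp
      #include <cstddef>

      // Toy reservation manager: charges allocations against a fixed budget,
      // standing in for the block-cache-capacity-based global memory limit.
      class ToyReservationManager {
       public:
        explicit ToyReservationManager(size_t capacity) : capacity_(capacity) {}
        // Returns false when the charge would exceed the global memory limit.
        bool Charge(size_t bytes) {
          if (used_ + bytes > capacity_) return false;
          used_ += bytes;
          return true;
        }
        void Release(size_t bytes) { used_ = bytes > used_ ? 0 : used_ - bytes; }
        size_t used() const { return used_; }
       private:
        size_t capacity_;
        size_t used_ = 0;
      };
      ```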
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9924
      
      Test Plan:
      - Previous `make check` verified there are only 2 places where the memory of  the allocated `FileMetaData` can be released
      - New unit test `TEST_P(ChargeFileMetadataTestWithParam, Basic)`
      - db bench (CPU cost of `charge_file_metadata` in write and compact)
         - **write micros/op: -0.24%** : `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR -charge_file_metadata=1 (remove this option for pre-PR) -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 | egrep 'fillseq'`
         - **compact micros/op -0.87%** : `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR -charge_file_metadata=1 -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 -numdistinct=1000 && ./db_bench -benchmarks=compact -db=$TEST_TMPDIR -use_existing_db=1 -charge_file_metadata=1 -disable_auto_compactions=1 | egrep 'compact'`
      
      table 1 - write
      
      #-run | (pre-PR) avg micros/op | std micros/op | (post-PR)  micros/op | std micros/op | change (%)
      -- | -- | -- | -- | -- | --
      10 | 3.9711 | 0.264408 | 3.9914 | 0.254563 | 0.5111933721
      20 | 3.83905 | 0.0664488 | 3.8251 | 0.0695456 | -0.3633711465
      40 | 3.86625 | 0.136669 | 3.8867 | 0.143765 | 0.5289363078
      80 | 3.87828 | 0.119007 | 3.86791 | 0.115674 | **-0.2673865734**
      160 | 3.87677 | 0.162231 | 3.86739 | 0.16663 | **-0.2419539978**
      
      table 2 - compact
      
      #-run | (pre-PR) avg micros/op | std micros/op | (post-PR)  micros/op | std micros/op | change (%)
      -- | -- | -- | -- | -- | --
      10 | 2,399,650.00 | 96,375.80 | 2,359,537.00 | 53,243.60 | -1.67
      20 | 2,410,480.00 | 89,988.00 | 2,433,580.00 | 91,121.20 | 0.96
      40 | 2.41E+06 | 121811 | 2.39E+06 | 131525 | **-0.96**
      80 | 2.40E+06 | 134503 | 2.39E+06 | 108799 | **-0.78**
      
      - stress test: `python3 tools/db_crashtest.py blackbox --charge_file_metadata=1  --cache_size=1` killed as normal
      
      Reviewed By: ajkr
      
      Differential Revision: D36055583
      
      Pulled By: hx235
      
      fbshipit-source-id: b60eab94707103cb1322cf815f05810ef0232625
    • Fix the failure related to io_uring_prep_cancel (#10165) · 40d19bc1
      Akanksha Mahajan committed
      Summary:
      Fix for internal jobs that are failing with
      ```
       error: no matching function for call to 'io_uring_prep_cancel'
            io_uring_prep_cancel(sqe, posix_handle, 0);
            ^~~~~~~~~~~~~~~~~~~~
      note: candidate function not viable: no known conversion from 'rocksdb::Posix_IOHandle *' to '__u64' (aka 'unsigned long long') for 2nd argument
      static inline void io_uring_prep_cancel(struct io_uring_sqe *sqe,
      ```
      
      User data is set using `io_uring_set_data` API so no need to pass posix_handle here.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10165
      
      Test Plan: CircleCI jobs
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D37145233
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 05da650e1240e9c6fcc8aed5f0067308dccb164a
  9. 14 Jun 2022: 2 commits
    • Make the per-shard hash table fixed-size. (#10154) · f105e1a5
      Guido Tagliavini Ponce committed
      Summary:
      We make the size of the per-shard hash table fixed. The base level of the hash table is now preallocated with the required capacity. The user must provide an estimate of the size of the values.
      
      Notice that even though the base level becomes fixed, the chains are still dynamic. Overall, the shard capacity mechanisms haven't changed, so we don't need to test this.
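      The structure can be sketched as follows (a standalone toy, not the actual per-shard table code): the bucket array is sized once from the user-provided estimate and never resized, while collisions grow per-bucket chains.

      ```cpp
      #include <cstddef>
      #include <functional>
      #include <memory>
      #include <string>
      #include <utility>
      #include <vector>

      class FixedTable {
        struct Node {
          std::string key;
          int value;
          std::unique_ptr<Node> next;  // chains remain dynamic
        };

       public:
        // `estimated_entries` plays the role of the user's size estimate;
        // the base level is preallocated here and stays fixed.
        explicit FixedTable(size_t estimated_entries)
            : buckets_(estimated_entries > 0 ? estimated_entries : 1) {}

        void Insert(const std::string& key, int value) {
          auto& head = buckets_[Bucket(key)];
          for (Node* n = head.get(); n != nullptr; n = n->next.get()) {
            if (n->key == key) { n->value = value; return; }  // overwrite
          }
          auto node = std::make_unique<Node>();
          node->key = key;
          node->value = value;
          node->next = std::move(head);  // push onto the bucket's chain
          head = std::move(node);
        }

        const int* Find(const std::string& key) const {
          for (const Node* n = buckets_[Bucket(key)].get(); n != nullptr;
               n = n->next.get()) {
            if (n->key == key) return &n->value;
          }
          return nullptr;
        }

       private:
        size_t Bucket(const std::string& key) const {
          return std::hash<std::string>{}(key) % buckets_.size();
        }
        std::vector<std::unique_ptr<Node>> buckets_;  // fixed after construction
      };
      ```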
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10154
      
      Test Plan: `make -j24 check`
      
      Reviewed By: pdillinger
      
      Differential Revision: D37124451
      
      Pulled By: guidotag
      
      fbshipit-source-id: cba6ac76052fe0ec60b8ff4211b3de7650e80d0c
    • Fix a race condition in transaction stress test (#10157) · bfaf8291
      Yanqin Jin committed
      Summary:
      Before this PR, there can be a race condition between the thread calling
      `StressTest::Open()` and a background compaction thread calling
      `MultiOpsTxnsStressTest::VerifyPkSkFast()`.
      
      ```
      Time   thread1                             bg_compact_thr
       |     TransactionDB::Open(..., &txn_db_)
       |     db_ is still nullptr
       |                                         db_->GetSnapshot()  // segfault
       |     db_ = txn_db_
       V
      ```
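      The usual shape of a fix for this kind of race is to publish the pointer only after full initialization and have the background reader tolerate a null (a generic sketch with stand-in types, not the actual stress-test code):

      ```cpp
      #include <atomic>

      struct ToyDb {
        int GetSnapshot() const { return 42; }
      };

      std::atomic<ToyDb*> g_db{nullptr};

      // Called from the background thread; must tolerate a not-yet-opened DB
      // instead of dereferencing a null pointer (the segfault above).
      bool TryVerify(int* snapshot_out) {
        ToyDb* db = g_db.load(std::memory_order_acquire);
        if (db == nullptr) return false;  // DB not open yet; skip verification
        *snapshot_out = db->GetSnapshot();
        return true;
      }

      // Called by the opener only after the DB is fully initialized.
      void PublishDb(ToyDb* db) {
        g_db.store(db, std::memory_order_release);
      }
      ```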
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10157
      
      Test Plan: CI
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D37121653
      
      Pulled By: riversand963
      
      fbshipit-source-id: 6a53117f958e9ee86f77297fdeb843e5160a9331