1. 22 May 2021, 4 commits
    • Use large macos instance (#8320) · 6c7c3e8c
      Jay Zhuang committed
      Summary:
      The macOS build is taking more than 1 hour, so bump the instance type from the
      default medium to large (a large macOS instance was not available before).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8320
      
      Test Plan: watch CI pass
      
      Reviewed By: ajkr
      
      Differential Revision: D28589456
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: cff78dae5aaf9de90ade3468469290176de5ff32
      6c7c3e8c
    • Add table properties for number of entries added to filters (#8323) · 3469d60f
      Peter Dillinger committed
      Summary:
      With Ribbon filter work and possible variance in actual bits
      per key (or prefix; general term "entry") to achieve certain FP rates,
      I've received a request to be able to track actual bits per key in
      generated filters. This change adds a num_filter_entries table
      property, which can be combined with filter_size to get bits per key
      (entry); a sketch of that computation follows this entry.
      
      This can vary from num_entries in at least these ways:
      * Different versions of same key are only counted once in filters.
      * With prefix filters, several user keys map to the same filter entry.
      * A single filter can include both prefixes and user keys.
      
      Note that FilterBlockBuilder::NumAdded() didn't do anything useful
      except distinguish empty from non-empty.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8323
      
      Test Plan: basic unit test included, others updated
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28596210
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 529a111f3c84501e5a470bc84705e436ee68c376
      3469d60f
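      A minimal sketch of the bits-per-entry computation referenced above, using the
      standard table-properties API; the `num_filter_entries` field name is taken from
      this summary and should be treated as an assumption rather than a checked header:
      
      ```
      #include <cstdint>
      #include "rocksdb/db.h"
      #include "rocksdb/table_properties.h"
      
      // Rough bits-per-filter-entry estimate across all live SST files of the
      // default column family.
      double EstimateFilterBitsPerEntry(rocksdb::DB* db) {
        rocksdb::TablePropertiesCollection props;
        if (!db->GetPropertiesOfAllTables(&props).ok()) {
          return 0.0;
        }
        uint64_t filter_bytes = 0;
        uint64_t filter_entries = 0;
        for (const auto& file_and_props : props) {
          filter_bytes += file_and_props.second->filter_size;
          filter_entries += file_and_props.second->num_filter_entries;
        }
        return filter_entries == 0 ? 0.0 : 8.0 * filter_bytes / filter_entries;
      }
      ```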
    • Fix manual compaction `max_compaction_bytes` under-calculated issue (#8269) · 6c865435
      Jay Zhuang committed
      Summary:
      Fix a bug where, for manual compaction, `max_compaction_bytes` only
      limited the SST files picked from the input level, but not the overlapping
      files on the output level. (A brief example of the relevant option appears
      after this entry.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8269
      
      Test Plan: `make check`
      
      Reviewed By: ajkr
      
      Differential Revision: D28231044
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 9d7d03004f30cc4b1b9819830141436907554b7c
      6c865435
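      For context, a hedged sketch of where this limit is configured and the kind of
      manual compaction it governs; the path handling and the 64 MB value are
      illustrative only:
      
      ```
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"
      
      void OpenAndCompact(const std::string& path) {
        rocksdb::Options options;
        options.create_if_missing = true;
        // Cap the total size of files picked into one (sub)compaction; after this
        // fix the cap also accounts for overlapping output-level files.
        options.max_compaction_bytes = 64ull << 20;
      
        rocksdb::DB* db = nullptr;
        if (!rocksdb::DB::Open(options, path, &db).ok()) {
          return;
        }
        // Manual compaction over the whole key range (nullptr = unbounded);
        // the returned Status is ignored here for brevity.
        rocksdb::CompactRangeOptions cro;
        db->CompactRange(cro, nullptr, nullptr);
        delete db;
      }
      ```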
    • Try to build with liburing by default. (#8322) · bd3d080e
      sdong committed
      Summary:
      By default, try to build with liburing. For make, if ROCKSDB_USE_IO_URING is not set, treat it as 1, which means RocksDB will try to build with liburing. For CMake, add WITH_LIBURING to control it, with the default being ON.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8322
      
      Test Plan: Build using cmake and make.
      
      Reviewed By: anand1976
      
      Differential Revision: D28586498
      
      fbshipit-source-id: cfd39159ab697f4b93a9293a59c07f839b1e7ed5
      bd3d080e
  2. 21 May 2021, 2 commits
    • Compare memtable insert and flush count (#8288) · 2f1984dd
      sdong committed
      Summary:
      When a memtable is flushed, it will validate the number of entries it reads, and compare that number with how many entries were inserted into the memtable. This serves as one sanity check against memory corruption. This change will also allow more counters to be added in the future for better validation.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8288
      
      Test Plan: Pass all existing tests
      
      Reviewed By: ajkr
      
      Differential Revision: D28369194
      
      fbshipit-source-id: 7ff870380c41eab7f99eee508550dcdce32838ad
      2f1984dd
    • Deflake ExternalSSTFileTest.PickedLevelBug (#8307) · 94b4faa0
      Jay Zhuang committed
      Summary:
      The test wants to make sure there's no compaction during `AddFile`
      (between `DBImpl::AddFile:MutexLock` and `DBImpl::AddFile:MutexUnlock`),
      but the mutex could be unlocked by `EnterUnbatched()`.
      Move the lock start point to after bumping the ingest file number.
      
      Also fix the deadlock that occurs when the ASSERT fails.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8307
      
      Reviewed By: ajkr
      
      Differential Revision: D28479849
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: b3c50f66aa5d5f59c5c27f815bfea189c4cd06cb
      94b4faa0
  3. 20 May 2021, 7 commits
  4. 19 May 2021, 3 commits
    • Sync ingested files only if reopen is supported by the FS (#8296) · 9d61a085
      anand76 committed
      Summary:
      Some file systems (especially distributed FS) do not support reopening a file for writing. The ExternalSstFileIngestionJob calls ReopenWritableFile in order to sync the ingested file, which typically makes sense only on a local file system with a page cache (i.e. Posix). So this change tries to sync the ingested file only if ReopenWritableFile doesn't return Status::NotSupported(); a sketch of this pattern follows this entry.
      
      Tests:
      Add a new unit test in external_sst_file_basic_test
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8296
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28420865
      
      Pulled By: anand1976
      
      fbshipit-source-id: 380e7f5ff95324997f7a59864a9ac96ebbd0100c
      9d61a085
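      A simplified sketch of the pattern described above (not the actual
      ExternalSstFileIngestionJob code): treat `NotSupported` from
      `ReopenWritableFile` as "skip the sync" rather than as an error:
      
      ```
      #include <memory>
      #include <string>
      #include "rocksdb/file_system.h"
      
      rocksdb::IOStatus SyncIngestedFileIfSupported(rocksdb::FileSystem* fs,
                                                    const std::string& path) {
        std::unique_ptr<rocksdb::FSWritableFile> file;
        rocksdb::IOStatus s = fs->ReopenWritableFile(path, rocksdb::FileOptions(),
                                                     &file, /*dbg=*/nullptr);
        if (s.IsNotSupported()) {
          // e.g. a distributed FS; skip the sync rather than failing ingestion.
          return rocksdb::IOStatus::OK();
        }
        if (!s.ok()) {
          return s;
        }
        s = file->Sync(rocksdb::IOOptions(), /*dbg=*/nullptr);
        if (s.ok()) {
          s = file->Close(rocksdb::IOOptions(), /*dbg=*/nullptr);
        }
        return s;
      }
      ```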
    • Handle return code by io_uring_submit_and_wait() and io_uring_wait_cqe() (#8311) · 60e5af83
      sdong committed
      Summary:
      Right now the return codes of io_uring_submit_and_wait() and io_uring_wait_cqe() are not handled, which is not good practice. Although these two functions are not supposed to return non-zero values in normal execution, people suspect that they might return non-zero values when an interruption happens, and the code might hang as a result. A sketch of the checking follows this entry.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8311
      
      Test Plan: Make sure at least normal test cases still pass.
      
      Reviewed By: anand1976
      
      Differential Revision: D28500828
      
      fbshipit-source-id: 8a76cea9cafbd041102e0b6a8eef9d0bfed7c211
      60e5af83
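      A sketch of the kind of handling the summary calls for; both liburing functions
      return a negative errno value on failure, and the EINTR retry shown here is one
      plausible policy, not necessarily the one the PR adopts:
      
      ```
      #include <cerrno>
      #include <cstdio>
      #include <liburing.h>
      
      // Submit queued SQEs and wait for one completion, checking return codes
      // instead of ignoring them.
      int SubmitAndWaitOne(struct io_uring* ring, struct io_uring_cqe** cqe) {
        int ret = io_uring_submit_and_wait(ring, /*wait_nr=*/1);
        if (ret < 0) {
          std::fprintf(stderr, "io_uring_submit_and_wait: %d\n", ret);
          return ret;  // negative errno, e.g. -EINTR
        }
        do {
          ret = io_uring_wait_cqe(ring, cqe);
        } while (ret == -EINTR);  // interrupted by a signal: retry, don't hang
        return ret;               // 0 on success, negative errno otherwise
      }
      ```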
    • Fix MultiGet with PinnableSlices and Merge for WBWI (#8299) · 6b0a22a4
      mrambacher committed
      Summary:
      The MultiGetFromBatchAndDB would fail if the PinnableSlice value being returned was pinned.  This could happen if the value was retrieved from the DB (not the memtable) or potentially if the values were reused (and a previous iteration returned a slice that was pinned).
      
      This change resets the pinnable value to clear it prior to attempting to use it, thereby eliminating the problem of the value already being pinned. A sketch of the reuse pattern follows this entry.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8299
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28455426
      
      Pulled By: mrambacher
      
      fbshipit-source-id: a34d7d983ec9b6bb4c8a2b4892f72858d43e6972
      6b0a22a4
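      A minimal sketch of the reuse scenario the fix covers, where the same
      caller-owned `PinnableSlice` array is passed to `MultiGetFromBatchAndDB` more
      than once; the call signature follows the public WriteBatchWithIndex header as I
      understand it and should be verified against the current sources:
      
      ```
      #include <vector>
      #include "rocksdb/db.h"
      #include "rocksdb/utilities/write_batch_with_index.h"
      
      void LookupTwice(rocksdb::DB* db, rocksdb::WriteBatchWithIndex* wbwi,
                       const std::vector<rocksdb::Slice>& keys) {
        std::vector<rocksdb::PinnableSlice> values(keys.size());
        std::vector<rocksdb::Status> statuses(keys.size());
        for (int pass = 0; pass < 2; ++pass) {
          // On the second pass some values[i] may still be pinned to DB blocks;
          // with this fix they are cleared before being reused.
          wbwi->MultiGetFromBatchAndDB(db, rocksdb::ReadOptions(),
                                       db->DefaultColumnFamily(), keys.size(),
                                       keys.data(), values.data(), statuses.data(),
                                       /*sorted_input=*/false);
        }
      }
      ```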
  5. 18 May 2021, 3 commits
    • Expose CompressionOptions::parallel_threads through C API (#8302) · 83d1a665
      Stanislav Tkach committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8302
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28499262
      
      Pulled By: ajkr
      
      fbshipit-source-id: 7b17b79af871d874dfca76db9bca0d640a6cd854
      83d1a665
    • Make it possible to apply only a subrange of table property collectors (#8298) · d83542ca
      Levi Tamasi committed
      Summary:
      This patch does two things:
      1) Introduces some aliases in order to eliminate/prevent long-winded type names
      w/r/t the internal table property collectors (see e.g.
      `std::vector<std::unique_ptr<IntTblPropCollectorFactory>>`).
      2) Makes it possible to apply only a subrange of table property collectors during
      table building by turning `TableBuilderOptions::int_tbl_prop_collector_factories`
      from a pointer to a `vector` into a range (i.e. a pair of iterators); a generic
      sketch of this idea follows this entry.
      
      Rationale: I plan to introduce a BlobDB related table property collector, which
      should only be applied during table creation if blob storage is enabled at the moment
      (which can be changed dynamically). This change will make it possible to include/
      exclude the BlobDB related collector as needed without having to introduce
      a second `vector` of collectors in `ColumnFamilyData` with pretty much the same
      contents.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8298
      
      Test Plan: `make check`
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28430910
      
      Pulled By: ltamasi
      
      fbshipit-source-id: a81d28f2c59495865300f43deb2257d2e6977c8e
      d83542ca
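      A generic sketch of the "pair of iterators instead of a pointer to the whole
      vector" idea; the type names here are hypothetical stand-ins, not the actual
      RocksDB aliases introduced by the patch:
      
      ```
      #include <memory>
      #include <vector>
      
      struct CollectorFactory {  // stand-in for IntTblPropCollectorFactory
        virtual ~CollectorFactory() = default;
      };
      
      using FactoryVec = std::vector<std::unique_ptr<CollectorFactory>>;
      using FactoryIter = FactoryVec::const_iterator;
      
      // The table builder sees only a subrange, so a caller can exclude, say, a
      // trailing BlobDB-specific collector without keeping a second vector.
      struct TableBuilderOpts {
        FactoryIter collectors_begin;
        FactoryIter collectors_end;
      };
      
      TableBuilderOpts MakeOpts(const FactoryVec& all, bool include_last) {
        FactoryIter end =
            (include_last || all.empty()) ? all.end() : all.end() - 1;
        return TableBuilderOpts{all.begin(), end};
      }
      ```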
    • Write file temperature information to manifest (#8284) · 0ed8cb66
      sdong committed
      Summary:
      As a part of tiered storage, writing temperature information to the manifest is needed so that after DB recovery, RocksDB still has the tiering information, in order to implement further necessary functionality.
      
      Also fix some issues in simulated hybrid FS.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8284
      
      Test Plan: Add a new unit test to validate that the information is indeed written and read back.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D28335801
      
      fbshipit-source-id: 56aeb2e6ea090be0200181dd968c8a7278037def
      0ed8cb66
  6. 14 May 2021, 2 commits
    • Initial support for secondary cache in LRUCache (#8271) · feb06e83
      anand76 committed
      Summary:
      Defined the abstract interface for a secondary cache in include/rocksdb/secondary_cache.h, and updated LRUCacheOptions to take a std::shared_ptr<SecondaryCache>. An item is initially inserted into the LRU (primary) cache. When it ages out and is evicted from memory, it is inserted into the secondary cache. On an LRU cache miss and a successful lookup in the secondary cache, the item is promoted to the LRU cache. Only synchronous lookup is supported currently. The secondary cache could be used to implement a persistent (flash) cache or a compressed cache. A wiring sketch follows this entry.
      
      Tests:
      Results from cache_bench and db_bench don't show any regression due to these changes.
      
      cache_bench results before and after this change -
      Command
      ```./cache_bench -ops_per_thread=10000000 -threads=1```
      Before
      ```Complete in 40.688 s; QPS = 245774```
      ```Complete in 40.486 s; QPS = 246996```
      ```Complete in 42.019 s; QPS = 237989```
      After
      ```Complete in 40.672 s; QPS = 245869```
      ```Complete in 44.622 s; QPS = 224107```
      ```Complete in 42.445 s; QPS = 235599```
      
      db_bench results before this change, and with this change + https://github.com/facebook/rocksdb/issues/8213 and https://github.com/facebook/rocksdb/issues/8191 -
      Commands
      ```./db_bench  --benchmarks="fillseq,compact" -num=30000000 -key_size=32 -value_size=256 -use_direct_io_for_flush_and_compaction=true -db=/home/anand76/nvm_cache/db -partition_index_and_filters=true```
      
      ```./db_bench -db=/home/anand76/nvm_cache/db -use_existing_db=true -benchmarks=readrandom -num=30000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=1073741824 -cache_numshardbits=6 -cache_index_and_filter_blocks=true -read_random_exp_range=17 -statistics -partition_index_and_filters=true -threads=16 -duration=300```
      Before
      ```
      DB path: [/home/anand76/nvm_cache/db]
      readrandom   :      80.702 micros/op 198104 ops/sec;   54.4 MB/s (3708999 of 3708999 found)
      ```
      ```
      DB path: [/home/anand76/nvm_cache/db]
      readrandom   :      87.124 micros/op 183625 ops/sec;   50.4 MB/s (3439999 of 3439999 found)
      ```
      After
      ```
      DB path: [/home/anand76/nvm_cache/db]
      readrandom   :      77.653 micros/op 206025 ops/sec;   56.6 MB/s (3866999 of 3866999 found)
      ```
      ```
      DB path: [/home/anand76/nvm_cache/db]
      readrandom   :      84.962 micros/op 188299 ops/sec;   51.7 MB/s (3535999 of 3535999 found)
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8271
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D28357511
      
      Pulled By: anand1976
      
      fbshipit-source-id: d1cfa236f00e649a18c53328be10a8062a4b6da2
      feb06e83
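      A minimal wiring sketch based on the summary above; `LRUCacheOptions::secondary_cache`
      is taken from the description, and `my_secondary` stands in for any concrete
      SecondaryCache implementation:
      
      ```
      #include <memory>
      #include "rocksdb/cache.h"
      #include "rocksdb/options.h"
      #include "rocksdb/secondary_cache.h"
      #include "rocksdb/table.h"
      
      rocksdb::Options MakeOptionsWithTieredBlockCache(
          std::shared_ptr<rocksdb::SecondaryCache> my_secondary) {
        rocksdb::LRUCacheOptions cache_opts;
        cache_opts.capacity = 1ull << 30;           // 1 GB primary (LRU) cache
        cache_opts.num_shard_bits = 6;
        cache_opts.secondary_cache = my_secondary;  // evicted blocks spill here
      
        rocksdb::BlockBasedTableOptions table_opts;
        table_opts.block_cache = rocksdb::NewLRUCache(cache_opts);
      
        rocksdb::Options options;
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_opts));
        return options;
      }
      ```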
    • Refactor Option obj address from char* to void* (#8295) · d15fbae4
      Jay Zhuang committed
      Summary:
      Also replace `reinterpret_cast` with `static_cast` or no cast; a brief illustration follows this entry.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8295
      
      Test Plan: `make check`
      
      Reviewed By: mrambacher
      
      Differential Revision: D28420303
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 645be123a0df624dc2bea37cd54a35403fc494fa
      d15fbae4
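      For illustration only (hypothetical types, not RocksDB's option-handling code):
      once a field's address is carried as `void*`, recovering the concrete type needs
      only `static_cast`, whereas a `char*`-plus-offset scheme forces `reinterpret_cast`:
      
      ```
      #include <cstdint>
      
      struct FakeOptions {           // hypothetical stand-in for an options struct
        int64_t readahead_size = 0;
      };
      
      // void* based: a plain static_cast back to the field's real type.
      void SetInt64Field(void* field_addr, int64_t value) {
        *static_cast<int64_t*>(field_addr) = value;
      }
      
      void Example(FakeOptions& opts) {
        // The old char*-offset style would have needed:
        //   reinterpret_cast<int64_t*>(reinterpret_cast<char*>(&opts) + offset)
        SetInt64Field(&opts.readahead_size, 4096);
      }
      ```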
  7. 13 May 2021, 4 commits
  8. 12 May 2021, 2 commits
    • New Cache API for gathering statistics (#8225) · 78a309bf
      Peter Dillinger committed
      Summary:
      Adds a new Cache::ApplyToAllEntries API that we expect to use
      (in follow-up PRs) for efficiently gathering block cache statistics;
      a usage sketch follows this entry.
      Notable features vs. old ApplyToAllCacheEntries:
      
      * Includes key and deleter (in addition to value and charge). We could
      have passed in a Handle but then more virtual function calls would be
      needed to get the "fields" of each entry. We expect to use the 'deleter'
      to identify the origin of entries, perhaps even more.
      * Heavily tuned to minimize latency impact on operating cache. It
      does this by iterating over small sections of each cache shard while
      cycling through the shards.
      * Supports tuning roughly how many entries to operate on for each
      lock acquire and release, to control the impact on the latency of other
      operations without excessive lock acquire & release. The right balance
      can depend on the cost of the callback. Good default seems to be
      around 256.
      * There should be no need to disable thread safety. (I would expect
      uncontended locks to be sufficiently fast.)
      
      I have enhanced cache_bench to validate this approach:
      
      * Reports a histogram of ns per operation, so we can look at the
      distribution of times, not just throughput (average).
      * Can add a thread for simulated "gather stats" which calls
      ApplyToAllEntries at a specified interval. We also generate a histogram
      of time to run ApplyToAllEntries.
      
      To make the iteration over some entries of each shard work as cleanly as
      possible, even with resize between next set of entries, I have
      re-arranged which hash bits are used for sharding and which for indexing
      within a shard.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8225
      
      Test Plan:
      A couple of unit tests are added, but primary validation is manual, as
      the primary risk is to performance.
      
      The primary validation is using cache_bench to ensure that neither
      the minor hashing changes nor the simulated stats gathering
      significantly impact QPS or latency distribution. Note that adding op
      latency histogram seriously impacts the benchmark QPS, so for a
      fair baseline, we need the cache_bench changes (except remove simulated
      stat gathering to make it compile). In short, we don't see any
      reproducible difference in ops/sec or op latency unless we are gathering
      stats nearly continuously. Test uses 10GB block cache with
      8KB values to be somewhat realistic in the number of items to iterate
      over.
      
      Baseline typical output:
      
      ```
      Complete in 92.017 s; Rough parallel ops/sec = 869401
      Thread ops/sec = 54662
      
      Operation latency (ns):
      Count: 80000000 Average: 11223.9494  StdDev: 29.61
      Min: 0  Median: 7759.3973  Max: 9620500
      Percentiles: P50: 7759.40 P75: 14190.73 P99: 46922.75 P99.9: 77509.84 P99.99: 217030.58
      ------------------------------------------------------
      [       0,       1 ]       68   0.000%   0.000%
      (    2900,    4400 ]       89   0.000%   0.000%
      (    4400,    6600 ] 33630240  42.038%  42.038% ########
      (    6600,    9900 ] 18129842  22.662%  64.700% #####
      (    9900,   14000 ]  7877533   9.847%  74.547% ##
      (   14000,   22000 ] 15193238  18.992%  93.539% ####
      (   22000,   33000 ]  3037061   3.796%  97.335% #
      (   33000,   50000 ]  1626316   2.033%  99.368%
      (   50000,   75000 ]   421532   0.527%  99.895%
      (   75000,  110000 ]    56910   0.071%  99.966%
      (  110000,  170000 ]    16134   0.020%  99.986%
      (  170000,  250000 ]     5166   0.006%  99.993%
      (  250000,  380000 ]     3017   0.004%  99.996%
      (  380000,  570000 ]     1337   0.002%  99.998%
      (  570000,  860000 ]      805   0.001%  99.999%
      (  860000, 1200000 ]      319   0.000% 100.000%
      ( 1200000, 1900000 ]      231   0.000% 100.000%
      ( 1900000, 2900000 ]      100   0.000% 100.000%
      ( 2900000, 4300000 ]       39   0.000% 100.000%
      ( 4300000, 6500000 ]       16   0.000% 100.000%
      ( 6500000, 9800000 ]        7   0.000% 100.000%
      ```
      
      New, gather_stats=false. Median thread ops/sec of 5 runs:
      
      ```
      Complete in 92.030 s; Rough parallel ops/sec = 869285
      Thread ops/sec = 54458
      
      Operation latency (ns):
      Count: 80000000 Average: 11298.1027  StdDev: 42.18
      Min: 0  Median: 7722.0822  Max: 6398720
      Percentiles: P50: 7722.08 P75: 14294.68 P99: 47522.95 P99.9: 85292.16 P99.99: 228077.78
      ------------------------------------------------------
      [       0,       1 ]      109   0.000%   0.000%
      (    2900,    4400 ]      793   0.001%   0.001%
      (    4400,    6600 ] 34054563  42.568%  42.569% #########
      (    6600,    9900 ] 17482646  21.853%  64.423% ####
      (    9900,   14000 ]  7908180   9.885%  74.308% ##
      (   14000,   22000 ] 15032072  18.790%  93.098% ####
      (   22000,   33000 ]  3237834   4.047%  97.145% #
      (   33000,   50000 ]  1736882   2.171%  99.316%
      (   50000,   75000 ]   446851   0.559%  99.875%
      (   75000,  110000 ]    68251   0.085%  99.960%
      (  110000,  170000 ]    18592   0.023%  99.983%
      (  170000,  250000 ]     7200   0.009%  99.992%
      (  250000,  380000 ]     3334   0.004%  99.997%
      (  380000,  570000 ]     1393   0.002%  99.998%
      (  570000,  860000 ]      700   0.001%  99.999%
      (  860000, 1200000 ]      293   0.000% 100.000%
      ( 1200000, 1900000 ]      196   0.000% 100.000%
      ( 1900000, 2900000 ]       69   0.000% 100.000%
      ( 2900000, 4300000 ]       32   0.000% 100.000%
      ( 4300000, 6500000 ]       10   0.000% 100.000%
      ```
      
      New, gather_stats=true, 1 second delay between scans. Scans take about
      1 second here so it's spending about 50% time scanning. Still the effect on
      ops/sec and latency seems to be in the noise. Median thread ops/sec of 5 runs:
      
      ```
      Complete in 91.890 s; Rough parallel ops/sec = 870608
      Thread ops/sec = 54551
      
      Operation latency (ns):
      Count: 80000000 Average: 11311.2629  StdDev: 45.28
      Min: 0  Median: 7686.5458  Max: 10018340
      Percentiles: P50: 7686.55 P75: 14481.95 P99: 47232.60 P99.9: 79230.18 P99.99: 232998.86
      ------------------------------------------------------
      [       0,       1 ]       71   0.000%   0.000%
      (    2900,    4400 ]      291   0.000%   0.000%
      (    4400,    6600 ] 34492060  43.115%  43.116% #########
      (    6600,    9900 ] 16727328  20.909%  64.025% ####
      (    9900,   14000 ]  7845828   9.807%  73.832% ##
      (   14000,   22000 ] 15510654  19.388%  93.220% ####
      (   22000,   33000 ]  3216533   4.021%  97.241% #
      (   33000,   50000 ]  1680859   2.101%  99.342%
      (   50000,   75000 ]   439059   0.549%  99.891%
      (   75000,  110000 ]    60540   0.076%  99.967%
      (  110000,  170000 ]    14649   0.018%  99.985%
      (  170000,  250000 ]     5242   0.007%  99.991%
      (  250000,  380000 ]     3260   0.004%  99.995%
      (  380000,  570000 ]     1599   0.002%  99.997%
      (  570000,  860000 ]     1043   0.001%  99.999%
      (  860000, 1200000 ]      471   0.001%  99.999%
      ( 1200000, 1900000 ]      275   0.000% 100.000%
      ( 1900000, 2900000 ]      143   0.000% 100.000%
      ( 2900000, 4300000 ]       60   0.000% 100.000%
      ( 4300000, 6500000 ]       27   0.000% 100.000%
      ( 6500000, 9800000 ]        7   0.000% 100.000%
      ( 9800000, 14000000 ]        1   0.000% 100.000%
      
      Gather stats latency (us):
      Count: 46 Average: 980387.5870  StdDev: 60911.18
      Min: 879155  Median: 1033777.7778  Max: 1261431
      Percentiles: P50: 1033777.78 P75: 1120666.67 P99: 1261431.00 P99.9: 1261431.00 P99.99: 1261431.00
      ------------------------------------------------------
      (  860000, 1200000 ]       45  97.826%  97.826% ####################
      ( 1200000, 1900000 ]        1   2.174% 100.000%
      
      Most recent cache entry stats:
      Number of entries: 1295133
      Total charge: 9.88 GB
      Average key size: 23.4982
      Average charge: 8.00 KB
      Unique deleters: 3
      ```
      
      Reviewed By: mrambacher
      
      Differential Revision: D28295742
      
      Pulled By: pdillinger
      
      fbshipit-source-id: bbc4a552f91ba0fe10e5cc025c42cef5a81f2b95
      78a309bf
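      A usage sketch of the new API as described in the summary; the exact callback
      signature (key, value, charge, deleter) and the `ApplyToAllEntriesOptions` knob
      are inferred from the text above and should be checked against the current
      `Cache` header:
      
      ```
      #include <cstdio>
      #include "rocksdb/cache.h"
      
      // Walk every block cache entry, a small batch per lock acquisition, and
      // tally entry count and total charge.
      void GatherCacheStats(rocksdb::Cache* cache) {
        size_t num_entries = 0;
        size_t total_charge = 0;
        rocksdb::Cache::ApplyToAllEntriesOptions opts;
        opts.average_entries_per_lock = 256;  // balance latency vs. lock overhead
        cache->ApplyToAllEntries(
            [&](const rocksdb::Slice& /*key*/, void* /*value*/, size_t charge,
                rocksdb::Cache::DeleterFn /*deleter*/) {
              ++num_entries;
              total_charge += charge;
            },
            opts);
        std::printf("entries=%zu total charge=%zu\n", num_entries, total_charge);
      }
      ```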
    • Added static methods for simple types to OptionTypeInfo (#8249) · 78e82410
      mrambacher committed
      Summary:
      Added ParseType, SerializeType, and TypesAreEqual methods to OptionTypeInfo.  These methods can be used for serialization and deserialization of basic types.
      
      Change the MutableCF/DB Options to use this format.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8249
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28351190
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 72a78643b804f2f0bf59c32ffefa63346672ad16
      78e82410
  9. 11 May 2021, 2 commits
    • Add ObjectRegistry to ConfigOptions (#8166) · 9f2d255a
      mrambacher committed
      Summary:
      This change enables a couple of things:
      - Different ConfigOptions can have a different registry/factory associated with them, thereby allowing things like a "Test" ConfigOptions versus a "Production" one
      - The ObjectRegistry is created fewer times and can be re-used
      
      The ConfigOptions can also be initialized/constructed from a DBOptions, in which case it will grab some of its settings (Env, Logger) from the DBOptions.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8166
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D27657952
      
      Pulled By: mrambacher
      
      fbshipit-source-id: ae1d6200bb7ab127405cdeefaba43c7fe694dfdd
      9f2d255a
    • Add Merge Operator support to WriteBatchWithIndex (#8135) · ff463742
      mrambacher committed
      Summary:
      The WBWI has two differing modes of operation dependent on the value
      of the constructor parameter `overwrite_key`.
      Currently, regardless of the parameter, neither mode performs as
      expected when using Merge. This PR remedies this by correctly invoking
      the appropriate Merge Operator before returning results from the WBWI. A code
      sketch of Examples 1 and 5 follows this entry.
      
      Examples of issues that exist which are solved by this PR:
      
      ## Example 1 with `overwrite_key=false`
      Currently, from an empty database, the following sequence:
      ```
      Put('k1', 'v1')
      Merge('k1', 'v2')
      Get('k1')
      ```
      Incorrectly yields `v2`, that is to say that the Merge behaves like a Put.
      
      ## Example 2 with `overwrite_key=true`
      Currently, from an empty database, the following sequence:
      ```
      Put('k1', 'v1')
      Merge('k1', 'v2')
      Get('k1')
      ```
      Incorrectly yields `ERROR: kMergeInProgress`.
      
      ## Example 3 with `overwrite_key=false`
      Currently, with a database containing `('k1' -> 'v1')`, the following sequence:
      ```
      Merge('k1', 'v2')
      GetFromBatchAndDB('k1')
      ```
      Incorrectly yields `v1,v2`
      
      ## Example 4 with `overwrite_key=true`
      Currently, with a database containing `('k1' -> 'v1')`, the following sequence:
      ```
      Merge('k1', 'v1')
      GetFromBatchAndDB('k1')
      ```
      Incorrectly yields `ERROR: kMergeInProgress`.
      
      ## Example 5 with `overwrite_key=false`
      Currently, from an empty database, the following sequence:
      ```
      Put('k1', 'v1')
      Merge('k1', 'v2')
      GetFromBatchAndDB('k1')
      ```
      Incorrectly yields `v1,v2`
      
      ## Example 6 with `overwrite_key=true`
      Currently, from an empty database, the following sequence:
      ```
      Put('k1', 'v1')
      Merge('k1', 'v2')
      GetFromBatchAndDB('k1')
      ```
      Incorrectly yields `ERROR: kMergeInProgress`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8135
      
      Reviewed By: pdillinger
      
      Differential Revision: D27657938
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 0fbda6bbc66bedeba96a84786d90141d776297df
      ff463742
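      A code sketch of Examples 1 and 5 above, assuming the DB was opened with a
      concatenating merge operator (for instance the built-in string-append operator),
      so that the expected result is the merged value rather than a bare Put result or
      a `kMergeInProgress` error:
      
      ```
      #include <cassert>
      #include <string>
      #include "rocksdb/comparator.h"
      #include "rocksdb/db.h"
      #include "rocksdb/utilities/write_batch_with_index.h"
      
      void MergeExample(rocksdb::DB* db) {
        // overwrite_key = false, as in Examples 1, 3 and 5.
        rocksdb::WriteBatchWithIndex batch(rocksdb::BytewiseComparator(),
                                           /*reserved_bytes=*/0,
                                           /*overwrite_key=*/false);
        batch.Put("k1", "v1");
        batch.Merge("k1", "v2");
      
        std::string value;
        rocksdb::Status s =
            batch.GetFromBatchAndDB(db, rocksdb::ReadOptions(), "k1", &value);
        // With the fix, the DB's merge operator is applied, so `value` is the
        // merged result (e.g. "v1,v2") instead of just "v2".
        assert(s.ok());
      }
      ```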
  10. 08 May 2021, 5 commits
  11. 07 May 2021, 1 commit
  12. 06 May 2021, 5 commits
    • Permit stdout "fail"/"error" in whitebox crash test (#8272) · b71b4597
      Andrew Kryczka committed
      Summary:
      In https://github.com/facebook/rocksdb/issues/8268, the `db_stress` stdout began containing both the strings
      "fail" and "error" (case-insensitive). The whitebox crash test
      failed upon seeing either of those strings.
      
      I checked that all other occurrences of "fail" and "error"
      (case-insensitive) that `db_stress` produces are printed to `stderr`. So
      this PR separates the handling of `db_stress`'s stdout and stderr, and
      only fails when one of those bad strings is found in stderr.
      
      The downside of this PR is that `db_stress`'s original interleaving of stdout/stderr is not preserved in `db_crashtest.py`'s output.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8272
      
      Test Plan:
      run it; see it succeeds for several runs until encountering a real error
      
      ```
      $ python3 tools/db_crashtest.py whitebox --simple --random_kill_odd=8887 --max_key=1000000 --value_size_mult=33
      ...
      db_stress: cache/clock_cache.cc:483: bool rocksdb::{anonymous}::ClockCacheShard::Unref(rocksdb::{anonymous}::CacheHandle*, bool, rocksdb::{anonymous}::CleanupContext*): Assertion `CountRefs(flags) > 0' failed.
      
      TEST FAILED. Output has 'fail'!!!
      ```
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D28239233
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3b8602a0d570466a7e2c81bb9c49468f7716091e
      b71b4597
    • db_stress: wait for compaction to finish after open with failure injection (#8270) · 7f3a0f5b
      sdong committed
      Summary:
      When injecting errors during DB open, an error can happen in background threads, causing the DB open to succeed but the DB to soon be made read-only, so subsequent writes will fail, which is not expected. To prevent this from happening, wait for compaction to finish before serving traffic. If there is a failure, reopen.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8270
      
      Test Plan: Run the test.
      
      Reviewed By: ajkr
      
      Differential Revision: D28230537
      
      fbshipit-source-id: e2e97888904f9b9bb50c35ccf95b88c2319ef5c3
      7f3a0f5b
    • Refactor kill point (#8241) · e19908cb
      sdong committed
      Summary:
      Refactor the kill point into one single class, rather than several extern variables. The intention was to drop unflushed data before killing to simulate some jobs, and I tried to pass a pointer to the fault injection FS to the kill class, but it ended up being harder than I thought. Perhaps we'll need to do this another way. But I thought the refactoring itself is good, so I'm sending it out.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8241
      
      Test Plan: make release and run crash test for a while.
      
      Reviewed By: anand1976
      
      Differential Revision: D28078486
      
      fbshipit-source-id: f9182c1455f52e6851c13f88a21bade63bcec45f
      e19908cb
    • Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) · 8948dc85
      mrambacher committed
      Summary:
      The ImmutableCFOptions contained a bunch of fields that belonged to the ImmutableDBOptions.  This change cleans that up by introducing an ImmutableOptions struct.  Following the pattern of Options struct, this class inherits from the DB and CFOption structs (of the Immutable form).
      
      Only one structural change (the ImmutableCFOptions::fs was changed to a shared_ptr from a raw one) is in this PR.  All of the other changes involve moving the member variables from the ImmutableCFOptions into the ImmutableOptions and changing member variables or function parameters as required for compilation purposes.
      
      Follow-on PRs may do a further clean-up of the code, such as renaming variables (e.g. "ImmutableOptions cf_options") and potentially eliminating unneeded function parameters (there is no longer a need to pass both an ImmutableDBOptions and an ImmutableOptions to a function).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8262
      
      Reviewed By: pdillinger
      
      Differential Revision: D28226540
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 18ae71eadc879dedbe38b1eb8e6f9ff5c7147dbf
      8948dc85
    • Fix `GetLiveFiles()` returning OPTIONS-000000 (#8268) · 0f42e50f
      Andrew Kryczka committed
      Summary:
      See release note in HISTORY.md.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8268
      
      Test Plan: unit test repro
      
      Reviewed By: siying
      
      Differential Revision: D28227901
      
      Pulled By: ajkr
      
      fbshipit-source-id: faf61d13b9e43a761e3d5dcf8203923126b51339
      0f42e50f