1. 24 Aug, 2021 3 commits
  2. 23 Aug, 2021 1 commit
  3. 21 Aug, 2021 8 commits
    • L
      Update version.h and HISTORY.md for the 6.24 release (#8688) · 8c9e6897
      Levi Tamasi committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8688
      
      Reviewed By: ajkr, riversand963
      
      Differential Revision: D30467746
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 0fce0d42fe2fe3cb56d7a89607154b3b957f09b6
      8c9e6897
    • P
      Embed original file number in SST table properties (#8686) · 04db7648
      Peter Dillinger committed
      Summary:
      I very recently realized that with https://github.com/facebook/rocksdb/issues/8669 we cannot later add
      file numbers to external SST files (so that more can share db session
      ids for better uniqueness properties), because of forward compatibility.
      We would have a version of RocksDB that assumes session IDs are unique
      on external SST files and therefore can't really break that invariant in
      future files.
      
      This change adds a table property for "orig_file_number" which is
      populated by normal SST files and also external SST files generated by
      SstFileWriter. SstFileWriter now keeps a db_session_id for the life of the
      object and increments its own file numbers for embedding in table
      properties. (They are arguably "fake" file numbers because these numbers
      are not embedded in the file name.)
      
      While updating block_based_table_builder, I removed several unnecessary
      fields from Rep, because following the pattern would have created
      another unnecessary field.
      
      This change also updates block_based_table_reader to use this new
      property when available, which means that for newer SST files, we can
      determine the stable/original <db_session_id,file_number> unique
      identifier using just the file contents, not the file name. (It's a bit
      complicated; detailed comments in block_based_table_reader.)
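
As a rough illustration of the identifier described above, the following self-contained C++ sketch models a writer that keeps one session ID for its lifetime and hands out its own increasing file numbers, plus a helper that derives a key prefix from the pair. The names `TableIdentity`, `SketchSstWriter`, and `MakeCacheKeyPrefix` are hypothetical, and the `session#number` string encoding is an assumption made for readability; it is not the encoding used by block_based_table_reader.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for the two table properties discussed above.
struct TableIdentity {
  std::string db_session_id;
  uint64_t orig_file_number;
};

// Hypothetical model of a writer that keeps one session ID for its whole
// lifetime and assigns its own increasing ("fake") file numbers.
class SketchSstWriter {
 public:
  explicit SketchSstWriter(std::string session_id)
      : session_id_(std::move(session_id)), next_file_number_(1) {
    assert(!session_id_.empty());
  }

  // Returns the identity that would be embedded in the next file's properties.
  TableIdentity NextFileIdentity() {
    return TableIdentity{session_id_, next_file_number_++};
  }

 private:
  std::string session_id_;
  uint64_t next_file_number_;
};

// Assumed encoding for illustration only: the identity depends on the file
// contents (its properties), never on the file name.
inline std::string MakeCacheKeyPrefix(const TableIdentity& id) {
  return id.db_session_id + "#" + std::to_string(id.orig_file_number);
}
```

Two files produced by the same writer share a session ID but get distinct numbers, so the pair stays unique even if the files are later renamed or copied.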
      
      Also added DB host id to properties listing by sst_dump, which could be
      useful in debugging.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8686
      
      Test Plan: majorly overhauled StableCacheKeys test for this change
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30457742
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 2e5ae7dddeb94fb9d8eac8a928486aed8b8cd445
      04db7648
    • P
      Upgrade xxhash, add Hash128 (#8634) · 22161b75
      Peter Dillinger committed
      Summary:
      With expected use for a 128-bit hash, xxhash library is
      upgraded to current dev (2c611a76f914828bed675f0f342d6c4199ffee1e)
      as of Aug 6 so that we can use production version of XXH3_128bits
      as new Hash128 function (added in hash128.h).
      
      To make this work, however, we have to carve out the "preview" version
      of XXH3 that is used in new SST Bloom and Ribbon filters, since that
      will not get maintenance in xxhash releases. I have consolidated all the
      relevant code into xxph3.h and made it "inline only" (no .cc file). The
      working name for this hash function is changed from XXH3p to XXPH3
      (XX Preview Hash) because the latter is easier to get working with no
      symbol name conflicts between the headers.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8634
      
      Test Plan:
      no expected change in existing functionality. For Hash128,
      added some unit tests based on those for Hash64 to ensure some basic
      properties and that the values do not change accidentally.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30173490
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 06aa542a7a28b353bc2c865b9b2f8bdfe44158e4
      22161b75
    • P
      Add Bloom/Ribbon hybrid API support (#8679) · 2a383f21
      Peter Dillinger committed
      Summary:
      This is essentially resurrection and fixing of the part of
      https://github.com/facebook/rocksdb/issues/8198 that was reverted in https://github.com/facebook/rocksdb/issues/8212, using data added in https://github.com/facebook/rocksdb/issues/8246. Basically,
      when configuring Ribbon filter, you can specify an LSM level before which
      Bloom will be used instead of Ribbon. But Bloom is only considered for
      Leveled and Universal compaction styles and file going into a known LSM
      level. This way, SST file writer, FIFO compaction, etc. use Ribbon filter as
      you would expect with NewRibbonFilterPolicy.
      
      So that this can be controlled with a single int value and so that flushes
      can be distinguished from intra-L0, we consider flush to go to level -1 for
      the purposes of this option. (Explained in API comment.)
      
      I also expect the most common and recommended Ribbon configuration to
      use Bloom during flush, to minimize slowing down writes and because according
      to my estimates, Ribbon only pays off if the structure lives in memory for
      more than an hour. Thus, I have changed the default for NewRibbonFilterPolicy
      to be this mild hybrid configuration. I don't really want to add something like
      NewHybridFilterPolicy because at least the mild hybrid configuration (Bloom for
      flush, Ribbon otherwise) should be considered a natural choice.
      
      C APIs also updated, but because they don't support overloading,
      rocksdb_filterpolicy_create_ribbon is kept pure ribbon for clarity and
      rocksdb_filterpolicy_create_ribbon_hybrid must be called for a hybrid
      configuration. While touching C API, I changed bits per key options from
      int to double.
      
      BuiltinFilterPolicy is needed so that LevelThresholdFilterPolicy doesn't inherit
      unused fields from BloomFilterPolicy.
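
The level rule described in this summary can be sketched as a small decision function. This is a minimal model, not RocksDB code: `ChooseFilter`, `FilterKind`, `CompactionStyle`, and `kFlushLevel` are hypothetical names, and the `level_known` flag is an assumed way to represent "file going into a known LSM level".

```cpp
#include <cassert>

enum class FilterKind { kBloom, kRibbon };
enum class CompactionStyle { kLevel, kUniversal, kFifo };

// Per the message above, a flush counts as level -1 for this option.
constexpr int kFlushLevel = -1;

// Hypothetical helper, not a RocksDB function. `level_known` is false when the
// destination level is not known (for example, a standalone SST file writer).
inline FilterKind ChooseFilter(CompactionStyle style, bool level_known,
                               int level, int bloom_before_level) {
  const bool style_allows_bloom = style == CompactionStyle::kLevel ||
                                  style == CompactionStyle::kUniversal;
  if (!style_allows_bloom || !level_known) {
    return FilterKind::kRibbon;  // FIFO, SST file writer, etc. stay on Ribbon
  }
  assert(level >= kFlushLevel);
  // Bloom is used only for levels strictly before the threshold.
  return level < bloom_before_level ? FilterKind::kBloom : FilterKind::kRibbon;
}
```

Under this model a threshold of 0 gives the mild hybrid configuration (Bloom for flush, Ribbon otherwise), and a threshold of -1 gives pure Ribbon.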
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8679
      
      Test Plan: new + updated tests, including crash test
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D30445797
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 6f5aeddfd6d79f7e55493b563c2d1d2d568892e1
      2a383f21
    • M
      Add `IteratorTraceExecutionResult` for iterator related trace records. (#8687) · baf22b4e
      Merlin Mao committed
      Summary:
      - Allow getting `Valid()`, `status()`, `key()` and `value()` of an iterator from `IteratorTraceExecutionResult`.
      - Move lower bound and upper bound from `IteratorSeekQueryTraceRecord` to `IteratorQueryTraceRecord`.
      
      Added test in `DBTest2.TraceAndReplay`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8687
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30457630
      
      Pulled By: autopear
      
      fbshipit-source-id: be433099a25895b3aa6f0c00f95ad7b1d7489c1d
      baf22b4e
    • A
      Add a PerfContext counter for secondary cache hits (#8685) · f35042ca
      anand76 committed
      Summary:
      Add a PerfContext counter.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8685
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30453957
      
      Pulled By: anand1976
      
      fbshipit-source-id: 42888a3ced240e1c44446d52d3b04adfb01f5665
      f35042ca
    • A
      Update the block_read_count/block_read_byte counters in MultiGet (#8676) · 22f2936b
      anand76 committed
      Summary:
      MultiGet in block based table reader doesn't use BlockFetcher. As a result, the block_read_count and block_read_byte PerfContext counters were not being updated. This fixes that by updating them in MultiRead.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8676
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30428680
      
      Pulled By: anand1976
      
      fbshipit-source-id: 21846efe92588fc17123665dd06733693a40126d
      22f2936b
    • A
      Fix blob callback in compaction and atomic flush (#8681) · 5efec84c
      Akanksha Mahajan committed
      Summary:
      Pass BlobFileCompletionCallback to atomic flush and compaction jobs, where it
      is currently nullptr (the default parameter value).
      BlobFileCompletionCallback is used by the integrated BlobDB to report new blob files to
      SstFileManager.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8681
      
      Test Plan: CircleCI jobs
      
      Reviewed By: ltamasi
      
      Differential Revision: D30445998
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: ba48093843864faec57f1f365cce7b5a569c4021
      5efec84c
  4. 20 Aug, 2021 2 commits
    • M
      Add iterator's lower and upper bounds to `TraceRecord` (#8677) · ff895338
      Merlin Mao committed
      Summary:
      Trace file V2 added lower/upper bounds to `Iterator::Seek()` and `Iterator::SeekForPrev()`. They were not used anywhere during the execution of a `TraceRecord`. Now they are added to be used by `ReadOptions` during `Iterator::Seek()` and `Iterator::SeekForPrev()` if they are set.
      
      Added test cases in `DBTest2.TraceAndManualReplay`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8677
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30438255
      
      Pulled By: autopear
      
      fbshipit-source-id: 82563006be0b69155990e506a74951c18af8d288
      ff895338
    • M
      Fix some minor issues in the Customizable infrastructure (#8566) · 9eb002fc
      mrambacher committed
      Summary:
      - Fix issue with OptionType::Vector when the nested item is a Customizable with no names
      - Fix issue with OptionType::Vector to appropriately wrap the elements in a Vector;
      - Fix an issue with nested Customizable object with a null immutable object still appearing in the mutable options;
      - Fix/Add tests for null/empty customizable objects
      - Move the RegisterTestObjects from customizable_test into testutil.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8566
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30303724
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 33fa8ea2a3b663210cb356da05e64aab7585b1b5
      9eb002fc
  5. 19 Aug, 2021 3 commits
    • B
      Add condition on NotifyOnFlushComplete that FlushJob was not mempurge. Add... · c625b8d0
      Baptiste Lemaire committed
      Add condition on NotifyOnFlushComplete that FlushJob was not mempurge. Add event listeners to mempurge tests. (#8672)
      
      Summary:
      Previously, when a `FlushJob` was redirected to a MemPurge, the function `DBImpl::NotifyOnFlushComplete` was called, which created a series of issues because the JobInfo was not correctly collected from the memtables.
      This diff aims at correcting these two issues (`FlushJobInfo` collection in `FlushJob::MemPurge`, no call to `DBImpl::NotifyOnFlushComplete` after successful mempurge).
      Event listeners were added to the unit tests to handle these situations.
      Surprisingly, none of the crash tests caught this issue; I will try to add event listeners to crash tests in the future.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8672
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D30383109
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 35a8d4295886923ee4049a6447f00022cb221c73
      c625b8d0
    • M
      Allow Replayer to report the results of TraceRecords. (#8657) · d10801e9
      Merlin Mao committed
      Summary:
      `Replayer::Execute()` can directly return the result (e.g., request latency, DB::Get() return code, returned value, etc.).
      `Replayer::Replay()` reports the results via a callback function.
      
      New interface:
      `TraceRecordResult` in "rocksdb/trace_record_result.h".
      
      `DBTest2.TraceAndReplay` and `DBTest2.TraceAndManualReplay` are updated accordingly.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8657
      
      Reviewed By: ajkr
      
      Differential Revision: D30290216
      
      Pulled By: autopear
      
      fbshipit-source-id: 3c8d4e6b180ec743de1a9d9dcaee86064c74f0d6
      d10801e9
    • P
      Stable cache keys on ingested SST files (#8669) · b6269b07
      Peter Dillinger committed
      Summary:
      Extends https://github.com/facebook/rocksdb/issues/8659 to work for ingested external SST files, even
      the same file ingested into different DBs sharing a block cache.
      
      Note: These new cache keys are currently only enabled when FileSystem
      does not provide GetUniqueId. For now, they are typically larger,
      so slightly less efficient.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8669
      
      Test Plan: Extended unit test
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30398532
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1f13e2af4b8bfff5741953a69466e9589fbc23c7
      b6269b07
  6. 18 Aug, 2021 3 commits
    • Y
      Fix bug caused by releasing snapshot(s) during compaction (#8608) · 2b367fa8
      Yanqin Jin committed
      Summary:
      In debug mode, we are seeing assertion failure as follows
      
      ```
      db/compaction/compaction_iterator.cc:980: void rocksdb::CompactionIterator::PrepareOutput(): \
      Assertion `ikey_.type != kTypeDeletion && ikey_.type != kTypeSingleDeletion' failed.
      ```
      
      It is caused by releasing earliest snapshot during compaction between the execution of
      `NextFromInput()` and `PrepareOutput()`.
      
      In one case, as demonstrated in unit test `WritePreparedTransaction.ReleaseEarliestSnapshotDuringCompaction_WithSD2`,
      incorrect result may be returned by a following range scan if we disable assertion, as in opt compilation
      level: the SingleDelete marker's sequence number is zeroed out, but the preceding PUT is also
      outputted to the SST file after compaction. Due to the logic of DBIter, the PUT will not be
      skipped and will be returned by iterator in range scan. https://github.com/facebook/rocksdb/issues/8661 illustrates what happened.
      
      Fix by taking a more conservative approach: make compaction zero out sequence number only
      if key is in the earliest snapshot when the compaction starts.
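
The conservative rule can be pictured with a minimal sketch in which the earliest snapshot is captured once, at compaction start, and later releases do not affect the decision. `SketchCompaction` and `MayZeroOutSeqno` are hypothetical names, and "in the earliest snapshot" is modeled here as "sequence number not greater than the snapshot's"; other conditions the real compaction iterator checks are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using SequenceNumber = uint64_t;

// Sentinel meaning "no snapshot existed when the compaction started".
constexpr SequenceNumber kNoSnapshot =
    std::numeric_limits<SequenceNumber>::max();

// Hypothetical model: the earliest snapshot is read once at construction and
// never refreshed, so releasing snapshots mid-compaction changes nothing.
class SketchCompaction {
 public:
  explicit SketchCompaction(const std::vector<SequenceNumber>& live_snapshots)
      : earliest_snapshot_at_start_(kNoSnapshot) {
    for (SequenceNumber s : live_snapshots) {
      if (s < earliest_snapshot_at_start_) earliest_snapshot_at_start_ = s;
    }
  }

  // A key is eligible only if it was visible to the snapshot captured at start.
  bool MayZeroOutSeqno(SequenceNumber key_seq) const {
    return key_seq <= earliest_snapshot_at_start_;
  }

 private:
  SequenceNumber earliest_snapshot_at_start_;
};
```

Because the decision never consults the live snapshot list again, a snapshot released between `NextFromInput()` and `PrepareOutput()` cannot change the outcome for a key.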
      
      Another assertion failure is
      ```
      Assertion `current_user_key_snapshot_ == last_snapshot' failed.
      ```
      
      It's caused by releasing the snapshot between the PUT and SingleDelete during compaction.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8608
      
      Test Plan: make check
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D30145645
      
      Pulled By: riversand963
      
      fbshipit-source-id: 699f58e66faf70732ad53810ccef43935d3bbe81
      2b367fa8
    • L
      Add statistics support to integrated BlobDB (#8667) · 6878cedc
      Levi Tamasi committed
      Summary:
      The patch adds statistics support to the integrated BlobDB implementation,
      namely the tickers `BLOB_DB_BLOB_FILE_BYTES_READ` and
      `BLOB_DB_GC_{NUM_KEYS,BYTES}_RELOCATED`, and the histograms
      `BLOB_DB_(DE)COMPRESSION_MICROS`. (Some other statistics, like
      `BLOB_DB_BLOB_FILE_BYTES_WRITTEN`, `BLOB_DB_BLOB_FILE_SYNCED`,
      `BLOB_DB_BLOB_FILE_{READ,WRITE,SYNC}_MICROS` were already supported.)
      Note that the vast majority of the old BlobDB's tickers/histograms are not
      really applicable to the new implementation, since they e.g. pertain to calling
      dedicated BlobDB APIs (which the integrated BlobDB does not have) or are
      tied to the legacy BlobDB's design of writing blob files synchronously when
      a write API is called. Such statistics are marked "legacy BlobDB only" in
      `statistics.h`.
      
      Fixes https://github.com/facebook/rocksdb/issues/8645 .
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8667
      
      Test Plan: Ran `make check` and tested the new statistics using `db_bench`.
      
      Reviewed By: riversand963
      
      Differential Revision: D30356884
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 5f8a833faee60401c5643c2f0a6c0415488190a4
      6878cedc
    • J
      Exclude property kLiveSstFilesSizeAtTemperature from stress_test (#8668) · 0729b287
      Jay Zhuang committed
      Summary:
      Just like other per_level properties.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8668
      
      Test Plan: stress_test
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30360967
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 70da2557b95c55e8081b04ebf1a909a0fe69488f
      0729b287
  7. 17 Aug, 2021 2 commits
    • A
      Add a stat to count secondary cache hits (#8666) · add68bd2
      anand76 committed
      Summary:
      Add a stat for secondary cache hits. The `Cache::Lookup` API had an unused `stats` parameter. This PR uses that to pass the pointer to a `Statistics` object that `LRUCache` uses to record the stat.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8666
      
      Test Plan: Update a unit test in lru_cache_test
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30353816
      
      Pulled By: anand1976
      
      fbshipit-source-id: 2046f78b460428877a26ffdd2bb914ae47dfbe77
      add68bd2
    • P
      Stable cache keys using DB session ids in SSTs (#8659) · a207c278
      Peter Dillinger committed
      Summary:
      Use DB session ids in SST table properties to make cache keys
      stable across DB re-open and copy / move / restore / etc.
      
      These new cache keys are currently only enabled when FileSystem does not
      provide GetUniqueId. For now, they are typically larger, so slightly
      less efficient.
      
      Relevant to https://github.com/facebook/rocksdb/issues/7405
      
      This change has a minor regression in PersistentCache functionality:
      metaindex blocks are no longer cached in PersistentCache. Table properties
      blocks already were not but ideally should be. I didn't spend effort to
      fix & test these issues because we don't believe PersistentCache is used much
      if at all and expect SecondaryCache to replace it. (Though PRs are welcome.)
      
      FIXME: there is more to be fixed for stable cache keys on external SST files
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8659
      
      Test Plan:
      new unit test added, which fails when disabling new
      functionality
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30297705
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e8539a5c8802a79340405629870f2e3fb3822d3a
      a207c278
  8. 16 Aug, 2021 4 commits
  9. 14 Aug, 2021 1 commit
    • B
      Improve MemPurge sampling (#8656) · e51be2c5
      Baptiste Lemaire committed
      Summary:
      Previously, the `MemPurge` sampling function was assessing whether a random entry from a memtable was garbage or not by simply querying the given memtable (see https://github.com/facebook/rocksdb/issues/8628 for more details).
      In this diff, I am updating the sampling function by querying not only the memtable the entry was drawn from, but also all subsequent memtables that have a greater memtable ID.
      I also added the size of the value for KV entries in the payload/useful payload estimates (which was also one of the reasons why sampling was not as good as mempurging all the time in terms of L0 SST files reduction).
      Once these changes were made, I was able to clean obsolete objects and functions from the `MemtableList` struct, and did a bit of cleanup everywhere.
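
The updated garbage check can be sketched as follows: a sampled entry counts as garbage if a newer version of the same key exists in its own memtable or in any memtable with a greater ID. This is a simplified model; `SketchMemtable` and `IsGarbage` are hypothetical names, and each memtable is reduced to a map from key to its newest sequence number.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical memtable model: only the newest sequence number per key is kept.
struct SketchMemtable {
  uint64_t id;
  std::map<std::string, uint64_t> latest_seq;
};

// Returns true if the entry (key, seq) drawn from memtable `source_id` is
// superseded in that memtable or in any memtable with a greater ID.
inline bool IsGarbage(const std::vector<SketchMemtable>& memtables,
                      uint64_t source_id, const std::string& key,
                      uint64_t seq) {
  for (const SketchMemtable& m : memtables) {
    if (m.id < source_id) continue;  // older memtables cannot supersede it
    auto it = m.latest_seq.find(key);
    if (it != m.latest_seq.end() && it->second > seq) return true;
  }
  return false;
}
```

Passing only the source memtable reproduces the earlier behavior, in which an entry overwritten in a later memtable was still counted as useful payload.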
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8656
      
      Reviewed By: pdillinger
      
      Differential Revision: D30288583
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 7646a545ec56f4715949daa59ab5eee74540feb3
      e51be2c5
  10. 13 Aug, 2021 1 commit
    • M
      Code cleanup for trace replayer (#8652) · 74a652a4
      Merlin Mao committed
      Summary:
      - Remove extra `;` in trace_record.h
      - Remove some unnecessary `assert` in trace_record_handler.cc
      - Initialize `env_` after `exec_handler_` in `ReplayerImpl` to let db be asserted in creating the handler before getting `db->GetEnv()`.
      - Update history to include the new `TraceReader::Reset()`
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8652
      
      Reviewed By: ajkr
      
      Differential Revision: D30276872
      
      Pulled By: autopear
      
      fbshipit-source-id: 476ee162e0f241490c6209307448343a5b326b37
      74a652a4
  11. 12 Aug, 2021 4 commits
    • M
      Make TraceRecord and Replayer public (#8611) · f58d2767
      Merlin Mao committed
      Summary:
      New public interfaces:
      `TraceRecord` and `TraceRecord::Handler`, available in "rocksdb/trace_record.h".
      `Replayer`, available in `rocksdb/utilities/replayer.h`.
      
      Users can call `DB::NewDefaultReplayer()` to create a Replayer that replays a trace file automatically or manually.
      
      Unit tests:
      - `./db_test2 --gtest_filter="DBTest2.TraceAndReplay"`: Updated with the internal API changes.
      - `./db_test2 --gtest_filter="DBTest2.TraceAndManualReplay"`: New for manual replay.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8611
      
      Reviewed By: ajkr
      
      Differential Revision: D30266329
      
      Pulled By: autopear
      
      fbshipit-source-id: 1ecb3cbbedae0f6a67c18f0cc82e002b4d81b6f8
      f58d2767
    • B
      Re-add retired mempurge flag definitions for legacy-options-file temporary support. (#8650) · a53563d8
      Baptiste Lemaire committed
      Summary:
      Current internal regression tests pass in an old option flag `experimental_allow_mempurge` to a more recently built db.
      This flag was retired and removed in a recent PR (https://github.com/facebook/rocksdb/issues/8628), and therefore, the following error comes up : `Failed: Invalid argument: Could not find option: : experimental_allow_mempurge`.
      In this PR, I reintroduce the two flags retired in https://github.com/facebook/rocksdb/issues/8628, `experimental_allow_mempurge` and `experimental_mempurge_policy` in `db_options.cc` and mark them both as `kDeprecated`.
      This is a temporary fix to save us time to find a long term solution, which hopefully will consist in ignoring options prefixed with `experimental_` that are no longer recognized.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8650
      
      Reviewed By: pdillinger
      
      Differential Revision: D30257307
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 35303655fd2dd9789fd9e3c450e9d8009f3c1f54
      a53563d8
    • P
      Update and enhance check_format_compatible.sh (#8651) · 6450e9fc
      Peter Dillinger committed
      Summary:
      The last few releases overlooked adding to this test. This
      change fixes that.
      
      This change also fixes the problem of older branches not understanding
      ROCKSDB_NO_FBCODE and referencing compilers no longer supported.
      During the test, build_detect_platform is patched to force no FBCODE
      compiler usage. (We should not need to update old branches perpetually.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8651
      
      Test Plan: local run reproduces regression described in https://github.com/facebook/rocksdb/issues/8650
      
      Reviewed By: jay-zhuang, zhichao-cao
      
      Differential Revision: D30261872
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 02b447d224d7e0eb8613c63185437ded146713bc
      6450e9fc
    • J
      Add suggestion for btrfs user to disable preallocation (#8646) · 87e23587
      Jay Zhuang committed
      Summary:
      Add a comment for `options.allow_fallocate` noting that btrfs
      preallocated space is not freed, along with a suggestion to disable
      preallocation.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8646
      
      Test Plan: No code change
      
      Reviewed By: ajkr
      
      Differential Revision: D30240050
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 75b7190bc8276ce8d8ac2d0cb9064b386cbf4768
      87e23587
  12. 11 Aug, 2021 2 commits
    • B
      Memtable sampling for mempurge heuristic. (#8628) · e3a96c48
      Baptiste Lemaire committed
      Summary:
      Changes the API of the MemPurge process: the `bool experimental_allow_mempurge` and `experimental_mempurge_policy` flags have been replaced by a `double experimental_mempurge_threshold` option.
      This change of API reflects another major change introduced in this PR: the MemPurgeDecider() function now works by sampling the memtables being flushed to estimate the overall amount of useful payload (payload minus the garbage), and then comparing this useful payload estimate with the `double experimental_mempurge_threshold` value.
      Therefore, when the value of this flag is `0.0` (default value), mempurge is simply deactivated. On the other hand, a value of `DBL_MAX` would be equivalent to always going through a mempurge regardless of the garbage ratio estimate.
      At the moment, a `double experimental_mempurge_threshold` value other than `0.0` or `DBL_MAX` is only supported with the `SkipList` memtable representation.
      Regarding the sampling, this PR includes the introduction of a `MemTable::UniqueRandomSample` function that collects (approximately) random entries from the memtable by using the new `SkipList::Iterator::RandomSeek()` under the hood, or by iterating through each memtable entry, depending on the target sample size and the total number of entries.
      The unit tests have been adapted to support this new API.
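
The threshold semantics described above can be sketched with a small decision function. This is only a model under stated assumptions: `SampledEntry` and `ShouldMemPurge` are hypothetical names, and the comparison rule (mempurge when the useful fraction of the sampled bytes is below the threshold) is an assumption chosen to be consistent with the `0.0` and `DBL_MAX` behavior stated in the summary; the exact formula used by MemPurgeDecider() is not given here.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

// Hypothetical sampled memtable entry.
struct SampledEntry {
  size_t bytes;     // payload size of the entry
  bool is_garbage;  // true if a newer version supersedes it
};

// Assumed rule for illustration: mempurge when the useful fraction of the
// sampled payload is below the threshold.
inline bool ShouldMemPurge(const std::vector<SampledEntry>& sample,
                           double threshold) {
  if (threshold <= 0.0) return false;  // 0.0 (the default) disables mempurge
  double total = 0.0;
  double useful = 0.0;
  for (const SampledEntry& e : sample) {
    total += static_cast<double>(e.bytes);
    if (!e.is_garbage) useful += static_cast<double>(e.bytes);
  }
  // An empty sample carries no useful payload.
  const double useful_ratio = total > 0.0 ? useful / total : 0.0;
  return useful_ratio < threshold;  // always true for DBL_MAX
}
```

With this rule, a sample that is half garbage triggers a mempurge at a threshold of 0.6 but not at 0.4.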
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8628
      
      Reviewed By: pdillinger
      
      Differential Revision: D30149315
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 1feef5390c95db6f4480ab4434716533d3947f27
      e3a96c48
    • L
      Attempt to deflake DBTestXactLogIterator.TransactionLogIteratorCorruptedLog (#8627) · f63331eb
      Levi Tamasi committed
      Summary:
      The patch attempts to deflake `DBTestXactLogIterator.TransactionLogIteratorCorruptedLog`
      by disabling file deletions while retrieving the list of WAL files and truncating the first WAL file.
      This is to prevent the `PurgeObsoleteFiles` call triggered by `GetSortedWalFiles` from
      invalidating the result of `GetSortedWalFiles`. The patch also cleans up the test case a bit
      and changes it to using `test::TruncateFile` instead of calling the `truncate` syscall directly.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8627
      
      Test Plan: `make check`
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D30147002
      
      Pulled By: ltamasi
      
      fbshipit-source-id: db11072a4ad8900a2f859cb5294e22b1888c23f6
      f63331eb
  13. 10 Aug, 2021 4 commits
    • A
      Simplify GenericRateLimiter algorithm (#8602) · 82b81dc8
      Andrew Kryczka committed
      Summary:
      `GenericRateLimiter` slow path handles requests that cannot be satisfied
      immediately.  Such requests enter a queue, and their thread stays in `Request()`
      until they are granted or the rate limiter is stopped.  These threads are
      responsible for unblocking themselves.  The work to do so is split into two main
      duties.
      
      (1) Waiting for the next refill time.
      (2) Refilling the bytes and granting requests.
      
      Prior to this PR, the slow path logic involved a leader election algorithm to
      pick one thread to perform (1) followed by (2).  It elected the thread whose
      request was at the front of the highest priority non-empty queue since that
      request was most likely to be granted.  This algorithm was efficient in terms of
      reducing intermediate wakeups, which is a thread waking up only to resume
      waiting after finding its request is not granted.  However, the conceptual
      complexity of this algorithm was too high.  It took me a long time to draw a
      timeline to understand how it works for just one edge case yet there were so
      many.
      
      This PR drops the leader election to reduce conceptual complexity.  Now, the two
      duties can be performed by whichever thread acquires the lock first.  The risk
      of this change is increasing the number of intermediate wakeups, however, we
      took steps to mitigate that.
      
      - `wait_until_refill_pending_` flag ensures only one thread performs (1). This
      prevents the thundering herd problem at the next refill time. The remaining
      threads wait on their condition variable with an unbounded duration -- thus we
      must remember to notify them to ensure forward progress.
      - (1) is typically done by a thread at the front of a queue. This is trivial
      when the queues are initially empty as the first choice that arrives must be
      the only entry in its queue. When queues are initially non-empty, we achieve
      this by having (2) notify a thread at the front of a queue (preferring higher
      priority) to perform the next duty.
      - We do not require any additional wakeup for (2). Typically it will just be
      done by the thread that finished (1).
      
      Combined, the second and third bullet points above suggest the refill/granting
      will typically be done by a request at the front of its queue.  This is
      important because one wakeup is saved when a granted request happens to be in an
      already running thread.
      
      Note there are a few cases that still lead to intermediate wakeup, however.  The
      first two are existing issues that also apply to the old algorithm, however, the
      third (including both subpoints) is new.
      
      - No request may be granted (only possible when rate limit dynamically
      decreases).
      - Requests from a different queue may be granted.
      - (2) may be run by a non-front request thread causing it to not be granted even
      if some requests in that same queue are granted. It can happen for a couple
      (unlikely) reasons.
        - A new request may sneak in and grab the lock at the refill time, before the
      thread finishing (1) can wake up and grab it.
        - A new request may sneak in and grab the lock and execute (1) before (2)'s
      chosen candidate can wake up and grab the lock. Then that non-front request
      thread performing (1) can carry over to perform (2).
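
The split of duties described above can be sketched, without any real threading, as the choice a thread makes once it holds the lock. `SketchLimiterState`, `NextDuty`, and `FinishTimedWait` are hypothetical names; only the `wait_until_refill_pending` field mirrors the flag named in this message, and the condition-variable waits themselves are omitted.

```cpp
#include <cassert>
#include <cstdint>

// What a request thread does next after acquiring the lock.
enum class Duty {
  kRefillAndGrant,       // duty (2)
  kWaitUntilRefill,      // duty (1): timed wait until the next refill time
  kWaitForNotification   // unbounded wait; another thread must notify it
};

// Hypothetical, simplified limiter state.
struct SketchLimiterState {
  int64_t next_refill_us;
  bool wait_until_refill_pending;
};

inline Duty NextDuty(SketchLimiterState* state, int64_t now_us) {
  assert(state != nullptr);
  if (now_us >= state->next_refill_us) {
    return Duty::kRefillAndGrant;  // whoever holds the lock first does (2)
  }
  if (!state->wait_until_refill_pending) {
    state->wait_until_refill_pending = true;  // only one timed waiter
    return Duty::kWaitUntilRefill;
  }
  return Duty::kWaitForNotification;  // avoids a thundering herd at refill
}

// Called by the timed waiter once its wait ends.
inline void FinishTimedWait(SketchLimiterState* state) {
  assert(state != nullptr);
  state->wait_until_refill_pending = false;
}
```

In this model the first thread to arrive before the refill time takes duty (1), later arrivals wait without a timeout, and whichever thread holds the lock at or after the refill time performs duty (2).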
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8602
      
      Test Plan:
      - Use existing tests. The edge cases listed in the comment are all performance
      related; I could not really think of any related to correctness. The logic
      looks the same whether a thread wakes up/finishes its work early/on-time/late,
      or whether the thread is chosen vs. "steals" the work.
      - Verified write throughput and CPU overhead are basically the same with and
        without this change, even in a rate limiter heavy workload:
      
      Test command:
      ```
      $ rm -rf /dev/shm/dbbench/ && TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -benchmarks=fillrandom -num_multi_db=64 -num_low_pri_threads=64 -num_high_pri_threads=64 -write_buffer_size=262144 -target_file_size_base=262144 -max_bytes_for_level_base=1048576 -rate_limiter_bytes_per_sec=16777216 -key_size=24 -value_size=1000 -num=10000 -compression_type=none -rate_limiter_refill_period_us=1000
      ```
      
      Results before this PR:
      
      ```
      fillrandom   :     108.463 micros/op 9219 ops/sec;    9.0 MB/s
      7.40user 8.84system 1:26.20elapsed 18%CPU (0avgtext+0avgdata 256140maxresident)k
      ```
      
      Results after this PR:
      
      ```
      fillrandom   :     108.108 micros/op 9250 ops/sec;    9.0 MB/s
      7.45user 8.23system 1:26.68elapsed 18%CPU (0avgtext+0avgdata 255688maxresident)k
      ```
      
      Reviewed By: hx235
      
      Differential Revision: D30048013
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6741bba9d9dfbccab359806d725105817fef818b
      82b81dc8
    • L
      rocksdb: don't call LZ4_loadDictHC with null dictionary · a756fb9c
      Lucian Grijincu committed
      Summary: UBSAN revealed a pointer underflow when `LZ4HC_init_internal` is called with a null `start`.
      
      Reviewed By: ajkr
      
      Differential Revision: D30181874
      
      fbshipit-source-id: ca9bbac1a85c58782871d7f153af733b000cc66c
      a756fb9c
    • J
      Add an unittest for tiered storage universal compaction (#8631) · 61f83dfe
      Jay Zhuang committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8631
      
      Reviewed By: siying
      
      Differential Revision: D30200385
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 0fa2bb15e74ff81762d767f234078e0fe0106c55
      61f83dfe
    • S
      Move old files to warm tier in FIFO compactions (#8310) · e7c24168
      sdong committed
      Summary:
      Some FIFO users want to keep the data for longer, but the old data is rarely accessed. This feature allows users to configure FIFO compaction so that data older than a threshold is moved to a warm storage tier.
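
The age rule can be sketched as a simple file picker. This is an illustrative model only: `SketchFile`, `Temperature`, `PickFilesForWarmTier`, and the `age_threshold_s` parameter are hypothetical names, and treating a threshold of 0 as "feature disabled" is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Temperature { kUnknown, kWarm };

// Hypothetical file metadata.
struct SketchFile {
  uint64_t number;
  uint64_t creation_time_s;
  Temperature temperature;
};

// Returns the numbers of files older than `age_threshold_s` that are not
// already in the warm tier.
inline std::vector<uint64_t> PickFilesForWarmTier(
    const std::vector<SketchFile>& files, uint64_t now_s,
    uint64_t age_threshold_s) {
  std::vector<uint64_t> picked;
  if (age_threshold_s == 0) return picked;  // assumption: 0 disables the feature
  for (const SketchFile& f : files) {
    const uint64_t age_s =
        now_s > f.creation_time_s ? now_s - f.creation_time_s : 0;
    if (age_s > age_threshold_s && f.temperature != Temperature::kWarm) {
      picked.push_back(f.number);
    }
  }
  return picked;
}
```

Files already marked warm are skipped so that repeated runs do not move the same data twice.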
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8310
      
      Test Plan: Add several unit tests.
      
      Reviewed By: ajkr
      
      Differential Revision: D28493792
      
      fbshipit-source-id: c14824ea634814dee5278b449ab5c98b6e0b5501
      e7c24168
  14. 08 Aug, 2021 1 commit
    • A
      Fix db_stress failure (#8632) · 052c24a6
      Akanksha Mahajan committed
      Summary:
      FaultInjectionTestFS injects errors into the Rename operation. Because
      of the injected error, info.log fails to be created when rename returns an error, and info_log is set to nullptr, which leads to the assertion failure.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8632
      
      Test Plan: run the db_stress job locally
      
      Reviewed By: ajkr
      
      Differential Revision: D30167387
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 8d08c4c33e8f0cabd368bbb498d21b9de0660067
      052c24a6
  15. 07 Aug, 2021 1 commit