1. 16 Aug 2021 · 3 commits
  2. 14 Aug 2021 · 1 commit
    • B
      Improve MemPurge sampling (#8656) · e51be2c5
      Baptiste Lemaire authored
      Summary:
      Previously, the `MemPurge` sampling function was assessing whether a random entry from a memtable was garbage or not by simply querying the given memtable (see https://github.com/facebook/rocksdb/issues/8628 for more details).
      In this diff, I am updating the sampling function by querying not only the memtable the entry was drawn from, but also all subsequent memtables that have a greater memtable ID.
      I also added the size of the value for KV entries to the payload/useful-payload estimates (the missing value size was one of the reasons why sampling was not as effective as always mempurging in terms of L0 SST file reduction).
      Once these changes were made, I was able to clean obsolete objects and functions from the `MemtableList` struct, and did a bit of cleanup everywhere.
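      The idea can be illustrated with a small, self-contained sketch (plain standard C++, not RocksDB internals; `MemTable` here is just a stand-in map keyed by user key):

      ```
      #include <map>
      #include <string>
      #include <vector>

      // Stand-in for a memtable: user key -> latest value written to it.
      using MemTable = std::map<std::string, std::string>;

      // An entry sampled from memtable `i` is treated as garbage if any memtable
      // with a greater ID (i.e., a more recent one) holds a newer entry for the
      // same key.
      bool IsGarbage(const std::vector<MemTable>& memtables_by_id, size_t i,
                     const std::string& key) {
        for (size_t j = i + 1; j < memtables_by_id.size(); ++j) {
          if (memtables_by_id[j].count(key) > 0) {
            return true;  // the sampled entry has been overwritten
          }
        }
        return false;
      }
      ```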
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8656
      
      Reviewed By: pdillinger
      
      Differential Revision: D30288583
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 7646a545ec56f4715949daa59ab5eee74540feb3
  3. 13 Aug 2021 · 1 commit
    • M
      Code cleanup for trace replayer (#8652) · 74a652a4
      Merlin Mao authored
      Summary:
      - Remove extra `;` in trace_record.h
      - Remove some unnecessary `assert`s in trace_record_handler.cc
      - Initialize `env_` after `exec_handler_` in `ReplayerImpl`, so that `db` is asserted while creating the handler before `db->GetEnv()` is called.
      - Update history to include the new `TraceReader::Reset()`
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8652
      
      Reviewed By: ajkr
      
      Differential Revision: D30276872
      
      Pulled By: autopear
      
      fbshipit-source-id: 476ee162e0f241490c6209307448343a5b326b37
  4. 12 Aug 2021 · 4 commits
    • M
      Make TraceRecord and Replayer public (#8611) · f58d2767
      Merlin Mao authored
      Summary:
      New public interfaces:
      `TraceRecord` and `TraceRecord::Handler`, available in `rocksdb/trace_record.h`.
      `Replayer`, available in `rocksdb/utilities/replayer.h`.

      Users can call `DB::NewDefaultReplayer()` to create a Replayer and replay a trace file either automatically or manually.
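      A minimal usage sketch, assuming the signatures of `NewFileTraceReader`, `DB::NewDefaultReplayer`, `Replayer::Prepare()` and `Replayer::Replay()` roughly as shown (treat them as illustrative rather than quoted from the headers):

      ```
      #include <memory>
      #include <string>
      #include <vector>

      #include "rocksdb/db.h"
      #include "rocksdb/trace_reader_writer.h"
      #include "rocksdb/utilities/replayer.h"

      using namespace ROCKSDB_NAMESPACE;

      Status ReplayTrace(DB* db, const std::vector<ColumnFamilyHandle*>& handles,
                         const std::string& trace_file) {
        std::unique_ptr<TraceReader> reader;
        Status s =
            NewFileTraceReader(db->GetEnv(), EnvOptions(), trace_file, &reader);
        if (!s.ok()) return s;

        std::unique_ptr<Replayer> replayer;
        s = db->NewDefaultReplayer(handles, std::move(reader), &replayer);
        if (!s.ok()) return s;

        s = replayer->Prepare();  // seek past the trace header
        if (!s.ok()) return s;
        // Auto replay with default options; manual replay would instead loop on
        // Next()/Execute().
        return replayer->Replay(ReplayOptions(), /*result_callback=*/nullptr);
      }
      ```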
      
      Unit tests:
      - `./db_test2 --gtest_filter="DBTest2.TraceAndReplay"`: Updated with the internal API changes.
      - `./db_test2 --gtest_filter="DBTest2.TraceAndManualReplay"`: New for manual replay.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8611
      
      Reviewed By: ajkr
      
      Differential Revision: D30266329
      
      Pulled By: autopear
      
      fbshipit-source-id: 1ecb3cbbedae0f6a67c18f0cc82e002b4d81b6f8
    • B
      Re-add retired mempurge flag definitions for legacy-options-file temporary support. (#8650) · a53563d8
      Baptiste Lemaire authored
      Summary:
      Current internal regression tests pass in an old option flag `experimental_allow_mempurge` to a more recently built db.
      This flag was retired and removed in a recent PR (https://github.com/facebook/rocksdb/issues/8628), and therefore the following error comes up: `Failed: Invalid argument: Could not find option: : experimental_allow_mempurge`.
      In this PR, I reintroduce the two flags retired in https://github.com/facebook/rocksdb/issues/8628, `experimental_allow_mempurge` and `experimental_mempurge_policy` in `db_options.cc` and mark them both as `kDeprecated`.
      This is a temporary fix to buy us time to find a long-term solution, which will hopefully consist of ignoring options prefixed with `experimental_` that are no longer recognized.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8650
      
      Reviewed By: pdillinger
      
      Differential Revision: D30257307
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 35303655fd2dd9789fd9e3c450e9d8009f3c1f54
    • P
      Update and enhance check_format_compatible.sh (#8651) · 6450e9fc
      Peter Dillinger authored
      Summary:
      The last few releases overlooked adding to this test. This
      change fixes that.
      
      This change also fixes the problem of older branches not understanding
      ROCKSDB_NO_FBCODE and referencing compilers no longer supported.
      During the test, build_detect_platform is patched to force no FBCODE
      compiler usage. (We should not need to update old branches perpetually.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8651
      
      Test Plan: local run reproduces regression described in https://github.com/facebook/rocksdb/issues/8650
      
      Reviewed By: jay-zhuang, zhichao-cao
      
      Differential Revision: D30261872
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 02b447d224d7e0eb8613c63185437ded146713bc
    • J
      Add suggestion for btrfs user to disable preallocation (#8646) · 87e23587
      Jay Zhuang authored
      Summary:
      Add a comment for `options.allow_fallocate` noting that on btrfs,
      preallocated space is not freed, along with a suggestion to disable
      preallocation.
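      For illustration, the suggested setting looks like this (the comment added by this PR is the actual deliverable; the snippet below is just a sketch):

      ```
      #include "rocksdb/options.h"

      ROCKSDB_NAMESPACE::Options MakeBtrfsFriendlyOptions() {
        ROCKSDB_NAMESPACE::Options options;
        // On btrfs, preallocated-but-unused space is not freed, so disable
        // preallocation as the new comment suggests.
        options.allow_fallocate = false;
        return options;
      }
      ```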
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8646
      
      Test Plan: No code change
      
      Reviewed By: ajkr
      
      Differential Revision: D30240050
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 75b7190bc8276ce8d8ac2d0cb9064b386cbf4768
  5. 11 Aug 2021 · 2 commits
    • B
      Memtable sampling for mempurge heuristic. (#8628) · e3a96c48
      Baptiste Lemaire authored
      Summary:
      Changes the API of the MemPurge process: the `bool experimental_allow_mempurge` and `experimental_mempurge_policy` flags have been replaced by a `double experimental_mempurge_threshold` option.
      This change of API reflects another major change introduced in this PR: the MemPurgeDecider() function now works by sampling the memtables being flushed to estimate the overall amount of useful payload (payload minus the garbage), and then comparing this useful-payload estimate with the `double experimental_mempurge_threshold` value.
      Therefore, when the value of this flag is `0.0` (default value), mempurge is simply deactivated. On the other hand, a value of `DBL_MAX` would be equivalent to always going through a mempurge regardless of the garbage ratio estimate.
      At the moment, a `double experimental_mempurge_threshold` value other than 0.0 or `DBL_MAX` is only supported with the `SkipList` memtable representation.
      Regarding the sampling, this PR includes the introduction of a `MemTable::UniqueRandomSample` function that collects (approximately) random entries from the memtable by using the new `SkipList::Iterator::RandomSeek()` under the hood, or by iterating through each memtable entry, depending on the target sample size and the total number of entries.
      The unit tests have been readapted to support this new API.
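      A minimal sketch of the new option (the threshold value below is arbitrary and only for illustration):

      ```
      #include <cfloat>

      #include "rocksdb/options.h"

      ROCKSDB_NAMESPACE::Options MakeMempurgeOptions() {
        ROCKSDB_NAMESPACE::Options options;
        // 0.0 (the default) deactivates mempurge; DBL_MAX makes every flush
        // attempt a mempurge regardless of the useful-payload estimate. Other
        // values are only supported with the SkipList memtable representation.
        options.experimental_mempurge_threshold = 1.0;
        return options;
      }
      ```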
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8628
      
      Reviewed By: pdillinger
      
      Differential Revision: D30149315
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 1feef5390c95db6f4480ab4434716533d3947f27
    • L
      Attempt to deflake DBTestXactLogIterator.TransactionLogIteratorCorruptedLog (#8627) · f63331eb
      Levi Tamasi authored
      Summary:
      The patch attempts to deflake `DBTestXactLogIterator.TransactionLogIteratorCorruptedLog`
      by disabling file deletions while retrieving the list of WAL files and truncating the first WAL file.
      This is to prevent the `PurgeObsoleteFiles` call triggered by `GetSortedWalFiles` from
      invalidating the result of `GetSortedWalFiles`. The patch also cleans up the test case a bit
      and changes it to using `test::TruncateFile` instead of calling the `truncate` syscall directly.
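      A hedged sketch of the pattern the test now relies on (function and variable names are illustrative, not copied from the test):

      ```
      #include "rocksdb/db.h"
      #include "rocksdb/transaction_log.h"

      using namespace ROCKSDB_NAMESPACE;

      Status TruncateFirstWalSafely(DB* db) {
        // Prevent the PurgeObsoleteFiles call triggered by GetSortedWalFiles
        // from deleting WAL files while the list is still being used.
        Status s = db->DisableFileDeletions();
        if (!s.ok()) return s;

        VectorLogPtr wal_files;
        s = db->GetSortedWalFiles(wal_files);
        if (s.ok() && !wal_files.empty()) {
          // The test truncates wal_files.front() here via test::TruncateFile.
        }

        db->EnableFileDeletions(/*force=*/false).PermitUncheckedError();
        return s;
      }
      ```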
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8627
      
      Test Plan: `make check`
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D30147002
      
      Pulled By: ltamasi
      
      fbshipit-source-id: db11072a4ad8900a2f859cb5294e22b1888c23f6
  6. 10 Aug 2021 · 4 commits
    • A
      Simplify GenericRateLimiter algorithm (#8602) · 82b81dc8
      Andrew Kryczka authored
      Summary:
      `GenericRateLimiter` slow path handles requests that cannot be satisfied
      immediately.  Such requests enter a queue, and their thread stays in `Request()`
      until they are granted or the rate limiter is stopped.  These threads are
      responsible for unblocking themselves.  The work to do so is split into two main
      duties.
      
      (1) Waiting for the next refill time.
      (2) Refilling the bytes and granting requests.
      
      Prior to this PR, the slow path logic involved a leader election algorithm to
      pick one thread to perform (1) followed by (2).  It elected the thread whose
      request was at the front of the highest priority non-empty queue since that
      request was most likely to be granted.  This algorithm was efficient in terms of
      reducing intermediate wakeups, which is a thread waking up only to resume
      waiting after finding its request is not granted.  However, the conceptual
      complexity of this algorithm was too high.  It took me a long time to draw a
      timeline to understand how it works for just one edge case yet there were so
      many.
      
      This PR drops the leader election to reduce conceptual complexity.  Now, the two
      duties can be performed by whichever thread acquires the lock first.  The risk
      of this change is increasing the number of intermediate wakeups, however, we
      took steps to mitigate that.
      
      - `wait_until_refill_pending_` flag ensures only one thread performs (1). This
        prevents the thundering herd problem at the next refill time. The remaining
        threads wait on their condition variable with an unbounded duration -- thus we
        must remember to notify them to ensure forward progress.
      - (1) is typically done by a thread at the front of a queue. This is trivial
        when the queues are initially empty, as the first request that arrives must be
        the only entry in its queue. When queues are initially non-empty, we achieve
        this by having (2) notify a thread at the front of a queue (preferring higher
        priority) to perform the next duty.
      - We do not require any additional wakeup for (2). Typically it will just be
        done by the thread that finished (1).
      
      Combined, the second and third bullet points above suggest the refill/granting
      will typically be done by a request at the front of its queue.  This is
      important because one wakeup is saved when a granted request happens to be in an
      already running thread.
      
      Note there are a few cases that still lead to intermediate wakeups, however. The
      first two are existing issues that also apply to the old algorithm, while the
      third (including both subpoints) is new.

      - No request may be granted (only possible when the rate limit dynamically
        decreases).
      - Requests from a different queue may be granted.
      - (2) may be run by a non-front request thread, causing it to not be granted even
        if some requests in that same queue are granted. It can happen for a couple of
        (unlikely) reasons.
        - A new request may sneak in and grab the lock at the refill time, before the
          thread finishing (1) can wake up and grab it.
        - A new request may sneak in and grab the lock and execute (1) before (2)'s
          chosen candidate can wake up and grab the lock. Then that non-front request
          thread performing (1) can carry over to perform (2).
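      A compact, self-contained sketch of the simplified slow path described above (a toy model, not the actual `GenericRateLimiter` code; it ignores priorities, stopping, and dynamic rate changes, and names such as `ToyRateLimiter` are illustrative):

      ```
      #include <chrono>
      #include <condition_variable>
      #include <deque>
      #include <mutex>

      struct Req {
        explicit Req(long b) : bytes(b) {}
        long bytes;
        bool granted = false;
        std::condition_variable cv;
      };

      class ToyRateLimiter {
       public:
        void Request(long bytes) {
          std::unique_lock<std::mutex> lock(mu_);
          if (queue_.empty() && bytes <= available_bytes_) {  // fast path
            available_bytes_ -= bytes;
            return;
          }
          Req r(bytes);
          queue_.push_back(&r);
          while (!r.granted) {
            if (!wait_until_refill_pending_) {
              // Duty (1): exactly one thread sleeps until the next refill time,
              // avoiding a thundering herd of timed waits.
              wait_until_refill_pending_ = true;
              r.cv.wait_for(lock, refill_period_);
              wait_until_refill_pending_ = false;
              // Duty (2): refill, then grant requests from the front of the queue.
              available_bytes_ += bytes_per_refill_;
              while (!queue_.empty() &&
                     queue_.front()->bytes <= available_bytes_) {
                Req* front = queue_.front();
                queue_.pop_front();
                available_bytes_ -= front->bytes;
                front->granted = true;
                front->cv.notify_one();
              }
              // Hand duty (1) to the new front, if any, so the next round is
              // typically started by a thread at the front of a queue.
              if (!queue_.empty()) {
                queue_.front()->cv.notify_one();
              }
            } else {
              // Another thread owns duty (1); wait (unbounded) until notified.
              r.cv.wait(lock);
            }
          }
        }

       private:
        std::mutex mu_;
        std::deque<Req*> queue_;
        bool wait_until_refill_pending_ = false;
        long available_bytes_ = 0;
        const long bytes_per_refill_ = 1 << 20;
        const std::chrono::milliseconds refill_period_{100};
      };
      ```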
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8602
      
      Test Plan:
      - Use existing tests. The edge cases listed in the comment are all performance
        related; I could not really think of any related to correctness. The logic
        looks the same whether a thread wakes up/finishes its work early/on-time/late,
        or whether the thread is chosen vs. "steals" the work.
      - Verified write throughput and CPU overhead are basically the same with and
        without this change, even in a rate limiter heavy workload:
      
      Test command:
      ```
      $ rm -rf /dev/shm/dbbench/ && TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -benchmarks=fillrandom -num_multi_db=64 -num_low_pri_threads=64 -num_high_pri_threads=64 -write_buffer_size=262144 -target_file_size_base=262144 -max_bytes_for_level_base=1048576 -rate_limiter_bytes_per_sec=16777216 -key_size=24 -value_size=1000 -num=10000 -compression_type=none -rate_limiter_refill_period_us=1000
      ```
      
      Results before this PR:
      
      ```
      fillrandom   :     108.463 micros/op 9219 ops/sec;    9.0 MB/s
      7.40user 8.84system 1:26.20elapsed 18%CPU (0avgtext+0avgdata 256140maxresident)k
      ```
      
      Results after this PR:
      
      ```
      fillrandom   :     108.108 micros/op 9250 ops/sec;    9.0 MB/s
      7.45user 8.23system 1:26.68elapsed 18%CPU (0avgtext+0avgdata 255688maxresident)k
      ```
      
      Reviewed By: hx235
      
      Differential Revision: D30048013
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6741bba9d9dfbccab359806d725105817fef818b
    • L
      rocksdb: don't call LZ4_loadDictHC with null dictionary · a756fb9c
      Lucian Grijincu authored
      Summary: UBSAN revealed a pointer underflow when `LZ4HC_init_internal` is called with a null `start`.
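      The guard can be sketched as follows (variable names are illustrative; the actual change lives in RocksDB's LZ4 compression wrapper):

      ```
      #include <lz4hc.h>

      // Only hand the dictionary to LZ4's HC dictionary loader when it is
      // non-null and non-empty; otherwise LZ4HC_init_internal ends up doing
      // pointer arithmetic on a null `start`, which UBSAN flags as underflow.
      void MaybeLoadDictHC(LZ4_streamHC_t* stream, const char* dict,
                           int dict_size) {
        if (dict != nullptr && dict_size > 0) {
          LZ4_loadDictHC(stream, dict, dict_size);
        }
      }
      ```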
      
      Reviewed By: ajkr
      
      Differential Revision: D30181874
      
      fbshipit-source-id: ca9bbac1a85c58782871d7f153af733b000cc66c
    • J
      Add a unit test for tiered storage universal compaction (#8631) · 61f83dfe
      Jay Zhuang authored
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8631
      
      Reviewed By: siying
      
      Differential Revision: D30200385
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 0fa2bb15e74ff81762d767f234078e0fe0106c55
    • S
      Move old files to warm tier in FIFO compactions (#8310) · e7c24168
      sdong authored
      Summary:
      Some FIFO users want to keep the data for longer, but the old data is rarely accessed. This feature allows users to configure FIFO compaction so that data older than a threshold is moved to a warm storage tier.
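      A minimal configuration sketch, assuming the new FIFO option is `CompactionOptionsFIFO::age_for_warm` (in seconds); treat the field name as an assumption based on this description rather than a quote from the header:

      ```
      #include "rocksdb/options.h"

      ROCKSDB_NAMESPACE::Options MakeFifoWarmTierOptions() {
        ROCKSDB_NAMESPACE::Options options;
        options.compaction_style = ROCKSDB_NAMESPACE::kCompactionStyleFIFO;
        options.compaction_options_fifo.max_table_files_size = 10ull << 30;  // 10 GB
        // Files whose data is older than one week are moved to the warm tier
        // instead of staying on the default (hot) tier.
        options.compaction_options_fifo.age_for_warm = 7 * 24 * 60 * 60;
        return options;
      }
      ```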
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8310
      
      Test Plan: Add several unit tests.
      
      Reviewed By: ajkr
      
      Differential Revision: D28493792
      
      fbshipit-source-id: c14824ea634814dee5278b449ab5c98b6e0b5501
  7. 08 Aug 2021 · 1 commit
    • A
      Fix db_stress failure (#8632) · 052c24a6
      Akanksha Mahajan authored
      Summary:
      FaultInjectionTestFS injects errors into the Rename operation. Because
      of the injected error, info.log fails to be created when rename returns an error,
      and info_log is set to nullptr, which leads to this assertion failure.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8632
      
      Test Plan: run the db_stress job locally
      
      Reviewed By: ajkr
      
      Differential Revision: D30167387
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 8d08c4c33e8f0cabd368bbb498d21b9de0660067
  8. 07 Aug 2021 · 7 commits
  9. 06 Aug 2021 · 4 commits
    • B
      Correct javadoc for Env#setBackgroundThreads(int) (#8576) · 8ca08178
      Brendan MacDonell authored
      Summary:
      By default, the low priority pool is not the flush pool, so calling `Env#setBackgroundThreads` without providing a priority will not do what the caller expected.
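      For reference, a small sketch against the C++ `Env` API (which the Java binding wraps): the default priority is the LOW (compaction) pool, so the HIGH (flush) pool has to be requested explicitly.

      ```
      #include "rocksdb/env.h"

      void ConfigureThreadPools(ROCKSDB_NAMESPACE::Env* env) {
        using ROCKSDB_NAMESPACE::Env;
        env->SetBackgroundThreads(8);                       // LOW pool (compactions)
        env->SetBackgroundThreads(2, Env::Priority::HIGH);  // HIGH pool (flushes)
      }
      ```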
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8576
      
      Reviewed By: ajkr
      
      Differential Revision: D29925154
      
      Pulled By: mrambacher
      
      fbshipit-source-id: cd7211fc374e7d9929a9b88ea0a5ba8134b76099
    • M
      Make MergeOperator+CompactionFilter/Factory into Customizable Classes (#8481) · d057e832
      mrambacher authored
      Summary:
      - Changed MergeOperator, CompactionFilter, and CompactionFilterFactory into Customizable classes.
       - Added Options/Configurable/Object Registration for TTL and Cassandra variants
       - Changed the StringAppend MergeOperators to accept a string delimiter rather than a simple char.  Made the delimiter into a configurable option
       - Added tests for new functionality
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8481
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30136050
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 271d1772835935b6773abaf018ee71e42f9491af
    • A
      Dynamically configure BlockBasedTableOptions.prepopulate_block_cache (#8620) · fd207993
      Akanksha Mahajan authored
      Summary:
      Dynamically configure BlockBasedTableOptions.prepopulate_block_cache using DB::SetOptions.
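      A hedged guess at what the call looks like (the nested option-string syntax for `block_based_table_factory` is an assumption, not copied from the new unit test):

      ```
      #include "rocksdb/db.h"

      ROCKSDB_NAMESPACE::Status EnablePrepopulateBlockCache(
          ROCKSDB_NAMESPACE::DB* db) {
        return db->SetOptions(
            {{"block_based_table_factory",
              "{prepopulate_block_cache=kFlushOnly;}"}});
      }
      ```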
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8620
      
      Test Plan: Added new unit test
      
      Reviewed By: anand1976
      
      Differential Revision: D30091319
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: fb586d1848a8dd525bba7b2f9eeac34f2fc6d82c
    • L
      Attempt to deflake ObsoleteFilesTest.DeleteObsoleteOptionsFile (#8624) · 9b25d26d
      Levi Tamasi authored
      Summary:
      We've been seeing occasional crashes on CI while inserting into the
      vectors in `ObsoleteFilesTest.DeleteObsoleteOptionsFile`. The crashes
      don't reproduce locally (could be either a race or an object lifecycle
      issue) but the good news is that the vectors in question are not really
      used for anything meaningful by the test. (The assertion about the sizes
      of the two vectors being equal is guaranteed to hold, since the two sync
      points where they are populated are right after each other.) The patch
      simply removes the vectors from the test, alongside the associated
      callbacks and sync points.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8624
      
      Test Plan: `make check`
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D30118485
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 0a4c3d06584e84cd2b1dcc212d274fa1b89cb647
  10. 05 Aug 2021 · 5 commits
    • Y
      Update HISTORY for PR8585 (#8623) · b01a428d
      Yanqin Jin authored
      Summary:
      Update HISTORY.md for PR https://github.com/facebook/rocksdb/issues/8585 .
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8623
      
      Reviewed By: ltamasi
      
      Differential Revision: D30121910
      
      Pulled By: riversand963
      
      fbshipit-source-id: 525af43fad908a498f22ed4f934ec5cbf60e6d25
    • A
      Do not attempt to rename non-existent info log (#8622) · a685a701
      Andrew Kryczka authored
      Summary:
      Previously we attempted to rename "LOG" to "LOG.old.*" without checking
      its existence first. "LOG" had no reason to exist in a new DB.
      
      Errors in renaming a non-existent "LOG" were swallowed via
      `PermitUncheckedError()` so things worked. However the storage service's
      error monitoring was detecting all these benign rename failures. So it
      is better to fix it. Also, with this PR we can now distinguish rename failures
      that occur for other reasons and return them.
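      A minimal sketch of the idea (not RocksDB's actual logger-creation code; names are illustrative):

      ```
      #include <string>

      #include "rocksdb/env.h"

      using ROCKSDB_NAMESPACE::Env;
      using ROCKSDB_NAMESPACE::Status;

      Status RollInfoLogIfPresent(Env* env, const std::string& old_path,
                                  const std::string& new_path) {
        if (!env->FileExists(old_path).ok()) {
          return Status::OK();  // new DB: nothing to rename, nothing to report
        }
        // A failure here is a genuine rename failure and can be returned to the
        // caller instead of being swallowed.
        return env->RenameFile(old_path, new_path);
      }
      ```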
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8622
      
      Test Plan: new unit test
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D30115189
      
      Pulled By: ajkr
      
      fbshipit-source-id: e2f337ffb2bd171be0203172abc8e16e7809b170
    • A
      Fix clang failure (#8621) · a074d46a
      Akanksha Mahajan authored
      Summary:
      Fixed a clang failure caused by a memory leak.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8621
      
      Test Plan: CircleCI clang job
      
      Reviewed By: pdillinger
      
      Differential Revision: D30114337
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 16572b9bcbaa053c2ab7bc1c344148d0e6f8039c
    • A
      Remove corruption error injection in FaultInjectionTestFS (#8616) · c268859a
      anand76 authored
      Summary:
      ```FaultInjectionTestFS``` injects various types of read errors into ```FileSystem``` APIs. One type of error is corruption errors, where data is intentionally corrupted or truncated. There is corresponding validation in db_stress to verify that an injected error results in a user-visible Get/MultiGet error. However, for corruption errors, it's hard to know when a corruption is supposed to be detected by the user request, due to prefetching and, in the case of direct IO, padding. This results in false positives. So remove that functionality.
      
      Block checksum validation for Get/MultiGet is confined to ```BlockFetcher```, so we don't lose a lot by disabling this, since it's a small surface area to test.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8616
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30074422
      
      Pulled By: anand1976
      
      fbshipit-source-id: 6a61fac18f95514c15364b75013799ddf83294df
    • H
      Improve rate limiter implementation's readability (#8596) · dbe3810c
      hx235 authored
      Summary:
      Context:
      As the need arises for new resource-management features built on RocksDB's rate limiter, such as https://github.com/facebook/rocksdb/pull/8595, it is about time to re-learn our rate limiter and make this learning process easier for others by improving its readability. The comments/assertion/one extra else-branch were added based on my best understanding of the current rate_limiter.cc and rate_limiter_test.cc after giving them a hard read.
      - Add code comments/assertion/one extra else-branch (that is not affecting existing behavior, see PR comment) to describe how leader-election works under multi-thread settings in GenericRateLimiter::Request()
      - Add code comments to describe a non-obvious trick during clean-up of rate limiter destructor
      - Add code comments to explain more about the starvation being fixed in GenericRateLimiter::Refill() through partial byte-granting
      - Add code comments to the rate limiter's setup in a complicated unit test in rate_limiter_test
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8596
      
      Test Plan: passed existing rate_limiter_test.cc
      
      Reviewed By: ajkr
      
      Differential Revision: D29982590
      
      Pulled By: hx235
      
      fbshipit-source-id: c3592986bb5b0c90d8229fe44f425251ec7e8a0a
  11. 04 Aug 2021 · 3 commits
  12. 03 Aug 2021 · 5 commits
    • B
      Fix db stress crash mempurge (#8604) · b2781522
      Baptiste Lemaire authored
      Summary:
      The db_stress crash was caused by a call to `IsFlushPending()` made by a stats function, which triggered an `assert(false)` that I had not planned for when I created the `trigger_flush` bool. It turns out that this bool variable is not useful: I created it because I thought the `imm_flush_needed` atomic bool would actually trigger a flush.
      It turns out that this bool is only checked in `IsFlushPending` - this is its only use - and a flush is triggered by either a background thread checking on the imm array, or by an explicit call to `SchedulePendingFlush` which creates a flush request, that is then added to a flush request queue.
      In this PR, I reverted the MemtableList::Add function to what it was before my changes.
      I tested the fix by running the exact command line that deterministically triggered the assert error (see below), which confirmed that this is where the error was coming from.
      I also ran `db_crashtest.py whitebox` and `blackbox` for a couple of hours locally before committing this PR.
      Experiment run:
      
      ```./db_stress --acquire_snapshot_one_in=0 --allow_concurrent_memtable_write=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=76.90653425292307 --bottommost_compression_type=disable --cache_index_and_filter_blocks=1 --cache_size=1048576 --checkpoint_one_in=1000000 --checksum_type=kCRC32c --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=0 --compaction_ttl=2 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_blackbox --db_write_buffer_size=0 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --enable_compaction_filter=1 --enable_pipelined_write=0 --expected_values_path=/dev/shm/rocksdb/rocksdb_crashtest_expected --experimental_allow_mempurge=1 --experimental_mempurge_policy=kAlternate --fail_if_options_file_error=1 --file_checksum_impl=none --flush_one_in=1000000 --format_version=2 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=14 --index_type=0 --iterpercent=0 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=False --long_running_snapshots=1 --mark_for_compaction_one_file_in=10 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=100000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtablerep=skip_list --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --open_files=-1 --open_metadata_write_fault_one_in=8 --open_read_fault_one_in=32 --open_write_fault_one_in=16 --ops_per_thread=100000000 --optimize_filters_for_memory=1 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=0 --pause_background_one_in=1000000 --periodic_compaction_seconds=1000 --prefix_size=-1 --prefixpercent=0 --progress_reports=0 --read_fault_one_in=0 --readpercent=60 --recycle_log_file_num=1 --reopen=20 --set_options_one_in=0 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=0 --subcompactions=3 --sync=1 --sync_fault_injection=False --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=1 --unpartitioned_pinning=3 --use_clock_cache=0 --use_direct_io_for_flush_and_compaction=1 --use_direct_reads=0 --use_full_merge_v1=1 --use_merge=0 --use_multiget=0 --use_ribbon_filter=1 --user_timestamp_size=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=35```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8604
      
      Reviewed By: pdillinger
      
      Differential Revision: D30047295
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: b9e379bfa3d6b9bd2b275725fb0bca4bd81a3dbe
    • M
      Revert checkpoint fix (#8607) · 4811115b
      Merlin Mao authored
      Summary:
      PR https://github.com/facebook/rocksdb/pull/8572 loses custom types in the options file. More API changes are needed to fix this issue, so revert this PR.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8607
      
      Reviewed By: ajkr
      
      Differential Revision: D30058289
      
      Pulled By: autopear
      
      fbshipit-source-id: 78f5a154c0bf193e8441bae4a36fa79b95277fd4
    • L
      Fix a race in ColumnFamilyData::UnrefAndTryDelete (#8605) · 3f7e9298
      Levi Tamasi authored
      Summary:
      The `ColumnFamilyData::UnrefAndTryDelete` code currently on the trunk
      unlocks the DB mutex before destroying the `ThreadLocalPtr` holding
      the per-thread `SuperVersion` pointers when the only remaining reference
      is the back reference from `super_version_`. The idea behind this was to
      break the circular dependency between `ColumnFamilyData` and `SuperVersion`:
      when the penultimate reference goes away, `ColumnFamilyData` can clean up
      the `SuperVersion`, which can in turn clean up `ColumnFamilyData`. (Assuming there
      is a `SuperVersion` and it is not referenced by anything else.) However,
      unlocking the mutex throws a wrench in this plan by making it possible for another thread
      to jump in and take another reference to the `ColumnFamilyData`, keeping the
      object alive in a zombie `ThreadLocalPtr`-less state. This can cause issues like
      https://github.com/facebook/rocksdb/issues/8440 ,
      https://github.com/facebook/rocksdb/issues/8382 ,
      and might also explain the `was_last_ref` assertion failures from the `ColumnFamilySet`
      destructor we sometimes observe during close in our stress tests.
      
      Digging through the archives, this unlocking goes way back to 2014 (or earlier). The original
      rationale was that `SuperVersionUnrefHandle` used to lock the mutex so it can call
      `SuperVersion::Cleanup`; however, this logic turned out to be deadlock-prone.
      https://github.com/facebook/rocksdb/pull/3510 fixed the deadlock but left the
      unlocking in place. https://github.com/facebook/rocksdb/pull/6147 then introduced
      the circular dependency and associated cleanup logic described above (in order
      to enable iterators to keep the `ColumnFamilyData` for dropped column families alive),
      and moved the unlocking-relocking snippet to its present location in `UnrefAndTryDelete`.
      Finally, https://github.com/facebook/rocksdb/pull/7749 fixed a memory leak but
      apparently exacerbated the race by (otherwise correctly) switching to `UnrefAndTryDelete`
      in `SuperVersion::Cleanup`.
      
      The patch simply eliminates the unlocking and relocking, which has been unnecessary
      ever since https://github.com/facebook/rocksdb/issues/3510 made `SuperVersionUnrefHandle` lock-free.
      This closes the window during which another thread could increase the reference count,
      and hopefully fixes the issues above.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8605
      
      Test Plan: Ran `make check` and stress tests locally.
      
      Reviewed By: pdillinger
      
      Differential Revision: D30051035
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 8fe559e4b4ad69fc142579f8bc393ef525918528
    • Y
      Fix an issue with initializing blob header buffer (#8537) · 8e91bd90
      yangzaorang authored
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8537
      
      Reviewed By: ajkr
      
      Differential Revision: D29838132
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: e3e78d5f85f240a1800ace417a8b634f74488e41
    • M
      Allow using a string as a delimiter in StringAppendOperator (#8536) · 8f52972c
      Mikhail Golubev authored
      Summary:
      An arbitrary string can now be used as the delimiter in the StringAppend merge
      operator. In particular, an empty string is allowed, which combines the binary
      values for the same key byte-to-byte, one next to another.
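      A hedged usage sketch, assuming `MergeOperators::CreateStringAppendOperator` gained a `std::string` overload and lives in the (internal) `utilities/merge_operators.h` header; both details are assumptions based on this description:

      ```
      #include "rocksdb/options.h"
      #include "utilities/merge_operators.h"

      ROCKSDB_NAMESPACE::Options MakeConcatenatingOptions() {
        ROCKSDB_NAMESPACE::Options options;
        // An empty delimiter concatenates the merged values for a key
        // byte-to-byte, one next to another.
        options.merge_operator =
            ROCKSDB_NAMESPACE::MergeOperators::CreateStringAppendOperator("");
        return options;
      }
      ```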
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8536
      
      Reviewed By: mrambacher
      
      Differential Revision: D29962120
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 4ef5d846a47835cf428a11200409e30e2dbffc4f