1. 09 Nov 2021 (1 commit)
  2. 19 Oct 2021 (1 commit)
    • Experimental support for SST unique IDs (#8990) · ad5325a7
      Committed by Peter Dillinger
      Summary:
      * New public header unique_id.h and function GetUniqueIdFromTableProperties
      which computes a universally unique identifier based on table properties
      of table files from recent RocksDB versions.
      * Generation of DB session IDs is refactored so that they are
      guaranteed unique in the lifetime of a process running RocksDB.
      (SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
      this enables SST unique IDs to be guaranteed unique among SSTs generated
      in a single process, and "better than random" between processes.
      See https://github.com/pdillinger/unique_id
      * In addition to public API producing 'external' unique IDs, there is a function
      for producing 'internal' unique IDs, with functions for converting between the
      two. In short, the external ID is "safe" for things people might do with it, and
      the internal ID enables more "power user" features for the future. Specifically,
      the external ID goes through a hashing layer so that any subset of bits in the
      external ID can be used as a hash of the full ID, while also preserving
      uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
      and on full 192 bits).
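      A hedged usage sketch of the new API (only the header name and function name come from this change; the exact signature and the use of DB::GetPropertiesOfAllTables() to enumerate per-file properties are assumptions):
      
      ```
      #include <iostream>
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/unique_id.h"
      
      // Print the external unique ID of every table file in an open DB.
      void PrintSstUniqueIds(rocksdb::DB* db) {
        rocksdb::TablePropertiesCollection props;  // file name -> properties
        rocksdb::Status s = db->GetPropertiesOfAllTables(&props);
        if (!s.ok()) return;
        for (const auto& entry : props) {
          std::string id;
          s = rocksdb::GetUniqueIdFromTableProperties(*entry.second, &id);
          if (s.ok()) {
            // The ID is a binary string; hex-print it for readability.
            std::cout << entry.first << " -> "
                      << rocksdb::Slice(id).ToString(/*hex=*/true) << std::endl;
          }
        }
      }
      ```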
      
      Intended follow-up:
      * Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
      the third 64-bit value of the unique ID.)
      * Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990
      
      Test Plan:
      Unit tests added, and checking of unique ids in stress test.
      NOTE in stress test we do not generate nearly enough files to thoroughly
      stress uniqueness, but the test trims off pieces of the ID to check for
      uniqueness so that we can infer (with some assumptions) stronger
      properties in the aggregate.
      
      Reviewed By: zhichao-cao, mrambacher
      
      Differential Revision: D31582865
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
  3. 08 Oct 2021 (1 commit)
    • Introduce a mechanism to dump out blocks from block cache and re-insert to secondary cache (#8912) · 699f4504
      Committed by Zhichao Cao
      Summary:
      Background: warming up the cache causes read performance degradation because blocks must be read from storage back into the block cache. Since in production the workload and access pattern of a given DB are stable, a potential solution is to dump the blocks belonging to that DB to persistent storage (e.g., to a file) and bulk-load them into the secondary cache before the DB is relaunched. For example, migrating a DB from host A to host B takes only a short period of time, during which the access pattern to blocks in the block cache does not change much. It is therefore efficient to dump out the DB's blocks, migrate them to the destination host, and insert them into the secondary cache before we relaunch the DB.
      
      Design: we introduce the CacheDumpWriter and CacheDumpReader interfaces so that users can store the blocks dumped out of the block cache. RocksDB encodes all the information and sends the resulting string to the writer. Users can implement their own writer if they want. CacheDumper and CacheDumpedLoader are introduced to save the blocks and load the blocks, respectively.
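      A conceptual sketch of what a user-supplied writer/reader pair might look like. The class and method names below are hypothetical (the actual RocksDB interfaces are not quoted in this summary); the point is only that the writer persists the opaque encoded strings RocksDB hands it, and the reader returns them in order so the blocks can be re-inserted into the secondary cache at relaunch:
      
      ```
      #include <cstdint>
      #include <fstream>
      #include <string>
      
      // Hypothetical file-backed writer: length-prefix each encoded block string.
      class SimpleFileDumpWriter {
       public:
        explicit SimpleFileDumpWriter(const std::string& path)
            : out_(path, std::ios::binary) {}
        void Append(const std::string& encoded) {
          uint64_t len = encoded.size();
          out_.write(reinterpret_cast<const char*>(&len), sizeof(len));
          out_.write(encoded.data(), static_cast<std::streamsize>(len));
        }
       private:
        std::ofstream out_;
      };
      
      // Hypothetical reader: yields the encoded strings back in dump order.
      class SimpleFileDumpReader {
       public:
        explicit SimpleFileDumpReader(const std::string& path)
            : in_(path, std::ios::binary) {}
        bool Next(std::string* encoded) {  // returns false at end of file
          uint64_t len = 0;
          if (!in_.read(reinterpret_cast<char*>(&len), sizeof(len))) return false;
          encoded->resize(len);
          return static_cast<bool>(
              in_.read(&(*encoded)[0], static_cast<std::streamsize>(len)));
        }
       private:
        std::ifstream in_;
      };
      ```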
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8912
      
      Test Plan: add new tests to lru_cache_test and pass make check.
      
      Reviewed By: pdillinger
      
      Differential Revision: D31452871
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 11ab4f5d03e383f476947116361d54188d36ec48
  4. 29 Sep 2021 (1 commit)
    • Refactor expected state in stress/crash test (#8913) · 559943cd
      Committed by Andrew Kryczka
      Summary:
      This is a precursor refactoring to enable an upcoming feature: persistence failure correctness testing.
      
      - Changed `--expected_values_path` to `--expected_values_dir` and migrated "db_crashtest.py" to use the new flag. For persistence failure correctness testing there are multiple possible correct states since unsynced data is allowed to be dropped. Making it possible to restore all these possible correct states will eventually involve files containing snapshots of expected values and DB trace files.
      - The expected values directory is managed by an `ExpectedStateManager` instance. Managing expected state files is separated out of `SharedState` to prevent `SharedState` from becoming too complex when the new files and features (snapshotting, tracing, and restoring) are introduced.
      - Migrated expected values file access/management out of `SharedState` into a separate class called `ExpectedState`. This is not exposed directly to the test but rather the `ExpectedState` for the latest values file is accessed via a pass-through API on `ExpectedStateManager`. This forces the test to always access the single latest `ExpectedState`.
      - Changed the initialization of the latest expected values file to use a tempfile followed by rename, and also added cleanup logic for possible stranded tempfiles (a minimal sketch of this pattern is shown below).
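      A minimal sketch of the tempfile-then-rename pattern mentioned in the last item, using plain C file APIs rather than the stress test's actual helpers:
      
      ```
      #include <cstdio>
      #include <string>
      
      bool WriteFileAtomically(const std::string& final_path,
                               const std::string& contents) {
        const std::string tmp_path = final_path + ".tmp";
        FILE* f = std::fopen(tmp_path.c_str(), "wb");
        if (f == nullptr) return false;
        bool ok =
            std::fwrite(contents.data(), 1, contents.size(), f) == contents.size();
        ok = (std::fclose(f) == 0) && ok;
        // rename() is atomic on POSIX filesystems, so readers never observe a
        // partially written latest-values file; a stranded "*.tmp" file can be
        // cleaned up on the next initialization.
        return ok && std::rename(tmp_path.c_str(), final_path.c_str()) == 0;
      }
      ```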
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8913
      
      Test Plan:
      run in several ways; try to make sure it's not obviously broken.
      
      - crashtest blackbox without TEST_TMPDIR
      ```
      $ python3 tools/db_crashtest.py blackbox --simple --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --duration=120 --interval=10 --compression_type=none --blob_compression_type=none
      ```
      - crashtest blackbox with TEST_TMPDIR
      ```
      $ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --simple --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --duration=120 --interval=10 --compression_type=none --blob_compression_type=none
      ```
      - crashtest whitebox with TEST_TMPDIR
      ```
      $ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py whitebox --simple --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --duration=120 --interval=10 --compression_type=none --blob_compression_type=none --random_kill_odd=88887
      ```
      - db_stress without expected_values_dir
      ```
      $ ./db_stress --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --ops_per_thread=10000 --clear_column_family_one_in=0 --destroy_db_initially=true
      ```
      - db_stress with expected_values_dir and manual corruption
      ```
      $ ./db_stress --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --ops_per_thread=10000 --clear_column_family_one_in=0 --destroy_db_initially=true --expected_values_dir=./
      // modify one byte in "./LATEST.state"
      $ ./db_stress --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --ops_per_thread=10000 --clear_column_family_one_in=0 --destroy_db_initially=false --expected_values_dir=./
      ...
      Verification failed for column family 0 key 0000000000000000 (0): Value not found: NotFound:
      ...
      ```
      
      Reviewed By: riversand963
      
      Differential Revision: D30921951
      
      Pulled By: ajkr
      
      fbshipit-source-id: babfe218062e55d018c9b046536c0289fb78f41c
  5. 28 Sep 2021 (1 commit)
  6. 08 Sep 2021 (1 commit)
    • Improve support for using regexes (#8740) · 0ef88538
      Committed by Peter Dillinger
      Summary:
      * Consolidate use of std::regex for testing to testharness.cc, to
      minimize Facebook linters constantly flagging uses in non-production
      code.
      * Improve syntax and error messages for asserting some string matches a
      regex in tests.
      * Add a public Regex wrapper class to encapsulate existing usage in
      ObjectRegistry.
      * Remove unnecessary include <regex>
      * Add warnings that use of Regex in production code could cause bad
      performance or stack overflow.
      
      Intended follow-up work:
      * Replace std::regex with another underlying implementation like RE2
      * Improve ObjectRegistry interface in terms of possibly confusing literal
      string matching vs. regex and in terms of reporting invalid regex.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8740
      
      Test Plan:
      tests updated, basic unit test for public Regex, and some manual
      testing of temporary changes to see example error messages:
      
      utilities/backupable/backupable_db_test.cc:917: Failure
      000010_1162373755_138626.blob (child.name)
      does not match regex
      [0-9]+_[0-9]+_[0-9]+[.]blobHAHAHA (pattern)
      
      db/db_basic_test.cc:74: Failure
      R3SHSBA8C4U0CIMV2ZB0 (sid3)
      does not match regex [0-9A-Z]{20}HAHAHA
      
      Reviewed By: mrambacher
      
      Differential Revision: D30706246
      
      Pulled By: pdillinger
      
      fbshipit-source-id: ba845e8f563ccad39bdb58f44f04e9da8f78c3fd
  7. 31 Aug 2021 (1 commit)
    • Built-in support for generating unique IDs, bug fix (#8708) · 13ded694
      Committed by Peter Dillinger
      Summary:
      Env::GenerateUniqueId() works fine on Windows and on POSIX
      where /proc/sys/kernel/random/uuid exists. Our other implementation is
      flawed and easily produces collisions in a new multi-threaded test.
      As we rely more heavily on DB session ID uniqueness, this becomes a
      serious issue.
      
      This change combines several individually suitable entropy sources
      for reliable generation of random unique IDs, with goal of uniqueness
      and portability, not cryptographic strength nor maximum speed.
      
      Specifically:
      * Moves code for getting UUIDs from the OS to port::GenerateRfcUuid
      rather than in Env implementation details. Callers are now told whether
      the operation fails or succeeds.
      * Adds an internal API GenerateRawUniqueId for generating high-quality
      128-bit unique identifiers, by combining entropy from three "tracks":
        * Lots of info from default Env like time, process id, and hostname.
        * std::random_device
        * port::GenerateRfcUuid (when working)
      * Built-in implementations of Env::GenerateUniqueId() will now always
      produce an RFC 4122 UUID string, either from platform-specific API or
      by converting the output of GenerateRawUniqueId.
      
      DB session IDs now use GenerateRawUniqueId while DB IDs (not as
      critical) try to use port::GenerateRfcUuid but fall back on
      GenerateRawUniqueId with conversion to an RFC 4122 UUID.
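      A conceptual sketch of the "multiple entropy tracks" idea (not the actual GenerateRawUniqueId code): hash each track with a different seed and XOR the results, so the output is no weaker than the strongest single track.
      
      ```
      #include <chrono>
      #include <cstdint>
      #include <functional>
      #include <random>
      #include <utility>
      
      std::pair<uint64_t, uint64_t> RawUniqueIdSketch() {
        uint64_t hi = 0, lo = 0;
        auto mix = [&](uint64_t v, uint64_t seed) {
          std::hash<uint64_t> h;
          hi ^= h(v ^ seed);
          lo ^= h(v + seed * 0x9e3779b97f4a7c15ULL);
        };
        // Track 1: process/environment info (wall-clock time as a stand-in here).
        mix(static_cast<uint64_t>(
                std::chrono::system_clock::now().time_since_epoch().count()),
            0x1111111111111111ULL);
        // Track 2: std::random_device.
        std::random_device rd;
        mix((static_cast<uint64_t>(rd()) << 32) | rd(), 0x2222222222222222ULL);
        // Track 3: an OS-provided RFC 4122 UUID would be mixed in here when
        // port::GenerateRfcUuid succeeds.
        return {hi, lo};
      }
      ```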
      
      GenerateRawUniqueId is declared and defined under env/ rather than util/
      or even port/ because of the Env dependency.
      
      Likely follow-up: enhance GenerateRawUniqueId to be faster after the
      first call and to guarantee uniqueness within the lifetime of a single
      process (imparting the same property onto DB session IDs).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8708
      
      Test Plan:
      A new mini-stress test in env_test checks the various public
      and internal APIs for uniqueness, including each track of
      GenerateRawUniqueId individually. We can't hope to verify anywhere close
      to 128 bits of entropy, but it can at least detect flaws as bad as the
      old code. Serial execution of the new tests takes about 350 ms on
      my machine.
      
      Reviewed By: zhichao-cao, mrambacher
      
      Differential Revision: D30563780
      
      Pulled By: pdillinger
      
      fbshipit-source-id: de4c9ff4b2f581cf784fcedb5f39f16e5185c364
  8. 25 Aug 2021 (1 commit)
    • Refactor WriteBufferManager::CacheRep into CacheReservationManager (#8506) · 74cfe7db
      Committed by Hui Xiao
      Summary:
      Context:
      To help cap various memory usage by a single limit of the block cache capacity, we charge the memory usage through inserting/releasing dummy entries in the block cache. CacheReservationManager is such a class (non-thread-safe), responsible for inserting/removing dummy entries to reserve cache space for memory used by the class user.
      
      - Refactored the inner private class CacheRep of WriteBufferManager into public CacheReservationManager class for reusability such as for https://github.com/facebook/rocksdb/pull/8428
      
      - Encapsulated implementation details of cache key generation and dummy entries insertion/release in cache reservation as discussed in https://github.com/facebook/rocksdb/pull/8506#discussion_r666550838
      
      - Consolidated increase/decrease cache reservation into one API - UpdateCacheReservation.
      
      - Adjusted the previous dummy-entry release algorithm for decreasing cache reservation so that it loop-releases dummy entries, staying symmetric with the dummy-entry insertion algorithm
      
      - Made the previous dummy-entry release algorithm in delayed decrease mode more aggressive, to better decrease cache reservation when the memory used is less likely to increase back.
      
        Previously, the algorithm only released one dummy entry when new_mem_used < 3/4 * cache_allocated_size_ and cache_allocated_size_ - kSizeDummyEntry > new_mem_used.
      Now, the algorithm loop-releases as many dummy entries as possible when new_mem_used < 3/4 * cache_allocated_size_ (a conceptual sketch of this loop appears after this list).
      
      - Updated WriteBufferManager's test cases to adapt to changes on the release algorithm mentioned above and left comment for some test cases for clarity
      
      - Replaced the previous cache key prefix generation (utilizing object address related to the cache client) with one that utilizes Cache->NewID() to prevent cache-key collision among dummy entry clients sharing the same cache.
      
        The specific collision we are preventing happens when the object address is reused for a new cache-key prefix while an old cache key using that same object address in its prefix still exists in the cache. This could happen because, under the LRU cache policy, there can be a delay in releasing a cache entry after the cache client object owning that entry gets deallocated. In that case, the object address of the old cache client object can be reused by another client object to generate a new cache-key prefix.
      
        This prefix generation can be made obsolete after Peter's unification of all the code generating cache key, mentioned in https://github.com/facebook/rocksdb/pull/8506#discussion_r667265255
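      As referenced above, a conceptual sketch of the loop-release behavior (not the actual CacheReservationManager code): keep releasing dummy entries while memory used stays below 3/4 of the space currently reserved in the cache.
      
      ```
      #include <cstddef>
      #include <functional>
      
      constexpr std::size_t kSizeDummyEntry = 256 * 1024;  // illustrative size
      
      void UpdateReservationSketch(std::size_t new_mem_used,
                                   std::size_t& cache_allocated_size,
                                   const std::function<void()>& release_dummy_entry) {
        // Loop-release as many dummy entries as possible while
        // new_mem_used < 3/4 * cache_allocated_size.
        while (cache_allocated_size >= kSizeDummyEntry &&
               new_mem_used < cache_allocated_size / 4 * 3) {
          release_dummy_entry();
          cache_allocated_size -= kSizeDummyEntry;
        }
      }
      ```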
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8506
      
      Test Plan:
      - Passing the added unit tests cache_reservation_manager_test.cc
      - Passing existing and adjusted write_buffer_manager_test.cc
      
      Reviewed By: ajkr
      
      Differential Revision: D29644135
      
      Pulled By: hx235
      
      fbshipit-source-id: 0fc93fbfe4a40bb41be85c314f8f2bafa8b741f7
  9. 19 Aug 2021 (1 commit)
    • Allow Replayer to report the results of TraceRecords. (#8657) · d10801e9
      Committed by Merlin Mao
      Summary:
      `Replayer::Execute()` can directly return the result (e.g., request latency, DB::Get() return code, returned value, etc.).
      `Replayer::Replay()` reports the results via a callback function.
      
      New interface:
      `TraceRecordResult` in "rocksdb/trace_record_result.h".
      
      `DBTest2.TraceAndReplay` and `DBTest2.TraceAndManualReplay` are updated accordingly.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8657
      
      Reviewed By: ajkr
      
      Differential Revision: D30290216
      
      Pulled By: autopear
      
      fbshipit-source-id: 3c8d4e6b180ec743de1a9d9dcaee86064c74f0d6
  10. 12 Aug 2021 (1 commit)
    • Make TraceRecord and Replayer public (#8611) · f58d2767
      Committed by Merlin Mao
      Summary:
      New public interfaces:
      `TraceRecord` and `TraceRecord::Handler`, available in "rocksdb/trace_record.h".
      `Replayer`, available in `rocksdb/utilities/replayer.h`.
      
      User can use `DB::NewDefaultReplayer()` to create a Replayer to auto/manual replay a trace file.
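      A hedged sketch of driving a replay through the new public interface (the Prepare/Replay method names and ReplayOptions default construction are assumptions; only DB::NewDefaultReplayer() and the headers are named in this summary):
      
      ```
      #include <memory>
      #include <vector>
      #include "rocksdb/db.h"
      #include "rocksdb/trace_reader_writer.h"
      #include "rocksdb/utilities/replayer.h"
      
      rocksdb::Status ReplayTrace(rocksdb::DB* db,
                                  std::vector<rocksdb::ColumnFamilyHandle*> handles,
                                  std::unique_ptr<rocksdb::TraceReader> reader) {
        std::unique_ptr<rocksdb::Replayer> replayer;
        rocksdb::Status s =
            db->NewDefaultReplayer(handles, std::move(reader), &replayer);
        if (!s.ok()) return s;
        s = replayer->Prepare();
        if (!s.ok()) return s;
        // Automatic replay; a result callback (see the previous commit) could be
        // passed here to inspect per-record results.
        return replayer->Replay(rocksdb::ReplayOptions(),
                                /*result_callback=*/nullptr);
      }
      ```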
      
      Unit tests:
      - `./db_test2 --gtest_filter="DBTest2.TraceAndReplay"`: Updated with the internal API changes.
      - `./db_test2 --gtest_filter="DBTest2.TraceAndManualReplay"`: New for manual replay.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8611
      
      Reviewed By: ajkr
      
      Differential Revision: D30266329
      
      Pulled By: autopear
      
      fbshipit-source-id: 1ecb3cbbedae0f6a67c18f0cc82e002b4d81b6f8
  11. 06 Aug 2021 (1 commit)
    • Make MergeOperator+CompactionFilter/Factory into Customizable Classes (#8481) · d057e832
      Committed by mrambacher
      Summary:
      - Changed MergeOperator, CompactionFilter, and CompactionFilterFactory into Customizable classes.
       - Added Options/Configurable/Object Registration for TTL and Cassandra variants
       - Changed the StringAppend MergeOperators to accept a string delimiter rather than a simple char.  Made the delimiter into a configurable option
       - Added tests for new functionality
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8481
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30136050
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 271d1772835935b6773abaf018ee71e42f9491af
  12. 09 Jul 2021 (1 commit)
    • Add micro-benchmark support (#8493) · 5dd18a8d
      Committed by Jay Zhuang
      Summary:
      Add Google Benchmark support for microbenchmarks.
      Add ribbon_bench to benchmark the Ribbon filter vs. other filters.
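      A minimal Google Benchmark microbenchmark in the style this enables (illustrative only; ribbon_bench itself benchmarks filter implementations):
      
      ```
      #include <string>
      #include <benchmark/benchmark.h>
      
      static void BM_StringAppend(benchmark::State& state) {
        for (auto _ : state) {
          std::string s;
          for (int64_t i = 0; i < state.range(0); ++i) {
            s += 'x';
          }
          benchmark::DoNotOptimize(s);
        }
      }
      BENCHMARK(BM_StringAppend)->Arg(1 << 10)->Arg(1 << 16);
      
      BENCHMARK_MAIN();
      ```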
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8493
      
      Test Plan:
      added test to CI
      To run the benchmark on devhost:
      Install benchmark: `$ sudo dnf install google-benchmark-devel`
      Build and run:
      `$ ROCKSDB_NO_FBCODE=1 DEBUG_LEVEL=0 make microbench`
      or with cmake:
      `$ mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_BENCHMARK=1 && make microbench`
      
      Reviewed By: pdillinger
      
      Differential Revision: D29589649
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 8fed13b562bef4472f161ecacec1ab6b18911dff
  13. 24 Jun 2021 (1 commit)
  14. 22 Jun 2021 (1 commit)
    • Add a class for measuring the amount of garbage generated during compaction (#8426) · 065bea15
      Committed by Levi Tamasi
      Summary:
      This is part of an alternative approach to https://github.com/facebook/rocksdb/issues/8316.
      Unlike that approach, this one relies on key-values getting processed one by one
      during compaction, and does not involve persistence.
      
      Specifically, the patch adds a class `BlobGarbageMeter` that can track the number
      and total size of blobs in a (sub)compaction's input and output on a per-blob file
      basis. This information can then be used to compute the amount of additional
      garbage generated by the compaction for any given blob file by subtracting the
      "outflow" from the "inflow."
      
      Note: this patch only adds `BlobGarbageMeter` and associated unit tests. I plan to
      hook up this class to the input and output of `CompactionIterator` in a subsequent PR.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8426
      
      Test Plan: `make check`
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D29242250
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 597e50ad556540e413a50e804ba15bc044d809bb
  15. 11 Jun 2021 (1 commit)
  16. 10 Jun 2021 (1 commit)
    • Add a clipping internal iterator (#8327) · db325a59
      Committed by Levi Tamasi
      Summary:
      Logically, subcompactions process a key range [start, end); however, the way
      this is currently implemented is that the `CompactionIterator` for any given
      subcompaction keeps processing key-values until it actually outputs a key that
      is out of range, which is then discarded. Instead of doing this, the patch
      introduces a new type of internal iterator called `ClippingIterator` which wraps
      another internal iterator and "clips" its range of key-values so that any KVs
      returned are strictly in the [start, end) interval. This does eliminate a (minor)
      inefficiency by stopping processing in subcompactions exactly at the limit;
      however, the main motivation is related to BlobDB: namely, we need this to be
      able to measure the amount of garbage generated by a subcompaction
      precisely and prevent off-by-one errors.
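      A conceptual sketch of the clipping idea (not the actual ClippingIterator, which wraps RocksDB's InternalIterator): a wrapper that bounds any iterator exposing Seek()/Next()/Valid()/key() to the range [start, end).
      
      ```
      #include <string>
      #include <utility>
      
      template <typename Iter>
      class ClippingIterSketch {
       public:
        ClippingIterSketch(Iter* inner, std::string start, std::string end)
            : inner_(inner), start_(std::move(start)), end_(std::move(end)) {}
        void SeekToFirst() { inner_->Seek(start_); }
        void Seek(const std::string& target) {
          // Never position before start_.
          inner_->Seek(target < start_ ? start_ : target);
        }
        void Next() { inner_->Next(); }
        // Valid only while the wrapped iterator is valid and within [start, end).
        bool Valid() const { return inner_->Valid() && inner_->key() < end_; }
        const std::string& key() const { return inner_->key(); }
       private:
        Iter* inner_;
        std::string start_;
        std::string end_;
      };
      ```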
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8327
      
      Test Plan: `make check`
      
      Reviewed By: siying
      
      Differential Revision: D28761541
      
      Pulled By: ltamasi
      
      fbshipit-source-id: ee0e7229f04edabbc7bed5adb51771fbdc287f69
  17. 20 May 2021 (3 commits)
    • Add remote compaction public API (#8300) · 3786181a
      Committed by Jay Zhuang
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8300
      
      Reviewed By: ajkr
      
      Differential Revision: D28464726
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 49e9f4fb791808a6cbf39a7b1a331373f645fc5e
    • Use deleters to label cache entries and collect stats (#8297) · 311a544c
      Committed by Peter Dillinger
      Summary:
      This change gathers and publishes statistics about the
      kinds of items in block cache. This is especially important for
      profiling relative usage of cache by index vs. filter vs. data blocks.
      It works by iterating over the cache during periodic stats dump
      (InternalStats, stats_dump_period_sec) or on demand when
      DB::Get(Map)Property(kBlockCacheEntryStats), except that for
      efficiency and sharing among column families, saved data from
      the last scan is used when the data is not considered too old.
      
      The new information can be seen in info LOG, for example:
      
          Block cache LRUCache@0x7fca62229330 capacity: 95.37 MB collections: 8 last_copies: 0 last_secs: 0.00178 secs_since: 0
          Block cache entry stats(count,size,portion): DataBlock(7092,28.24 MB,29.6136%) FilterBlock(215,867.90 KB,0.888728%) FilterMetaBlock(2,5.31 KB,0.00544%) IndexBlock(217,180.11 KB,0.184432%) WriteBuffer(1,256.00 KB,0.262144%) Misc(1,0.00 KB,0%)
      
      And also through DB::GetProperty and GetMapProperty (here using
      ldb just for demonstration):
      
          $ ./ldb --db=/dev/shm/dbbench/ get_property rocksdb.block-cache-entry-stats
          rocksdb.block-cache-entry-stats.bytes.data-block: 0
          rocksdb.block-cache-entry-stats.bytes.deprecated-filter-block: 0
          rocksdb.block-cache-entry-stats.bytes.filter-block: 0
          rocksdb.block-cache-entry-stats.bytes.filter-meta-block: 0
          rocksdb.block-cache-entry-stats.bytes.index-block: 178992
          rocksdb.block-cache-entry-stats.bytes.misc: 0
          rocksdb.block-cache-entry-stats.bytes.other-block: 0
          rocksdb.block-cache-entry-stats.bytes.write-buffer: 0
          rocksdb.block-cache-entry-stats.capacity: 8388608
          rocksdb.block-cache-entry-stats.count.data-block: 0
          rocksdb.block-cache-entry-stats.count.deprecated-filter-block: 0
          rocksdb.block-cache-entry-stats.count.filter-block: 0
          rocksdb.block-cache-entry-stats.count.filter-meta-block: 0
          rocksdb.block-cache-entry-stats.count.index-block: 215
          rocksdb.block-cache-entry-stats.count.misc: 1
          rocksdb.block-cache-entry-stats.count.other-block: 0
          rocksdb.block-cache-entry-stats.count.write-buffer: 0
          rocksdb.block-cache-entry-stats.id: LRUCache@0x7f3636661290
          rocksdb.block-cache-entry-stats.percent.data-block: 0.000000
          rocksdb.block-cache-entry-stats.percent.deprecated-filter-block: 0.000000
          rocksdb.block-cache-entry-stats.percent.filter-block: 0.000000
          rocksdb.block-cache-entry-stats.percent.filter-meta-block: 0.000000
          rocksdb.block-cache-entry-stats.percent.index-block: 2.133751
          rocksdb.block-cache-entry-stats.percent.misc: 0.000000
          rocksdb.block-cache-entry-stats.percent.other-block: 0.000000
          rocksdb.block-cache-entry-stats.percent.write-buffer: 0.000000
          rocksdb.block-cache-entry-stats.secs_for_last_collection: 0.000052
          rocksdb.block-cache-entry-stats.secs_since_last_collection: 0
      
      Solution detail - We need some way to flag what kind of blocks each
      entry belongs to, preferably without changing the Cache API.
      One of the complications is that Cache is a general interface that could
      have other users that don't adhere to whichever convention we decide
      on for keys and values. Or we would pay for an extra field in the Handle
      that would only be used for this purpose.
      
      This change uses a back-door approach, the deleter, to indicate the
      "role" of a Cache entry (in addition to the value type, implicitly).
      This has the added benefit of ensuring proper code origin whenever we
      recognize a particular role for a cache entry; if the entry came from
      some other part of the code, it will use an unrecognized deleter, which
      we simply attribute to the "Misc" role.
      
      An internal API makes for simple instantiation and automatic
      registration of Cache deleters for a given value type and "role".
      
      Another internal API, CacheEntryStatsCollector, solves the problem of
      caching the results of a scan and sharing them, to ensure scans are
      neither excessive nor redundant so as not to harm Cache performance.
      
      Because code is added to BlocklikeTraits, it is pulled out of
      block_based_table_reader.cc into its own file.
      
      This is a reformulation of https://github.com/facebook/rocksdb/issues/8276, without the type checking option
      (could still be added), and with actual stat gathering.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8297
      
      Test Plan: manual testing with db_bench, and a couple of basic unit tests
      
      Reviewed By: ltamasi
      
      Differential Revision: D28488721
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 472f524a9691b5afb107934be2d41d84f2b129fb
    • Allow cache_bench/db_bench to use a custom secondary cache (#8312) · 13232e11
      Committed by anand76
      Summary:
      This PR adds a ```-secondary_cache_uri``` option to the cache_bench and db_bench tools to allow the user to specify a custom secondary cache URI. The object registry is used to create an instance of the ```SecondaryCache``` object of the type specified in the URI.
      
      The main cache_bench code is packaged into a separate library, similar to db_bench.
      
      An example invocation of db_bench with a secondary cache URI -
      ```db_bench --env_uri=ws://ws.flash_sandbox.vll1_2/ -db=anand/nvm_cache_2 -use_existing_db=true -benchmarks=readrandom -num=30000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=67108864 -cache_index_and_filter_blocks=true  -secondary_cache_uri='cachelibwrapper://filename=/home/anand76/nvm_cache/cache_file;size=2147483648;regionSize=16777216;admPolicy=random;admProbability=1.0;volatileSize=8388608;bktPower=20;lockPower=12' -partition_index_and_filters=true -duration=1800```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8312
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D28544325
      
      Pulled By: anand1976
      
      fbshipit-source-id: 8f209b9af900c459dc42daa7a610d5f00176eeed
  18. 04 May 2021 (1 commit)
    • Hint temperature of bottommost level files to FileSystem (#8222) · c3ff14e2
      Committed by sdong
      Summary:
      As the first part of the effort of having placing different files on different storage types, this change introduces several things:
      (1) An experimental interface in FileSystem that specify temperature to a new file created.
      (2) A test FileSystemWrapper,  SimulatedHybridFileSystem, that simulates HDD for a file of "warm" temperature.
      (3) A simple experimental feature ColumnFamilyOptions.bottommost_temperature. RocksDB would pass this value to FileSystem when creating any bottommost file.
      (4) A db_bench parameter that applies the (2) and (3) to db_bench.
      
      The motivation of the change is to introduce minimal changes that allow us to evolve tiered storage development.
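      A hedged sketch of enabling item (3) above (assuming the option is exposed on ColumnFamilyOptions/Options as bottommost_temperature with a Temperature enum):
      
      ```
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"
      
      rocksdb::Status OpenWithWarmBottommost(const std::string& path,
                                             rocksdb::DB** db) {
        rocksdb::Options options;
        options.create_if_missing = true;
        // Hint to the FileSystem that bottommost-level SST files can live on
        // slower ("warm") storage.
        options.bottommost_temperature = rocksdb::Temperature::kWarm;
        return rocksdb::DB::Open(options, path, db);
      }
      ```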
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8222
      
      Test Plan:
      ./db_bench --benchmarks=fillrandom --write_buffer_size=2000000 -max_bytes_for_level_base=20000000  -level_compaction_dynamic_level_bytes --reads=100 -compaction_readahead_size=20000000 --reads=100000 -num=10000000
      
      followed by
      
      ./db_bench --benchmarks=readrandom,stats --write_buffer_size=2000000 -max_bytes_for_level_base=20000000 -simulate_hybrid_fs_file=/tmp/warm_file_list -level_compaction_dynamic_level_bytes -compaction_readahead_size=20000000 --reads=500 --threads=16 -use_existing_db --num=10000000
      
      and see results as expected.
      
      Reviewed By: ajkr
      
      Differential Revision: D28003028
      
      fbshipit-source-id: 4724896d5205730227ba2f17c3fecb11261744ce
  19. 22 Apr 2021 (1 commit)
    • Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) · 596e9008
      Committed by Akanksha Mahajan
      Summary:
      When WriteBufferManager is shared across DBs and column families
      to maintain memory usage under a limit, OOMs have been observed when flush cannot
      finish but writes continuously insert to memtables.
      In order to avoid OOMs, when memory usage goes beyond buffer_limit_ and DBs try to write,
      this change will stall incoming writers until flush is completed and memory_usage
      drops.
      
      Design: Stall condition: when total memory usage exceeds WriteBufferManager::buffer_size_
      (memory_usage() >= buffer_size_), WriteBufferManager::ShouldStall() returns true.
      
      DBImpl first blocks incoming/future writers by calling write_thread_.BeginWriteStall()
      (which adds a dummy stall object to the writer's queue).
      Then the DB is blocked in state State::Blocked (the current write doesn't go
      through). The WBMStallInterface object maintained by every DB instance is added to the
      queue of WriteBufferManager.
      
      If multiple DBs try to write during this stall, they will also be
      blocked when the check WriteBufferManager::ShouldStall() returns true.
      
      End Stall condition: When flush is finished and memory usage goes down, stall will end only if memory
      waiting to be flushed is less than buffer_size/2. This lower limit will give time for flush
      to complete and avoid continuous stalling if memory usage remains close to buffer_size.
      
      WriteBufferManager::EndWriteStall() is called,
      which removes all instances from its queue and signals them to continue.
      Their state is changed to State::Running and they are unblocked. DBImpl
      then signals all incoming writers of that DB to continue by calling
      write_thread_.EndWriteStall() (which removes the dummy stall object from the
      queue).
      
      Each DB instance creates a WBMStallInterface, which is an interface to block and
      signal DBs during the stall.
      When a DB needs to be blocked or signalled by WriteBufferManager,
      its state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED).
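      A hedged sketch of opting in to the stall behavior (assuming the WriteBufferManager constructor takes an allow_stall flag as part of this change; error handling omitted):
      
      ```
      #include <memory>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"
      #include "rocksdb/write_buffer_manager.h"
      
      void OpenTwoDbsWithSharedWbm(rocksdb::DB** db1, rocksdb::DB** db2) {
        // 1 GB shared across both DBs; stall writers instead of risking OOM when
        // the limit is exceeded and flushes cannot keep up.
        auto wbm = std::make_shared<rocksdb::WriteBufferManager>(
            1ull << 30, /*cache=*/nullptr, /*allow_stall=*/true);
        rocksdb::Options options;
        options.create_if_missing = true;
        options.write_buffer_manager = wbm;
        rocksdb::DB::Open(options, "/tmp/db1", db1);
        rocksdb::DB::Open(options, "/tmp/db2", db2);
      }
      ```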
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7898
      
      Test Plan: Added a new test db/db_write_buffer_manager_test.cc
      
      Reviewed By: anand1976
      
      Differential Revision: D26093227
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 2bbd982a3fb7033f6de6153aa92a221249861aae
  20. 13 Apr 2021 (1 commit)
    • Add util/crc32c_arm64.cc to TARGETS (#8168) · 8972dd1f
      Committed by Xavier Deguillard
      Summary:
      When compiling RocksDB with Buck for ARM64, the linker complains about missing crc32 symbols that are defined in the crc32c_arm64.cc file. Since this file wasn't included in the build, this is totally expected.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8168
      
      Test Plan:
      The following no longer fails to link rocksdb:
        buck build mode/mac-xcode //eden/fs/service:edenfs#macosx-arm64
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D27664627
      
      Pulled By: xavierd
      
      fbshipit-source-id: fb9d7a538599ee7a08882f87628731de6e641f8d
  21. 07 Apr 2021 (1 commit)
    • Make backups openable as read-only DBs (#8142) · 879357fd
      Committed by Peter Dillinger
      Summary:
      A current limitation of backups is that you don't know the
      exact database state of when the backup was taken. With this new
      feature, you can at least inspect the backup's DB state without
      restoring it by opening it as a read-only DB.
      
      Rather than add something like OpenAsReadOnlyDB to the BackupEngine API,
      which would inhibit opening stackable DB implementations read-only
      (if/when their APIs support it), we instead provide a DB name and Env
      that can be used to open as a read-only DB.
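      A hedged usage sketch (the name_for_open/env_for_open fields come from this change; the exact GetBackupInfo signature with include_file_details is assumed):
      
      ```
      #include <vector>
      #include "rocksdb/db.h"
      #include "rocksdb/utilities/backupable_db.h"
      
      rocksdb::Status OpenLatestBackupReadOnly(rocksdb::BackupEngine* backup_engine,
                                               rocksdb::DB** db) {
        std::vector<rocksdb::BackupInfo> infos;
        backup_engine->GetBackupInfo(&infos, /*include_file_details=*/true);
        if (infos.empty()) return rocksdb::Status::NotFound("no backups");
        const rocksdb::BackupInfo& latest = infos.back();
        rocksdb::Options options;
        options.env = latest.env_for_open.get();
        return rocksdb::DB::OpenForReadOnly(options, latest.name_for_open, db);
      }
      ```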
      
      Possible follow-up work:
      
      * Add a version of GetBackupInfo for a single backup.
      * Let CreateNewBackup return the BackupID of the newly-created backup.
      
      Implementation details:
      
      Refactored ChrootFileSystem to split off new base class RemapFileSystem,
      which allows more general remapping of files. We use this base class to
      implement BackupEngineImpl::RemapSharedFileSystem.
      
      To minimize API impact, I decided to just add these fields `name_for_open`
      and `env_for_open` to those set by GetBackupInfo when
      include_file_details=true. Creating the RemapSharedFileSystem adds a bit
      to the memory consumption, perhaps unnecessarily in some cases, but this
      has been mitigated by (a) only initializing the RemapSharedFileSystem
      lazily when GetBackupInfo with include_file_details=true is called, and
      (b) using the existing `shared_ptr<FileInfo>` objects to hold most of the
      mapping data.
      
      To enhance API safety, RemapSharedFileSystem is wrapped by new
      ReadOnlyFileSystem which rejects any attempts to write. This uncovered a
      couple of places in which DB::OpenForReadOnly would write to the
      filesystem, so I fixed these. Added a release note because this affects
      logging.
      
      Additional minor refactoring in backupable_db.cc to support the new
      functionality.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8142
      
      Test Plan:
      new test (run with ASAN and UBSAN), added to stress test and
      ran it for a while with amplified backup_one_in
      
      Reviewed By: ajkr
      
      Differential Revision: D27535408
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 04666d310aa0261ef6b2385c43ca793ce1dfd148
  22. 05 Apr 2021 (1 commit)
    • Make tests "parallel" and "passing ASC" by default (#8146) · bd7ddf58
      Committed by Peter Dillinger
      Summary:
      New tests should by default be expected to be parallelizable
      and passing with ASSERT_STATUS_CHECKED. Thus, I'm changing those two
      lists to exclusions rather than inclusions.
      
      For the set of exclusions, I only listed things that currently failed
      for me when attempting not to exclude, or had some other documented
      reason. This marks many more tests as "parallel," which will potentially
      cause some failures from self-interference, but we can address those as
      they are discovered.
      
      Also changed CircleCI ASC test to be parallelized; the easy way to do
      that is to exclude building tests that don't pass ASC, which is now a
      small set.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8146
      
      Test Plan: Watch CI, etc.
      
      Reviewed By: riversand963
      
      Differential Revision: D27542782
      
      Pulled By: pdillinger
      
      fbshipit-source-id: bdd74bcd912a963ee33f3fc0d2cad2567dc7740f
  23. 30 Mar 2021 (1 commit)
  24. 24 Mar 2021 (1 commit)
  25. 18 Mar 2021 (1 commit)
  26. 16 Mar 2021 (1 commit)
  27. 10 Mar 2021 (1 commit)
    • Refactor: add LineFileReader and Status::MustCheck (#8026) · 4b18c46d
      Committed by Peter Dillinger
      Summary:
      Removed confusing, awkward, and undocumented internal API
      ReadOneLine and replaced with very simple LineFileReader.
      
      In refactoring backupable_db.cc, this has the side benefit of
      removing the arbitrary cap on the size of backup metadata files.
      
      Also added Status::MustCheck to make it easy to mark a Status as
      "must check." Using this, I can ensure that after
      LineFileReader::ReadLine returns false the caller checks GetStatus().
      
      Also removed some excessive conditional compilation in status.h
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8026
      
      Test Plan: added unit test, and running tests with ASSERT_STATUS_CHECKED
      
      Reviewed By: mrambacher
      
      Differential Revision: D26831687
      
      Pulled By: pdillinger
      
      fbshipit-source-id: ef749c265a7a26bb13cd44f6f0f97db2955f6f0f
  28. 27 Feb 2021 (1 commit)
    • Refine Ribbon configuration, improve testing, add Homogeneous (#7879) · a8b3b9a2
      Committed by Peter Dillinger
      Summary:
      This change only affects non-schema-critical aspects of the production candidate Ribbon filter. Specifically, it refines choice of internal configuration parameters based on inputs. The changes are minor enough that the schema tests in bloom_test, some of which depend on this, are unaffected. There are also some minor optimizations and refactorings.
      
      This would be a schema change for "smash" Ribbon, to fix some known issues with small filters, but "smash" Ribbon is not accessible in public APIs. Unit test CompactnessAndBacktrackAndFpRate updated to test small and medium-large filters. Run with --thoroughness=100 or so for much better detection power (not appropriate for continuous regression testing).
      
      Homogeneous Ribbon:
      This change adds internally a Ribbon filter variant we call Homogeneous Ribbon, in collaboration with Stefan Walzer. The expected "result" value for every key is zero, instead of computed from a hash. Entropy for queries not to be false positives comes from free variables ("overhead") in the solution structure, which are populated pseudorandomly. Construction is slightly faster for not tracking result values, and never fails. Instead, FP rate can jump up whenever and wherever entries are packed too tightly. For small structures, we can choose overhead to make this FP rate jump unlikely, as seen in updated unit test CompactnessAndBacktrackAndFpRate.
      
      Unlike standard Ribbon, Homogeneous Ribbon seems to scale to arbitrary number of keys when accepting an FP rate penalty for small pockets of high FP rate in the structure. For example, 64-bit ribbon with 8 solution columns and 10% allocated space overhead for slots seems to achieve about 10.5% space overhead vs. information-theoretic minimum based on its observed FP rate with expected pockets of degradation. (FP rate is close to 1/256.) If targeting a higher FP rate with fewer solution columns, Homogeneous Ribbon can be even more space efficient, because the penalty from degradation is relatively smaller. If targeting a lower FP rate, Homogeneous Ribbon is less space efficient, as more allocated overhead is needed to keep the FP rate impact of degradation relatively under control. The new OptimizeHomogAtScale tool in ribbon_test helps to find these optimal allocation overheads for different numbers of solution columns. And Ribbon widths, with 128-bit Ribbon apparently cutting space overheads in half vs. 64-bit.
      
      Other misc item specifics:
      * Ribbon APIs in util/ribbon_config.h now provide configuration data for not just 5% construction failure rate (95% success), but also 50% and 0.1%.
        * Note that the Ribbon structure does not exhibit "threshold" behavior as standard Xor filter does, so there is a roughly fixed space penalty to cut construction failure rate in half. Thus, there isn't really an "almost sure" setting.
        * Although we can extrapolate settings for large filters, we don't have a good formula for configuring smaller filters (< 2^17 slots or so), and efforts to summarize with a formula have failed. Thus, small data is hard-coded from updated FindOccupancy tool.
      * Enhances ApproximateNumEntries for public API Ribbon using more precise data (new API GetNumToAdd), thus a more accurate but not perfect reversal of CalculateSpace. (bloom_test updated to expect the greater precision)
      * Move EndianSwapValue from coding.h to coding_lean.h to keep Ribbon code easily transferable from RocksDB
      * Add some missing 'const' to member functions
      * Small optimization to 128-bit BitParity
      * Small refactoring of BandingStorage in ribbon_alg.h to support Homogeneous Ribbon
      * CompactnessAndBacktrackAndFpRate now has an "expand" test: on construction failure, a possible alternative to re-seeding hash functions is simply to increase the number of slots (allocated space overhead) and try again with essentially the same hash values. (Start locations will be different roundings of the same scaled hash values--because fastrange not mod.) This seems to be as effective or more effective than re-seeding, as long as we increase the number of slots (m) by roughly m += m/w where w is the Ribbon width. This way, there is effectively an expansion by one slot for each ribbon-width window in the banding. (This approach assumes that getting "bad data" from your hash function is as unlikely as it naturally should be, e.g. no adversary.)
      * 32-bit and 16-bit Ribbon configurations are added to ribbon_test for understanding their behavior, e.g. with FindOccupancy. They are not considered useful at this time and not tested with CompactnessAndBacktrackAndFpRate.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7879
      
      Test Plan: unit test updates included
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D26371245
      
      Pulled By: pdillinger
      
      fbshipit-source-id: da6600d90a3785b99ad17a88b2a3027710b4ea3a
  29. 26 Feb 2021 (1 commit)
    • Compaction filter support for (new) BlobDB (#7974) · cef4a6c4
      Committed by Yanqin Jin
      Summary:
      Allow applications to implement a custom compaction filter and pass it to BlobDB.
      
      The compaction filter's custom logic can operate on blobs.
      To do so, application needs to subclass `CompactionFilter` abstract class and implement `FilterV2()` method.
      Optionally, a method called `ShouldFilterBlobByKey()` can be implemented if the application's custom logic relies solely
      on the key to make a decision without reading the blob, thus saving extra IO. Examples can be found in
      db/blob/db_blob_compaction_test.cc.
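      A hedged sketch of the kind of subclass described above (showing only the standard FilterV2() override; the blob-specific key-only hook is not reproduced here):
      
      ```
      #include <string>
      #include "rocksdb/compaction_filter.h"
      
      class DropSmallValuesFilter : public rocksdb::CompactionFilter {
       public:
        const char* Name() const override { return "DropSmallValuesFilter"; }
        Decision FilterV2(int /*level*/, const rocksdb::Slice& /*key*/,
                          ValueType value_type,
                          const rocksdb::Slice& existing_value,
                          std::string* /*new_value*/,
                          std::string* /*skip_until*/) const override {
          // With the new BlobDB, the value seen here may have been read back
          // from a blob file so the filter can operate on it.
          if (value_type == ValueType::kValue && existing_value.size() < 8) {
            return Decision::kRemove;
          }
          return Decision::kKeep;
        }
      };
      ```
      
      The filter would be installed through ColumnFamilyOptions::compaction_filter as usual.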
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7974
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D26509280
      
      Pulled By: riversand963
      
      fbshipit-source-id: 59f9ae5614c4359de32f4f2b16684193cc537b39
  30. 23 Feb 2021 (1 commit)
  31. 30 Jan 2021 (1 commit)
    • Integrity protection for live updates to WriteBatch (#7748) · 78ee8564
      Committed by Andrew Kryczka
      Summary:
      This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.).
      
      The foundation classes (`ProtectionInfo.*`) embed the coverage info in their type, and provide `Protect.*()` and `Strip.*()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer.
      
      When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer.
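      A hedged sketch of opting in (only the option name and value, protection_bytes_per_key == 8, come from this summary; the constructor parameter order is an assumption):
      
      ```
      #include "rocksdb/db.h"
      #include "rocksdb/write_batch.h"
      
      rocksdb::Status ProtectedWrite(rocksdb::DB* db) {
        rocksdb::WriteBatch batch(/*reserved_bytes=*/0, /*max_bytes=*/0,
                                  /*protection_bytes_per_key=*/8);
        batch.Put("key", "value");
        // Protection info covering key, value, optype, timestamp, and CF ID
        // travels with each entry and is verified when the entry is encoded
        // into the memtable.
        return db->Write(rocksdb::WriteOptions(), &batch);
      }
      ```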
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748
      
      Test Plan:
      - an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught
      - add to stress/crash test to verify it works in variety of configs/operations without intentional corruption
      - [deferred] unit tests for `ProtectionInfo.*` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc.
      
      Reviewed By: pdillinger
      
      Differential Revision: D25754492
      
      Pulled By: ajkr
      
      fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866
  32. 26 Jan 2021 (1 commit)
    • Add a SystemClock class to capture the time functions of an Env (#7858) · 12f11373
      Committed by mrambacher
      Summary:
      Introduces a SystemClock class to RocksDB and uses it. This class contains the time-related functions of an Env, and these functions can be redirected from the Env to the SystemClock.
      
      Many of the places that used an Env (Timer, PerfStepTimer, RepeatableThread, RateLimiter, WriteController) for time-related functions have been changed to use SystemClock instead.  There are likely more places that can be changed, but this is a start to show what can/should be done.  Over time it would be nice to migrate most (if not all) of the uses of the time functions from the Env to the SystemClock.
      
      There are several Env classes that implement these functions.  Most of these have not been converted yet to SystemClock implementations; that will come in a subsequent PR.  It would be good to unify many of the Mock Timer implementations, so that they behave similarly and be tested similarly (some override Sleep, some use a MockSleep, etc).
      
      Additionally, this change will allow new methods to be introduced to the SystemClock (like https://github.com/facebook/rocksdb/issues/7101 WaitFor) in a consistent manner across a smaller number of classes.
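      A hedged sketch of using the new class directly (assuming SystemClock exposes Default(), NowMicros(), and SleepForMicroseconds(), mirroring the Env functions it takes over):
      
      ```
      #include <cstdint>
      #include <cstdio>
      #include "rocksdb/system_clock.h"
      
      void TimeSomething() {
        auto clock = rocksdb::SystemClock::Default();
        uint64_t start = clock->NowMicros();
        clock->SleepForMicroseconds(1000);  // stand-in for real work
        std::printf("elapsed: %llu us\n",
                    static_cast<unsigned long long>(clock->NowMicros() - start));
      }
      ```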
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7858
      
      Reviewed By: pdillinger
      
      Differential Revision: D26006406
      
      Pulled By: mrambacher
      
      fbshipit-source-id: ed10a8abbdab7ff2e23d69d85bd25b3e7e899e90
  33. 23 Dec 2020 (1 commit)
    • Range Locking: Implementation of range locking (#7506) · daab7603
      Committed by Sergei Petrunia
      Summary:
      Range Locking - an implementation based on the locktree library
      
      - Add a RangeTreeLockManager and RangeTreeLockTracker which implement
        range locking using the locktree library.
      - Point locks are handled as locks on single-point ranges.
      - Add a unit test: range_locking_test
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7506
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D25320703
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: f86347384b42ba2b0257d67eca0f45f806b69da7
  34. 10 Dec 2020 (1 commit)
  35. 12 Nov 2020 (1 commit)
    • Create a Customizable class to load classes and configurations (#6590) · c442f680
      Committed by mrambacher
      Summary:
      The Customizable class is an extension of the Configurable class and allows instances to be created by a name/ID. Classes that extend Customizable can define their Type (e.g. "TableFactory", "Cache") and a method to instantiate them (TableFactory::CreateFromString). Customizable objects can be registered with the ObjectRegistry and created dynamically.
      
      Future PRs will make more types of objects extend Customizable.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6590
      
      Reviewed By: cheng-chang
      
      Differential Revision: D24841553
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: d0c2132bd932e971cbfe2c908ca2e5db30c5e155
  36. 26 Oct 2020 (1 commit)
    • Ribbon: initial (general) algorithms and basic unit test (#7491) · 25d54c79
      Committed by Peter Dillinger
      Summary:
      This is intended as the first commit toward a near-optimal alternative to static Bloom filters for SSTs. Stefan Walzer and I have agreed upon the name "Ribbon" for a PHSF based on his linear system construction in "Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications" ("SGauss") and my much faster "on the fly" algorithm for gaussian elimination (or for this linear system, "banding"), which can be faster than peeling while also more compact and flexible. See util/ribbon_alg.h for a more detailed introduction and background. RIBBON = Rapid Incremental Boolean Banding ON-the-fly
      
      This commit just adds generic (templatized) core algorithms and a basic unit test showing some features, including the ability to construct structures within 2.5% space overhead vs. information theoretic lower bound. (Compare to cache-local Bloom filter's ~50% space overhead -> ~30% reduction anticipated.) This commit does not include the storage scheme necessary to make queries fast, especially for filter queries, nor fractional "result bits", but there is some description already and those implementations will come soon. Nor does this commit add FilterPolicy support, for use in SST files, but that will also come soon.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7491
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24517954
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 0119ee597e250d7e0edd38ada2ba50d755606fa7
  37. 20 Oct 2020 (1 commit)
    • Abstract out LockManager interface (#7532) · 0ea7db76
      Committed by Cheng Chang
      Summary:
      In order to be able to introduce more locking protocols, we need to abstract out the locking subsystem in TransactionDB into a set of interfaces.
      
      PR https://github.com/facebook/rocksdb/pull/7013 introduces interface `LockTracker`. This PR is a follow up to take the first step to abstract out a `LockManager` interface.
      
      Further modifications to the interface may be needed when introducing the first implementation of range lock. But the idea here is to put the range lock implementation based on range tree under the `utilities/transactions/lock/range/range_tree`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7532
      
      Test Plan: point_lock_manager_test
      
      Reviewed By: ajkr
      
      Differential Revision: D24238731
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 2a9458cd8b3fb008d9529dbc4d3b28c24631f463
  38. 16 Oct 2020 (1 commit)
    • Introduce BlobFileCache and add support for blob files to Get() (#7540) · e8cb32ed
      Committed by Levi Tamasi
      Summary:
      The patch adds blob file support to the `Get` API by extending `Version` so that
      whenever a blob reference is read from a file, the blob is retrieved from the corresponding
      blob file and passed back to the caller. (This is assuming the blob reference is valid
      and the blob file is actually part of the given `Version`.) It also introduces a cache
      of `BlobFileReader`s called `BlobFileCache` that enables sharing `BlobFileReader`s
      between callers. `BlobFileCache` uses the same backing cache as `TableCache`, so
      `max_open_files` (if specified) limits the total number of open (table + blob) files.
      
      TODO: proactively open/cache blob files and pin the cache handles of the readers in the
      metadata objects similarly to what `VersionBuilder::LoadTableHandlers` does for
      table files.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7540
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D24260219
      
      Pulled By: ltamasi
      
      fbshipit-source-id: a8a2a4f11d3d04d6082201b52184bc4d7b0857ba