1. 02 Mar, 2022 1 commit
  2. 24 Feb, 2022 1 commit
    • B
      Add a secondary cache implementation based on LRUCache 1 (#9518) · f706a9c1
      Authored by Bo Wang
      Summary:
      RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO.
      
      This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks.
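
      For context, a hedged sketch of how such a secondary cache might be wired under the primary block cache. `LRUCacheOptions::secondary_cache` is the existing hook; the `LRUSecondaryCacheOptions` and `NewLRUSecondaryCache` names below are assumptions inferred from the `lru_secondary_cache_test.cc` naming in this PR, not confirmed API:

      ```cpp
      // Sketch only: a compressed secondary cache under the primary LRU
      // block cache. Secondary-cache factory/option names are assumed.
      #include "rocksdb/cache.h"
      #include "rocksdb/options.h"
      #include "rocksdb/table.h"

      rocksdb::Options OptionsWithSecondaryCache() {
        rocksdb::LRUSecondaryCacheOptions sec_opts;  // name assumed
        sec_opts.capacity = 1024 * 1024 * 1024;      // 1 GiB of compressed blocks
        sec_opts.compression_type = rocksdb::kLZ4Compression;

        rocksdb::LRUCacheOptions pri_opts;
        pri_opts.capacity = 256 * 1024 * 1024;       // 256 MiB of uncompressed blocks
        pri_opts.secondary_cache =
            rocksdb::NewLRUSecondaryCache(sec_opts); // factory name assumed

        rocksdb::BlockBasedTableOptions table_opts;
        table_opts.block_cache = rocksdb::NewLRUCache(pri_opts);

        rocksdb::Options options;
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_opts));
        options.use_direct_reads = true;  // the MyRocks-style direct-IO setup above
        return options;
      }
      ```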
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518
      
      Test Plan:
      In this PR, the lru_secondary_cache_test.cc includes the following tests:
      1. Unit tests for the secondary cache, with either compression or no compression, including basic tests and failure tests.
      2. Integration tests with both the primary cache and this secondary cache.
      
      **Follow Up:**
      
      1. Statistics (e.g. compression ratio) will be added in another PR.
      2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact.
      
      Reviewed By: anand1976
      
      Differential Revision: D34430930
      
      Pulled By: gitbw95
      
      fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
      f706a9c1
  3. 18 Feb, 2022 1 commit
  4. 05 Feb, 2022 2 commits
    • P
      Require C++17 (#9481) · fd3e0f43
      Authored by Peter Dillinger
      Summary:
      Drop support for some old compilers by requiring C++17 standard
      (or higher). See https://github.com/facebook/rocksdb/issues/9388
      
      The first modification based on this is to remove some conditional compilation in slice.h (which is also
      better for ODR).
      
      Also in this PR:
      * Fix some Makefile formatting that seems to affect ASSERT_STATUS_CHECKED config in
      some cases
      * Add c_test to NON_PARALLEL_TEST in Makefile
      * Fix a clang-analyze reported "potential leak" in lru_cache_test
      * Better "compatibility" definition of DEFINE_uint32 for old versions of gflags
      * Fix a linking problem with shared libraries in Makefile (`./random_test: error while loading shared libraries: librocksdb.so.6.29: cannot open shared object file: No such file or directory`)
      * Always set ROCKSDB_SUPPORT_THREAD_LOCAL and use thread_local (from C++11)
        * TODO in later PR: clean up that obsolete flag
      * Fix a cosmetic typo in c.h (https://github.com/facebook/rocksdb/issues/9488)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9481
      
      Test Plan:
      CircleCI config substantially updated.
      
      * Upgrade to latest Ubuntu images for each release
      * Generally prefer Ubuntu 20, but keep a couple Ubuntu 16 builds with oldest supported
      compilers, to ensure compatibility
      * Remove .circleci/cat_ignore_eagain except for Ubuntu 16 builds, because this is to work
      around a kernel bug that should not affect anything but Ubuntu 16.
      * Remove designated gcc-9 build, because the default linux build now uses GCC 9 from
      Ubuntu 20.
      * Add some `apt-key add` to fix some apt "couldn't be verified" errors
      * Generally drop SKIP_LINK=1; work-around no longer needed
      * Generally `add-apt-repository` before `apt-get update` as manual testing indicated the
      reverse might not work.
      
      Travis:
      * Use gcc-7 by default (remove specific gcc-7 and gcc-4.8 builds)
      * TODO in later PR: fix s390x "Assembler messages: Error: invalid switch -march=z14" failure
      
      AppVeyor:
      * Completely dropped because we are dropping VS2015 support and CircleCI covers
      VS >= 2017
      
      Also local testing with old gflags (out of necessity when using ROCKSDB_NO_FBCODE=1).
      
      Reviewed By: mrambacher
      
      Differential Revision: D33946377
      
      Pulled By: pdillinger
      
      fbshipit-source-id: ae077c823905b45370a26c0103ada119459da6c1
      fd3e0f43
    • P
      Enhance new cache key testing & comments (#9329) · afc280fd
      Authored by Peter Dillinger
      Summary:
      Follow-up to https://github.com/facebook/rocksdb/issues/9126
      
      Added new unit tests to validate some of the claims of guaranteed uniqueness
      within certain large bounds.
      
      Also cleaned up the cache_bench -stress-cache-key tool with better comments
      and description.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9329
      
      Test Plan: no changes to production code
      
      Reviewed By: mrambacher
      
      Differential Revision: D33269328
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 3a2b684a6b2b15f79dc872e563e3d16563be26de
      afc280fd
  5. 18 Dec, 2021 1 commit
  6. 17 Dec, 2021 1 commit
    • P
      New stable, fixed-length cache keys (#9126) · 0050a73a
      Authored by Peter Dillinger
      Summary:
      This change standardizes on a new 16-byte cache key format for
      block cache (incl compressed and secondary) and persistent cache (but
      not table cache and row cache).
      
      The goal is a really fast cache key with practically ideal stability and
      uniqueness properties without external dependencies (e.g. from FileSystem).
      A fixed key size of 16 bytes should enable future optimizations to the
      concurrent hash table for block cache, which is a heavy CPU user /
      bottleneck, but there appears to be measurable performance improvement
      even with no changes to LRUCache.
      
      This change replaces a lot of disjointed and ugly code handling cache
      keys with calls to a simple, clean new internal API (cache_key.h).
      (Preserving the old cache key logic under an option would be very ugly
      and likely negate the performance gain of the new approach. Complete
      replacement carries some inherent risk, but I think that's acceptable
      with sufficient analysis and testing.)
      
      The scheme for encoding new cache keys is complicated but explained
      in cache_key.cc.
      
      Also: EndianSwapValue is moved to math.h to be next to other bit
      operations. (Explains some new include "math.h".) ReverseBits operation
      added and unit tests added to hash_test for both.
      
      Fixes https://github.com/facebook/rocksdb/issues/7405 (presuming a root cause)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9126
      
      Test Plan:
      ### Basic correctness
      Several tests needed updates to work with the new functionality, mostly
      because we are no longer relying on filesystem for stable cache keys
      so table builders & readers need more context info to agree on cache
      keys. This functionality is so core, a huge number of existing tests
      exercise the cache key functionality.
      
      ### Performance
      Create db with
      `TEST_TMPDIR=/dev/shm ./db_bench -bloom_bits=10 -benchmarks=fillrandom -num=3000000 -partition_index_and_filters`
      And test performance with
      `TEST_TMPDIR=/dev/shm ./db_bench -readonly -use_existing_db -bloom_bits=10 -benchmarks=readrandom -num=3000000 -duration=30 -cache_index_and_filter_blocks -cache_size=250000 -threads=4`
      using DEBUG_LEVEL=0 and simultaneous before & after runs.
      Before ops/sec, avg over 100 runs: 121924
      After ops/sec, avg over 100 runs: 125385 (+2.8%)
      
      ### Collision probability
      I have built a tool, ./cache_bench -stress_cache_key to broadly simulate host-wide cache activity
      over many months, by making some pessimistic simplifying assumptions:
      * Every generated file has a cache entry for every byte offset in the file (contiguous range of cache keys)
      * All of every file is cached for its entire lifetime
      
      We use a simple table with skewed address assignment and replacement on address collision
      to simulate files coming & going, with quite a variance (super-Poisson) in ages. Some output
      with `./cache_bench -stress_cache_key -sck_keep_bits=40`:
      
      ```
      Total cache or DBs size: 32TiB  Writing 925.926 MiB/s or 76.2939TiB/day
      Multiply by 9.22337e+18 to correct for simulation losses (but still assume whole file cached)
      ```
      
      These come from default settings of 2.5M files per day of 32 MB each, and
      `-sck_keep_bits=40` means that to represent a single file, we are only keeping 40 bits of
      the 128-bit cache key.  With file size of 2\*\*25 contiguous keys (pessimistic), our simulation
      is about 2\*\*(128-40-25) or about 9 billion billion times more prone to collision than reality.
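
      The correction arithmetic above is easy to sanity-check. The following standalone snippet (not RocksDB code, just the quoted numbers) reproduces the multiplier and the "corrected" figure reported further down in this test plan:

      ```cpp
      #include <cassert>
      #include <cmath>
      #include <cstdio>

      int main() {
        // 128-bit keys, -sck_keep_bits=40, and 2**25 contiguous keys per
        // 32 MB file give the quoted simulation-loss multiplier 2**63.
        const double correction = std::pow(2.0, 128 - 40 - 25);
        std::printf("correction factor: %.5e\n", correction);  // ~9.22337e+18

        // "17 collisions after 2 x 90 days" -> estimated days between
        // collisions, then scaled by the correction factor.
        const double est_days_between = 2.0 * 90 / 17;           // ~10.5882
        const double corrected = est_days_between * correction;  // ~9.76592e+19
        std::printf("est %.4f days between (%.5e corrected)\n",
                    est_days_between, corrected);
        assert(std::fabs(est_days_between - 10.5882) < 1e-3);
        return 0;
      }
      ```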
      
      More default assumptions, relatively pessimistic:
      * 100 DBs in same process (doesn't matter much)
      * Re-open DB in same process (new session ID related to old session ID) on average
      every 100 files generated
      * Restart process (all new session IDs unrelated to old) 24 times per day
      
      After enough data, we get a result at the end:
      
      ```
      (keep 40 bits)  17 collisions after 2 x 90 days, est 10.5882 days between (9.76592e+19 corrected)
      ```
      
      If we believe the (pessimistic) simulation and the mathematical generalization, we would need to run a billion machines all for 97 billion days to expect a cache key collision. To help verify that our generalization ("corrected") is robust, we can make our simulation more precise with `-sck_keep_bits=41` and `42`, which takes more running time to get enough data:
      
      ```
      (keep 41 bits)  16 collisions after 4 x 90 days, est 22.5 days between (1.03763e+20 corrected)
      (keep 42 bits)  19 collisions after 10 x 90 days, est 47.3684 days between (1.09224e+20 corrected)
      ```
      
      The generalized prediction still holds. With the `-sck_randomize` option, we can see that we are beating "random" cache keys (except offsets still non-randomized) by a modest amount (roughly 20x less collision prone than random), which should make us reasonably comfortable even in "degenerate" cases:
      
      ```
      197 collisions after 1 x 90 days, est 0.456853 days between (4.21372e+18 corrected)
      ```
      
      I've run other tests to validate other conditions behave as expected, never behaving "worse than random" unless we start chopping off structured data.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D33171746
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f16a57e369ed37be5e7e33525ace848d0537c88f
      0050a73a
  7. 30 Nov, 2021 2 commits
  8. 19 Nov, 2021 1 commit
    • H
      Account Bloom/Ribbon filter construction memory in global memory limit (#9073) · 74544d58
      Authored by Hui Xiao
      Summary:
      Note: This PR is the 4th part of a bigger PR stack (https://github.com/facebook/rocksdb/pull/9073) and will rebase/merge only after the first three PRs (https://github.com/facebook/rocksdb/pull/9070, https://github.com/facebook/rocksdb/pull/9071, https://github.com/facebook/rocksdb/pull/9130) merge.
      
      **Context:**
      Similar to https://github.com/facebook/rocksdb/pull/8428, this PR is to track memory usage during (new) Bloom Filter (i.e., FastLocalBloom) and Ribbon Filter (i.e., Ribbon128) construction, moving toward the goal of [single global memory limit using block cache capacity](https://github.com/facebook/rocksdb/wiki/Projects-Being-Developed#improving-memory-efficiency). It also constrains the size of the banding portion of Ribbon Filter during construction by falling back to Bloom Filter if that banding is, at some point, larger than the available space in the cache under `LRUCacheOptions::strict_capacity_limit=true`.
      
      The option to turn on this feature is `BlockBasedTableOptions::reserve_table_builder_memory = true`, which by default is set to `false`. We [decided](https://github.com/facebook/rocksdb/pull/9073#discussion_r741548409) not to have a separate option for each memory user in table building; therefore their memory accounting is all bundled under one general option.
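
      A minimal configuration sketch of the options named above (`reserve_table_builder_memory` and `strict_capacity_limit` come straight from this summary; the surrounding values are illustrative only):

      ```cpp
      // Sketch: charge filter-construction memory to the block cache, with
      // strict_capacity_limit=true so Ribbon can fall back to Bloom when
      // the banding would not fit (per the summary above).
      #include "rocksdb/cache.h"
      #include "rocksdb/filter_policy.h"
      #include "rocksdb/options.h"
      #include "rocksdb/table.h"

      rocksdb::Options OptionsWithFilterMemoryCharging() {
        rocksdb::LRUCacheOptions cache_opts;
        cache_opts.capacity = 1024 * 1024 * 1024;  // illustrative 1 GiB
        cache_opts.strict_capacity_limit = true;   // enables the Ribbon fallback

        rocksdb::BlockBasedTableOptions table_opts;
        table_opts.block_cache = rocksdb::NewLRUCache(cache_opts);
        table_opts.reserve_table_builder_memory = true;  // default is false
        table_opts.filter_policy.reset(
            rocksdb::NewBloomFilterPolicy(10.0 /* bits_per_key */));

        rocksdb::Options options;
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_opts));
        return options;
      }
      ```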
      
      **Summary:**
      - Reserved/released cache for creation/destruction of three main memory users with the passed-in `FilterBuildingContext::cache_res_mgr` during filter construction:
         - hash entries (i.e., `hash_entries`.size(); we bucket-charge hash entries during insertion for performance),
         - banding (Ribbon Filter only, `bytes_coeff_rows` +`bytes_result_rows` + `bytes_backtrack`),
         - final filter (i.e., `mutable_buf`'s size).
            - Implementation details: in order to use `CacheReservationManager::CacheReservationHandle` to account final filter's memory, we have to store the `CacheReservationManager` object and `CacheReservationHandle` for final filter in `XXPH3BitsFilterBuilder` as well as  explicitly delete the filter bits builder when done with the final filter in block based table.
      - Added an option to run `filter_bench` with this memory reservation feature
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9073
      
      Test Plan:
      - Added new tests in `db_bloom_filter_test` to verify filter construction peak cache reservation under combinations of `BlockBasedTable::Rep::FilterType` (e.g., `kFullFilter`, `kPartitionedFilter`), `BloomFilterPolicy::Mode` (e.g., `kFastLocalBloom`, `kStandard128Ribbon`, `kDeprecatedBlock`) and `BlockBasedTableOptions::reserve_table_builder_memory`
        - To address the concern about slow tests: tests with memory reservation under `kFullFilter` + `kStandard128Ribbon` and `kPartitionedFilter` take around **3000 - 6000 ms** and others take around **1500 - 2000 ms**, in total adding **20000 - 25000 ms** to the test suite running locally
      - Added new test in `bloom_test` to verify Ribbon Filter fallback on large banding in FullFilter
      - Added tests in `filter_bench` to verify that this feature does not significantly slow down Bloom/Ribbon Filter construction speed. Local results averaged over **20** runs as below:
         - FastLocalBloom
            - baseline `./filter_bench -impl=2 -quick -runs 20 | grep 'Build avg'`:
               - **Build avg ns/key: 29.56295** (DEBUG_LEVEL=1), **29.98153** (DEBUG_LEVEL=0)
            - new feature (expected to be similar to the above) `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`:
               - **Build avg ns/key: 30.99046** (DEBUG_LEVEL=1), **30.48867** (DEBUG_LEVEL=0)
            - new feature of RibbonFilter with fallback (expected to be similar to the above) `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'`:
               - **Build avg ns/key: 31.146975** (DEBUG_LEVEL=1), **30.08165** (DEBUG_LEVEL=0)
      
          - Ribbon128
             - baseline `./filter_bench -impl=3 -quick -runs 20 | grep 'Build avg'`:
                 - **Build avg ns/key: 129.17585** (DEBUG_LEVEL=1), **130.5225** (DEBUG_LEVEL=0)
             - new feature (expected to be similar to the above) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`:
                 - **Build avg ns/key: 131.61645** (DEBUG_LEVEL=1), **132.98075** (DEBUG_LEVEL=0)
             - new feature of RibbonFilter with fallback (expected to be a lot faster than the above due to fallback) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'`:
                - **Build avg ns/key: 52.032965** (DEBUG_LEVEL=1), **52.597825** (DEBUG_LEVEL=0)
                - And the warning message of `"Cache reservation for Ribbon filter banding failed due to cache full"` is indeed logged to console.
      
      Reviewed By: pdillinger
      
      Differential Revision: D31991348
      
      Pulled By: hx235
      
      fbshipit-source-id: 9336b2c60f44d530063da518ceaf56dac5f9df8e
      74544d58
  9. 17 Nov, 2021 1 commit
    • P
      Check for and disallow shared key space in block caches (#9172) · f8c685c4
      Authored by Peter Dillinger
      Summary:
      We have three layers of block cache that often use the same key
      but map to different physical data:
      * BlockBasedTableOptions::block_cache
      * BlockBasedTableOptions::block_cache_compressed
      * BlockBasedTableOptions::persistent_cache
      
      If any two of these happen to share an underlying implementation and key
      space (insertion into one shows up in another), then memory safety is
      broken. The simplest case is block_cache == block_cache_compressed.
      (Credit mrambacher for asking about this case in a review.)
      
      With this change, we explicitly check for overlap and preemptively and
      safely fail with a Status code.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9172
      
      Test Plan: test added. Crashes without new check
      
      Reviewed By: anand1976
      
      Differential Revision: D32465659
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 3876b45b6dce6167e5a7a642725ddc86b96f8e40
      f8c685c4
  10. 10 Nov, 2021 2 commits
  11. 06 Nov, 2021 1 commit
  12. 02 Nov, 2021 1 commit
    • H
      Add new API CacheReservationManager::GetDummyEntrySize() (#9072) · 560fe702
      Authored by Hui Xiao
      Summary:
      Note: it might conflict with another CRM related PR https://github.com/facebook/rocksdb/pull/9071 and so will merge after that's merged.
      
      Context:
      As `CacheReservationManager` is being used by more memory users, it is convenient to retrieve the dummy entry size from `CacheReservationManager` instead of hard-coding `256 * 1024` when writing tests. It also allows more flexibility to change our implementation of the dummy entry size.
      
      A follow-up PR is needed to replace those hard-coded dummy entry size value in `db_test2.cc`, `db_write_buffer_manager_test.cc`, `write_buffer_manager_test.cc`, `table_test.cc` and the ones introduced in https://github.com/facebook/rocksdb/pull/9072#issue-1034326069.
      - Exposed the private static constexpr `kDummyEntrySize` through public static `CacheReservationManager::GetDummyEntrySize()`
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9072
      
      Test Plan:
      - Passing new tests
      - Passing existing tests
      
      Reviewed By: ajkr
      
      Differential Revision: D32043684
      
      Pulled By: hx235
      
      fbshipit-source-id: ddefc6921c052adab6a2cda2394eb26da3076a50
      560fe702
  13. 20 Oct, 2021 1 commit
    • Z
      Add lowest_used_cache_tier to ImmutableDBOptions to enable or disable Secondary Cache (#9050) · 6d93b875
      Authored by Zhichao Cao
      Summary:
      Currently, if a secondary cache is provided to the LRU cache, it is used by default. We add CacheTier to advanced_options.h to describe the cache tier being used. We add a `lowest_used_cache_tier` option to `DBOptions` (immutable) and pass it to BlockBasedTableReader to decide whether the secondary cache will be used or not. By default it is `CacheTier::kNonVolatileTier`, which means we always use both the block cache (kVolatileTier) and the secondary cache (kNonVolatileTier). By setting it to `CacheTier::kVolatileTier`, the DB will not use the secondary cache.
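
      The option and enum names above can be sketched in use (a minimal sketch; the field and enum names are taken from this summary, the function around them is illustrative):

      ```cpp
      // Sketch: keep a DB on the block cache only, even if the attached
      // cache has a secondary cache, using the option introduced here.
      #include "rocksdb/advanced_options.h"
      #include "rocksdb/options.h"

      void DisableSecondaryCacheTier(rocksdb::Options& options) {
        // Default is kNonVolatileTier: use block cache AND secondary cache.
        // kVolatileTier restricts this DB to the block cache only.
        options.lowest_used_cache_tier = rocksdb::CacheTier::kVolatileTier;
      }
      ```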
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9050
      
      Test Plan: added new tests
      
      Reviewed By: anand1976
      
      Differential Revision: D31744769
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: a0575ebd23e1c6dfcfc2b4c8578764e73b15bce6
      6d93b875
  14. 08 Oct, 2021 1 commit
    • Z
      Introduce a mechanism to dump out blocks from block cache and re-insert to secondary cache (#8912) · 699f4504
      Authored by Zhichao Cao
      Summary:
      Background: Cache warming will cause potential read performance degradation due to reading blocks from storage into the block cache. Since in production the workload and access pattern to a certain DB is stable, a potential solution is to dump out the blocks belonging to a certain DB to persistent storage (e.g., to a file) and bulk-load the blocks into the secondary cache before the DB is relaunched. For example, when migrating a DB from host A to host B, the migration takes only a short period of time, so the access pattern to blocks in the block cache will not change much. It is efficient to dump out the blocks of a certain DB, migrate them to the destination host, and insert them into the secondary cache before we relaunch the DB.
      
      Design: we introduce the CacheDumpWriter and CacheDumpRead interfaces for users to store the blocks dumped out from the block cache. RocksDB will encode all the information and send the string to the writer. Users can implement their own writer if they want. CacheDumper and CacheLoad are introduced to save and load the blocks, respectively.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8912
      
      Test Plan: add new tests to lru_cache_test and pass make check.
      
      Reviewed By: pdillinger
      
      Differential Revision: D31452871
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 11ab4f5d03e383f476947116361d54188d36ec48
      699f4504
  15. 11 Sep, 2021 1 commit
    • P
      Fix and detect headers with missing dependencies (#8893) · bda8d93b
      Authored by Peter Dillinger
      Summary:
      It's always annoying to find a header does not include its own
      dependencies and only works when included after other includes. This
      change adds `make check-headers` which validates that each header can
      be included at the top of a file. Some headers are excluded e.g. because
      of platform or external dependencies.
      
      rocksdb_namespace.h had to be re-worked slightly to enable checking for
      failure to include it. (ROCKSDB_NAMESPACE is a valid namespace name.)
      
      Fixes mostly involve adding and cleaning up #includes, but for
      FileTraceWriter, a constructor was out-of-lined to make a forward
      declaration sufficient.
      
      This check is not currently run with `make check` but is added to
      CircleCI build-linux-unity since that one is already relatively fast.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8893
      
      Test Plan: existing tests and resolving issues detected by new check
      
      Reviewed By: mrambacher
      
      Differential Revision: D30823300
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 9fff223944994c83c105e2e6496d24845dc8e572
      bda8d93b
  16. 10 Sep, 2021 1 commit
    • H
      Add comment for new_memory_used parameter in CacheReservationManager::UpdateCacheReservation (#8895) · 0aad4ca0
      Authored by Hui Xiao
      
      Summary:
      Context/Summary: this PR is to clarify what the parameter new_memory_used is in CacheReservationManager::UpdateCacheReservation
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8895
      
      Test Plan:
      - Passing existing test
      - Make format
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D30844814
      
      Pulled By: hx235
      
      fbshipit-source-id: 3177f7abf5668ea9e73818ceaa355566f03acabc
      0aad4ca0
  17. 09 Sep, 2021 1 commit
    • H
      Account for dictionary-building buffer in global memory limit (#8428) · 91b95cad
      Authored by Hui Xiao
      Summary:
      Context:
      Some data blocks are temporarily buffered in memory in BlockBasedTableBuilder for building compression dictionary used in data block compression. Currently this memory usage is not counted toward our global memory usage utilizing block cache capacity. To improve that, this PR charges that memory usage into the block cache to achieve better memory tracking and limiting.
      
      - Reserve memory in block cache for buffered data blocks that are used to build a compression dictionary
      - Release all the memory associated with buffering the data blocks mentioned above in EnterUnbuffered(), which is called when (a) buffer limit is exceeded after buffering OR (b) the block cache becomes full after reservation OR (c) BlockBasedTableBuilder calls Finish()
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8428
      
      Test Plan:
      - Passing existing unit tests
      - Passing new unit tests
      
      Reviewed By: ajkr
      
      Differential Revision: D30755305
      
      Pulled By: hx235
      
      fbshipit-source-id: 6e66665020b775154a94c4c5e0f2adaeaff13981
      91b95cad
  18. 08 Sep, 2021 2 commits
    • P
      Add (& fix) some simple source code checks (#8821) · cb5b851f
      Authored by Peter Dillinger
      Summary:
      * Don't hardcode namespace rocksdb (use ROCKSDB_NAMESPACE)
      * Don't #include <rocksdb/...> (use double quotes)
      * Support putting NOCOMMIT (any case) in source code that should not be
      committed/pushed in current state.
      
      These will be run with `make check` and in GitHub actions
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8821
      
      Test Plan: existing tests, manually try out new checks
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30791726
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 399c883f312be24d9e55c58951d4013e18429d92
      cb5b851f
    • P
      Replace most typedef with using= (#8751) · 4750421e
      Authored by Peter Dillinger
      Summary:
      Old typedef syntax is confusing
      
      Most but not all changes with
      
          perl -pi -e 's/typedef (.*) ([a-zA-Z0-9_]+);/using $2 = $1;/g' list_of_files
          make format
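
      For illustration, the same substitution can be expressed with std::regex (a sketch only; the PR itself used the perl one-liner above followed by `make format`):

      ```cpp
      #include <cassert>
      #include <iostream>
      #include <regex>
      #include <string>

      int main() {
        // Same pattern as the perl one-liner: rewrite the old typedef
        // syntax into the equivalent using-alias form.
        const std::regex re("typedef (.*) ([a-zA-Z0-9_]+);");
        const std::string line =
            "typedef std::unordered_map<std::string, int> NameMap;";
        const std::string out = std::regex_replace(line, re, "using $2 = $1;");
        std::cout << out << "\n";
        assert(out == "using NameMap = std::unordered_map<std::string, int>;");
        return 0;
      }
      ```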
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8751
      
      Test Plan: existing
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30745277
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 6f65f0631c3563382d43347896020413cc2366d9
      4750421e
  19. 31 Aug, 2021 1 commit
  20. 25 Aug, 2021 1 commit
    • H
      Refactor WriteBufferManager::CacheRep into CacheReservationManager (#8506) · 74cfe7db
      Authored by Hui Xiao
      Summary:
      Context:
      To help cap various memory usage by a single limit of the block cache capacity, we charge the memory usage through inserting/releasing dummy entries in the block cache. CacheReservationManager is such a class (non-thread-safe) responsible for inserting/removing dummy entries to reserve cache space for memory used by the class user.
      
      - Refactored the inner private class CacheRep of WriteBufferManager into public CacheReservationManager class for reusability such as for https://github.com/facebook/rocksdb/pull/8428
      
      - Encapsulated implementation details of cache key generation and dummy entries insertion/release in cache reservation as discussed in https://github.com/facebook/rocksdb/pull/8506#discussion_r666550838
      
      - Consolidated increase/decrease cache reservation into one API - UpdateCacheReservation.
      
      - Adjusted the previous dummy entry release algorithm for decreasing cache reservation to loop-release dummy entries, staying symmetric to the dummy entry insertion algorithm
      
      - Made the previous dummy entry release algorithm in delayed decrease mode more aggressive for better decreasing cache reservation when memory used is less likely to increase back.
      
        Previously, the algorithm only released one dummy entry when new_mem_used < 3/4 * cache_allocated_size_ and cache_allocated_size_ - kSizeDummyEntry > new_mem_used.
      Now, the algorithm loop-releases as many dummy entries as possible when new_mem_used < 3/4 * cache_allocated_size_.
      
      - Updated WriteBufferManager's test cases to adapt to changes on the release algorithm mentioned above and left comment for some test cases for clarity
      
      - Replaced the previous cache key prefix generation (utilizing object address related to the cache client) with one that utilizes Cache->NewID() to prevent cache-key collision among dummy entry clients sharing the same cache.
      
        The specific collision we are preventing happens when the object address is reused for a new cache-key prefix while the old cache-key using that same object address in its prefix still exists in the cache. This could happen due to that, under LRU cache policy, there is a possible delay in releasing a cache entry after the cache client object owning that cache entry get deallocated. In this case, the object address related to the cache client object can get reused for other client object to generate a new cache-key prefix.
      
        This prefix generation can be made obsolete after Peter's unification of all the code generating cache key, mentioned in https://github.com/facebook/rocksdb/pull/8506#discussion_r667265255
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8506
      
      Test Plan:
      - Passing the added unit tests cache_reservation_manager_test.cc
      - Passing existing and adjusted write_buffer_manager_test.cc
      
      Reviewed By: ajkr
      
      Differential Revision: D29644135
      
      Pulled By: hx235
      
      fbshipit-source-id: 0fc93fbfe4a40bb41be85c314f8f2bafa8b741f7
      74cfe7db
  21. 21 Aug, 2021 1 commit
  22. 17 Aug, 2021 1 commit
    • A
      Add a stat to count secondary cache hits (#8666) · add68bd2
      Authored by anand76
      Summary:
      Add a stat for secondary cache hits. The ```Cache::Lookup``` API had an unused ```stats``` parameter. This PR uses that to pass the pointer to a ```Statistics``` object that ```LRUCache``` uses to record the stat.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8666
      
      Test Plan: Update a unit test in lru_cache_test
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30353816
      
      Pulled By: anand1976
      
      fbshipit-source-id: 2046f78b460428877a26ffdd2bb914ae47dfbe77
      add68bd2
  23. 23 Jul, 2021 1 commit
  24. 17 Jul, 2021 1 commit
    • P
      Don't hold DB mutex for block cache entry stat scans (#8538) · df5dc73b
      Authored by Peter Dillinger
      Summary:
      I previously didn't notice the DB mutex was being held during
      block cache entry stat scans, probably because I primarily checked for
      read performance regressions, because they require the block cache and
      are traditionally latency-sensitive.
      
      This change does some refactoring to avoid holding DB mutex and to
      avoid triggering and waiting for a scan in GetProperty("rocksdb.cfstats").
      Some tests have to be updated because now the stats collector is
      populated in the Cache aggressively on DB startup rather than lazily.
      (I hope to clean up some of this added complexity in the future.)
      
      This change also ensures proper treatment of need_out_of_mutex for
      non-int DB properties.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8538
      
      Test Plan:
      Added unit test logic that uses sync points to fail if the DB mutex
      is held during a scan, covering the various ways that a scan might be
      triggered.
      
      Performance test - the known impact to holding the DB mutex is on
      TransactionDB, and the easiest way to see the impact is to hack the
      scan code to almost always miss and take an artificially long time
      scanning. Here I've injected an unconditional 5s sleep at the call to
      ApplyToAllEntries.
      
      Before (hacked):
      
          $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     433.219 micros/op 2308 ops/sec;    0.1 MB/s ( transactions:78999 aborts:0)
          rocksdb.db.write.micros P50 : 16.135883 P95 : 36.622503 P99 : 66.036115 P100 : 5000614.000000 COUNT : 149677 SUM : 8364856
          $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     448.802 micros/op 2228 ops/sec;    0.1 MB/s ( transactions:75999 aborts:0)
          rocksdb.db.write.micros P50 : 16.629221 P95 : 37.320607 P99 : 72.144341 P100 : 5000871.000000 COUNT : 143995 SUM : 13472323
      
      Notice the 5s P100 write time.
      
      After (hacked):
      
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     303.645 micros/op 3293 ops/sec;    0.1 MB/s ( transactions:98999 aborts:0)
          rocksdb.db.write.micros P50 : 16.061871 P95 : 33.978834 P99 : 60.018017 P100 : 616315.000000 COUNT : 187619 SUM : 4097407
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     310.383 micros/op 3221 ops/sec;    0.1 MB/s ( transactions:96999 aborts:0)
          rocksdb.db.write.micros P50 : 16.270026 P95 : 35.786844 P99 : 64.302878 P100 : 603088.000000 COUNT : 183819 SUM : 4095918
      
      P100 write is now ~0.6s. Not good, but it's the same even if I completely bypass all the scanning code:
      
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     311.365 micros/op 3211 ops/sec;    0.1 MB/s ( transactions:96999 aborts:0)
          rocksdb.db.write.micros P50 : 16.274362 P95 : 36.221184 P99 : 68.809783 P100 : 649808.000000 COUNT : 183819 SUM : 4156767
          $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
          randomtransaction :     308.395 micros/op 3242 ops/sec;    0.1 MB/s ( transactions:97999 aborts:0)
          rocksdb.db.write.micros P50 : 16.106222 P95 : 37.202403 P99 : 67.081875 P100 : 598091.000000 COUNT : 185714 SUM : 4098832
      
      No substantial difference.
      
      Reviewed By: siying
      
      Differential Revision: D29738847
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1c5c155f5a1b62e4fea0fd4eeb515a8b7474027b
      df5dc73b
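The shape of the fix can be illustrated with a generic snapshot-then-scan pattern (standard-library types only; `StatsHolder`/`ScanStats` are invented and this is not the actual refactoring): take the mutex only long enough to copy what the scan needs, then do the expensive iteration with the mutex released.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Toy illustration: a slow stat scan that no longer blocks writers.
struct StatsHolder {
  std::mutex db_mutex;
  std::vector<int> cache_entries;  // stand-in for the block cache contents

  long ScanStats() {
    std::vector<int> snapshot;
    {
      std::lock_guard<std::mutex> lock(db_mutex);
      snapshot = cache_entries;  // brief critical section only
    }
    long sum = 0;
    for (int v : snapshot) sum += v;  // slow scan runs without the mutex
    return sum;
  }
};
```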
  25. 16 Jul, 2021 1 commit
    • P
      Work around falsely reported data race on LRUHandle::flags (#8539) · 5ad32276
      Authored by Peter Dillinger
      Summary:
      Some bits are mutated and read while holding a lock, other
      immutable bits (esp. secondary cache compatibility) can be read by
      arbitrary threads without holding a lock. AFAIK, this doesn't cause an
      issue on any architecture we care about, because you will get some
      legitimate version of the value that includes the initialization, as
      long as synchronization guarantees the initialization happens before the
      read.
      
      I've only seen this in https://github.com/facebook/rocksdb/issues/8538 so far, but it should be fixed regardless.
      Otherwise, we'll surely get these false reports again some time.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8539
      
      Test Plan: some local TSAN test runs and in CircleCI
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29720262
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 365fd7e565577c648815161f71b339bcb5ce12d5
      5ad32276
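One common way to address such a report (a sketch of the general technique, not necessarily the exact change made here) is to store the flags in a `std::atomic` and use relaxed operations, which compile to plain loads and stores on the architectures in question. The `Handle` type and flag name below are invented.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct Handle {
  // Relaxed atomics silence the race report without adding fences:
  // initialization is ordered before reads by external synchronization.
  std::atomic<uint8_t> flags{0};
  static constexpr uint8_t kSecondaryCacheCompatible = 1;

  void SetSecondaryCacheCompatible() {
    flags.fetch_or(kSecondaryCacheCompatible, std::memory_order_relaxed);
  }
  bool IsSecondaryCacheCompatible() const {
    return flags.load(std::memory_order_relaxed) & kSecondaryCacheCompatible;
  }
};
```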
  26. 07 Jul, 2021 1 commit
  27. 01 Jul, 2021 1 commit
    • A
      Fix assertion failure when releasing a handle after secondary cache lookup fails (#8470) · a0cbb694
      Authored by anand76
      Summary:
      When the secondary cache lookup fails, we may still allocate a handle and charge the cache for metadata usage. If the cache is full, this can cause the usage to go over capacity. Later, when an (unrelated) handle is released, it trips an assertion that checks that usage is less than capacity. To prevent this assertion failure, don't charge the cache for a failed secondary cache lookup.
      
      Tests:
      Run crash_test
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8470
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29474713
      
      Pulled By: anand1976
      
      fbshipit-source-id: 27191969c95470a7b070d292b458efce71395bf2
      a0cbb694
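The accounting change can be illustrated with a toy shard (all names invented): usage is charged only after the secondary lookup succeeds, so a miss against a full cache can no longer push usage past capacity and trip the release-time assertion.

```cpp
#include <cassert>
#include <cstddef>

struct CacheShard {
  size_t usage = 0;
  size_t capacity = 0;

  // Returns true if the (hypothetical) secondary lookup found the key.
  bool LookupWithCharge(bool secondary_hit, size_t handle_charge) {
    if (!secondary_hit) {
      // Before the fix, usage was charged here even on failure, which
      // could push usage past capacity on a full cache.
      return false;
    }
    usage += handle_charge;  // charge only on success
    return true;
  }

  void Release(size_t handle_charge) {
    usage -= handle_charge;
    assert(usage <= capacity);  // the assertion that used to fire
  }
};
```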
  28. 22 Jun, 2021 1 commit
    • A
      Fix a tsan warning due to reading flags in LRUHandle without holding a mutex (#8433) · a50da404
      Authored by anand76
      Summary:
      Tsan complains due to a perceived race condition in accessing LRUHandle flags. One thread calls ```LRUHandle::SetHit()``` from ```LRUCacheShard::Lookup()```, while another thread calls ```LRUHandle::IsPending()``` from ```LRUCacheShard::IsReady()```. The latter call is from ```MultiGet```. It doesn't actually have to call ```IsReady```, since a null value indicates the cache handle is not ready; it's sufficient to check for a null value.
      
      Also modify ```IsReady``` to acquire the LRU shard mutex.
      
      Tests:
      1. make check
      2. Run tsan_crash
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8433
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29278030
      
      Pulled By: anand1976
      
      fbshipit-source-id: 0c9fed56d12eda853e72dadebe75038361bd257f
      a50da404
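The simplification can be shown with a toy struct (hypothetical, not the RocksDB type): a pending lookup holds a null value until completion, so a null check is equivalent to the readiness test and touches no shared flags.

```cpp
#include <cassert>
#include <string>

struct PendingLookup {
  // Filled in (by the completion path) once the lookup finishes.
  const std::string* value = nullptr;

  // Null already means "not ready": no flag read outside the shard mutex.
  bool IsReady() const { return value != nullptr; }
};
```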
  29. 19 Jun, 2021 1 commit
    • A
      Parallelize secondary cache lookup in MultiGet (#8405) · 8ea0a2c1
      Authored by anand76
      Summary:
      Implement the ```WaitAll()``` interface in ```LRUCache``` to allow callers to issue multiple lookups in parallel and wait for all of them to complete. Modify ```MultiGet``` to use this to parallelize the secondary cache lookups in order to reduce the overall latency. A call to ```cache->Lookup()``` returns a handle that has an incomplete value (nullptr), and the caller can call ```cache->IsReady()``` to check whether the lookup is complete, and pass a vector of handles to ```WaitAll``` to wait for completion. If any of the lookups fail, ```MultiGet``` will read the block from the SST file.
      
      Another change in this PR is to rename ```SecondaryCacheHandle``` to ```SecondaryCacheResultHandle``` as it more accurately describes the return result of the secondary cache lookup, which is more like a future.
      
      Tests:
      1. Add unit tests in lru_cache_test
      2. Benchmark results with no secondary cache configured
      Master -
      ```
      readrandom   :      41.175 micros/op 388562 ops/sec;  106.7 MB/s (7277999 of 7277999 found)
      readrandom   :      41.217 micros/op 388160 ops/sec;  106.6 MB/s (7274999 of 7274999 found)
      multireadrandom :      10.309 micros/op 1552082 ops/sec; (28908992 of 28908992 found)
      multireadrandom :      10.321 micros/op 1550218 ops/sec; (29081984 of 29081984 found)
      ```
      
      This PR -
      ```
      readrandom   :      41.158 micros/op 388723 ops/sec;  106.8 MB/s (7290999 of 7290999 found)
      readrandom   :      41.185 micros/op 388463 ops/sec;  106.7 MB/s (7287999 of 7287999 found)
      multireadrandom :      10.277 micros/op 1556801 ops/sec; (29346944 of 29346944 found)
      multireadrandom :      10.253 micros/op 1560539 ops/sec; (29274944 of 29274944 found)
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8405
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29190509
      
      Pulled By: anand1976
      
      fbshipit-source-id: 6f8eff6246712af8a297cfe22ea0d1c3b2a01bb0
      8ea0a2c1
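The ```WaitAll()``` control flow can be approximated with `std::future` (an illustration of the pattern, not the actual cache API): issue all lookups first, then block once for the whole batch; any lookup that comes back empty would fall back to reading the block from the SST file.

```cpp
#include <cassert>
#include <future>
#include <optional>
#include <string>
#include <vector>

using Handle = std::future<std::optional<std::string>>;

// Stand-in for an asynchronous secondary-cache probe.
Handle AsyncLookup(std::string key) {
  return std::async(std::launch::async,
                    [key]() -> std::optional<std::string> {
                      if (key == "missing") return std::nullopt;
                      return "value-for-" + key;
                    });
}

// Wait for every pending lookup; empty results mean "read from SST".
std::vector<std::optional<std::string>> WaitAll(std::vector<Handle>& handles) {
  std::vector<std::optional<std::string>> results;
  results.reserve(handles.size());
  for (auto& h : handles) results.push_back(h.get());
  return results;
}
```

Issuing the lookups before waiting is what overlaps their latencies; waiting inside the issue loop would serialize them again.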
  30. 15 Jun, 2021 1 commit
  31. 14 Jun, 2021 1 commit
    • P
      Pin CacheEntryStatsCollector to fix performance bug (#8385) · d5a46c40
      Authored by Peter Dillinger
      Summary:
      If the block cache is full with strict_capacity_limit=false,
      then our CacheEntryStatsCollector could be immediately evicted on
      release, so iterating through column families with shared block cache
      could trigger re-scan for each CF. This change fixes that problem by
      pinning the CacheEntryStatsCollector from InternalStats so that it's not
      evicted.
      
      I had originally thought that this object could participate in LRU like
      everything else, but even though a re-load+re-scan only touches memory,
      it can be orders of magnitude more expensive than other cache misses.
      One service in Facebook has scans that take ~20s over 100GB block cache
      that is mostly 4KB entries. (The up-side of this bug and https://github.com/facebook/rocksdb/issues/8369 is that
      we had a natural experiment on the effect on some service metrics even
      with block cache scans running continuously in the background--a kind
      of worst case scenario. Metrics like latency were not affected enough
      to trigger warnings.)
      
      Other smaller fixes:
      
      20s is already a sizable portion of the 600s stats dump period, or the 180s
      default max age to force re-scan, so I added logic to ensure that (for
      each block cache) we don't spend more than 0.2% of our background thread
      time scanning it. Nevertheless, "foreground" requests for cache entry
      stats (calls to `db->GetMapProperty(DB::Properties::kBlockCacheEntryStats)`)
      are permitted to consume more CPU.
      
      Renamed field to cache_entry_stats_ to match code style.
      
      This change is intended for patching in 6.21 release.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8385
      
      Test Plan:
      unit test expanded to cover new logic (detect regression),
      some manual testing with db_bench
      
      Reviewed By: ajkr
      
      Differential Revision: D29042759
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 236faa902397f50038c618f50fbc8cf3f277308c
      d5a46c40
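The pinning idea reduces to holding an owning reference, sketched here with `std::shared_ptr` stand-ins (all names invented): as long as the stats object keeps its reference, the collector survives cache eviction, and repeated calls get the same object instead of triggering a re-load and re-scan.

```cpp
#include <cassert>
#include <memory>

struct StatsCollector {
  int scans = 0;  // stand-in for the cached scan results
};

struct InternalStats {
  std::shared_ptr<StatsCollector> pinned_collector;

  std::shared_ptr<StatsCollector> GetCollector() {
    if (!pinned_collector) {
      pinned_collector = std::make_shared<StatsCollector>();
    }
    // The owning reference held here is the "pin": the object is never
    // destroyed and re-created between calls.
    return pinned_collector;
  }
};
```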
  32. 11 Jun, 2021 1 commit
    • Z
      Use DbSessionId as cache key prefix when secondary cache is enabled (#8360) · f44e69c6
      Authored by Zhichao Cao
      Summary:
      Currently, we use either the file system inode or a monotonically incrementing runtime ID as the block cache key prefix. However, a monotonically incrementing runtime ID (used when the file system does not support inode ID generation) cannot guarantee uniqueness in some cases (e.g., when a secondary cache is migrated from host to host). We now use DbSessionId (20 bytes) + current file number (at most 10 bytes) as the new block cache key prefix when the secondary cache is enabled, which accommodates scenarios such as the transfer of cache state across hosts.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8360
      
      Test Plan: add the test to lru_cache_test
      
      Reviewed By: pdillinger
      
      Differential Revision: D29006215
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 6cff686b38d83904667a2bd39923cd030df16814
      f44e69c6
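A minimal sketch of composing such a prefix (the concatenation layout and function name here are illustrative; the real encoding differs):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// A 20-byte session id plus up to 10 decimal digits of file number stays
// unique even if cache contents migrate between hosts, unlike a runtime
// ID that restarts from zero on each process.
std::string MakeCacheKeyPrefix(const std::string& db_session_id,
                               uint64_t file_number) {
  return db_session_id + "-" + std::to_string(file_number);
}
```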
  33. 08 Jun, 2021 1 commit
    • P
      Fix a major performance bug in 6.21 for cache entry stats (#8369) · 2f93a3b8
      Authored by Peter Dillinger
      Summary:
      In final polishing of https://github.com/facebook/rocksdb/issues/8297 (after most manual testing), I
      broke my own caching layer by sanitizing an input parameter with
      std::min(0, x) instead of std::max(0, x). I resisted unit testing the
      timing part of the result caching because historically, these tests
      are either flaky or difficult to write, and this was not a correctness
      issue. This bug is essentially unnoticeable with a small number
      of column families but can explode background work with a
      large number of column families.
      
      This change fixes the logical error, removes some unnecessary related
      optimization, and adds mock time/sleeps to the unit test to ensure we
      can cache hit within the age limit.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8369
      
      Test Plan: added time testing logic to existing unit test
      
      Reviewed By: ajkr
      
      Differential Revision: D28950892
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e79cd4ff3eec68fd0119d994f1ed468c38026c3b
      2f93a3b8
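The bug fits in one line: clamping a lower bound with `std::min` instead of `std::max`. A self-contained illustration (function names invented):

```cpp
#include <algorithm>
#include <cassert>

// std::min(0, x) is always <= 0, so the sanitized age limit collapses to
// zero and cached results are never considered fresh.
int SanitizeAgeLimitBuggy(int x) { return std::min(0, x); }

// std::max(0, x) clamps from below at 0, which is the intended behavior.
int SanitizeAgeLimitFixed(int x) { return std::max(0, x); }
```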
  34. 24 May, 2021 1 commit
  35. 22 May, 2021 1 commit
    • Z
      Use new Insert and Lookup APIs in table reader to support secondary cache (#8315) · 7303d02b
      Authored by Zhichao Cao
      Summary:
      The secondary cache is implemented to provide a second cache tier below the block cache. New Insert and Lookup APIs were introduced in https://github.com/facebook/rocksdb/issues/8271. To support and use the secondary cache in the block-based table reader, this PR introduces the corresponding callback functions that will be used by the secondary cache, and updates the Insert and Lookup APIs accordingly.
      
      benchmarking:
      ./db_bench --benchmarks="fillrandom" -num=1000000 -key_size=32 -value_size=256 -use_direct_io_for_flush_and_compaction=true -db=/tmp/rocks_t/db -partition_index_and_filters=true
      
      ./db_bench -db=/tmp/rocks_t/db -use_existing_db=true -benchmarks=readrandom -num=1000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=1073741824 -cache_numshardbits=5 -cache_index_and_filter_blocks=true -read_random_exp_range=17 -statistics -partition_index_and_filters=true -stats_dump_period_sec=30 -reads=50000000
      
      master benchmarking results:
      readrandom   :       3.923 micros/op 254881 ops/sec;   33.4 MB/s (23849796 of 50000000 found)
      rocksdb.db.get.micros P50 : 2.820992 P95 : 5.636716 P99 : 16.450553 P100 : 8396.000000 COUNT : 50000000 SUM : 179947064
      
      Current PR benchmarking results
      readrandom   :       4.083 micros/op 244925 ops/sec;   32.1 MB/s (23849796 of 50000000 found)
      rocksdb.db.get.micros P50 : 2.967687 P95 : 5.754916 P99 : 15.665912 P100 : 8213.000000 COUNT : 50000000 SUM : 187250053
      
      About a 3.8% throughput reduction.
      P50: 5.2% increase; P95: 2.09% increase; P99: 4.77% improvement.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8315
      
      Test Plan: added the testing case
      
      Reviewed By: anand1976
      
      Differential Revision: D28599774
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 098c4df0d7327d3a546df7604b2f1602f13044ed
      7303d02b
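The callback shapes involved can be sketched as follows (signatures are invented for illustration; see the PR for the real interfaces): the cache needs to know a value's size, how to serialize it for the secondary tier, and how to reconstruct it from serialized bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

struct Block {
  std::string data;  // stand-in for an uncompressed block
};

// How many bytes the serialized form of this object needs.
size_t SizeCallback(void* obj) {
  return static_cast<Block*>(obj)->data.size();
}

// Serialize the object into a caller-provided buffer for the secondary tier.
bool SaveToCallback(void* obj, char* out, size_t length) {
  Block* b = static_cast<Block*>(obj);
  if (length < b->data.size()) return false;
  std::memcpy(out, b->data.data(), b->data.size());
  return true;
}

// Reconstruct the object from serialized bytes on a secondary-cache hit.
Block CreateCallback(const char* buf, size_t size) {
  return Block{std::string(buf, size)};
}
```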
  36. 20 May, 2021 1 commit
    • P
      Use deleters to label cache entries and collect stats (#8297) · 311a544c
      Authored by Peter Dillinger
      Summary:
      This change gathers and publishes statistics about the
      kinds of items in block cache. This is especially important for
      profiling relative usage of cache by index vs. filter vs. data blocks.
      It works by iterating over the cache during periodic stats dump
      (InternalStats, stats_dump_period_sec) or on demand when
      DB::Get(Map)Property(kBlockCacheEntryStats), except that for
      efficiency and sharing among column families, saved data from
      the last scan is used when the data is not considered too old.
      
      The new information can be seen in info LOG, for example:
      
          Block cache LRUCache@0x7fca62229330 capacity: 95.37 MB collections: 8 last_copies: 0 last_secs: 0.00178 secs_since: 0
          Block cache entry stats(count,size,portion): DataBlock(7092,28.24 MB,29.6136%) FilterBlock(215,867.90 KB,0.888728%) FilterMetaBlock(2,5.31 KB,0.00544%) IndexBlock(217,180.11 KB,0.184432%) WriteBuffer(1,256.00 KB,0.262144%) Misc(1,0.00 KB,0%)
      
      And also through DB::GetProperty and GetMapProperty (here using
      ldb just for demonstration):
      
          $ ./ldb --db=/dev/shm/dbbench/ get_property rocksdb.block-cache-entry-stats
          rocksdb.block-cache-entry-stats.bytes.data-block: 0
          rocksdb.block-cache-entry-stats.bytes.deprecated-filter-block: 0
          rocksdb.block-cache-entry-stats.bytes.filter-block: 0
          rocksdb.block-cache-entry-stats.bytes.filter-meta-block: 0
          rocksdb.block-cache-entry-stats.bytes.index-block: 178992
          rocksdb.block-cache-entry-stats.bytes.misc: 0
          rocksdb.block-cache-entry-stats.bytes.other-block: 0
          rocksdb.block-cache-entry-stats.bytes.write-buffer: 0
          rocksdb.block-cache-entry-stats.capacity: 8388608
          rocksdb.block-cache-entry-stats.count.data-block: 0
          rocksdb.block-cache-entry-stats.count.deprecated-filter-block: 0
          rocksdb.block-cache-entry-stats.count.filter-block: 0
          rocksdb.block-cache-entry-stats.count.filter-meta-block: 0
          rocksdb.block-cache-entry-stats.count.index-block: 215
          rocksdb.block-cache-entry-stats.count.misc: 1
          rocksdb.block-cache-entry-stats.count.other-block: 0
          rocksdb.block-cache-entry-stats.count.write-buffer: 0
          rocksdb.block-cache-entry-stats.id: LRUCache@0x7f3636661290
          rocksdb.block-cache-entry-stats.percent.data-block: 0.000000
          rocksdb.block-cache-entry-stats.percent.deprecated-filter-block: 0.000000
          rocksdb.block-cache-entry-stats.percent.filter-block: 0.000000
          rocksdb.block-cache-entry-stats.percent.filter-meta-block: 0.000000
          rocksdb.block-cache-entry-stats.percent.index-block: 2.133751
          rocksdb.block-cache-entry-stats.percent.misc: 0.000000
          rocksdb.block-cache-entry-stats.percent.other-block: 0.000000
          rocksdb.block-cache-entry-stats.percent.write-buffer: 0.000000
          rocksdb.block-cache-entry-stats.secs_for_last_collection: 0.000052
          rocksdb.block-cache-entry-stats.secs_since_last_collection: 0
      
      Solution detail - We need some way to flag what kind of blocks each
      entry belongs to, preferably without changing the Cache API.
      One of the complications is that Cache is a general interface that could
      have other users that don't adhere to whichever convention we decide
      on for keys and values. Or we would pay for an extra field in the Handle
      that would only be used for this purpose.
      
      This change uses a back-door approach, the deleter, to indicate the
      "role" of a Cache entry (in addition to the value type, implicitly).
      This has the added benefit of ensuring proper code origin whenever we
      recognize a particular role for a cache entry; if the entry came from
      some other part of the code, it will use an unrecognized deleter, which
      we simply attribute to the "Misc" role.
      
      An internal API makes for simple instantiation and automatic
      registration of Cache deleters for a given value type and "role".
      
      Another internal API, CacheEntryStatsCollector, solves the problem of
      caching the results of a scan and sharing them, to ensure scans are
      neither excessive nor redundant so as not to harm Cache performance.
      
      Because code is added to BlocklikeTraits, it is pulled out of
      block_based_table_reader.cc into its own file.
      
      This is a reformulation of https://github.com/facebook/rocksdb/issues/8276, without the type checking option
      (could still be added), and with actual stat gathering.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8297
      
      Test Plan: manual testing with db_bench, and a couple of basic unit tests
      
      Reviewed By: ltamasi
      
      Differential Revision: D28488721
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 472f524a9691b5afb107934be2d41d84f2b129fb
      311a544c
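The deleter-as-role trick reduces to mapping function pointers back to labels, sketched here with invented names: each value type/role pair registers a distinct deleter, and the stats scan classifies entries by comparing deleter pointers, defaulting to "Misc" for deleters it does not recognize.

```cpp
#include <cassert>
#include <string>

// The deleter signature doubles as the role label carrier: no extra
// field in the cache handle is needed.
using Deleter = void (*)(const std::string& key, void* value);

void DeleteDataBlock(const std::string&, void*) {}
void DeleteFilterBlock(const std::string&, void*) {}

// Classify an entry by its deleter; unknown deleters fall into "Misc".
std::string RoleOf(Deleter d) {
  if (d == &DeleteDataBlock) return "DataBlock";
  if (d == &DeleteFilterBlock) return "FilterBlock";
  return "Misc";
}
```

The default case is what makes the scheme safe for foreign cache users: an entry inserted by unrelated code simply gets attributed to "Misc" rather than being misclassified.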