1. 10 Feb, 2023 1 commit
    • P
      Put Cache and CacheWrapper in new public header (#11192) · 3cacd4b4
      Peter Dillinger committed
      Summary:
      The definition of the Cache class should not be needed by the vast majority of RocksDB users, so I think it is just distracting to include it in cache.h, which is primarily needed for configuring and creating caches. This change moves the class to a new header advanced_cache.h. It is just cut-and-paste except for modifying the class API comment.
      
      In general, operations on shared_ptr<Cache> should continue to work when only a forward declaration of Cache is available, as long as all the Cache instances provided are already shared_ptr. See https://stackoverflow.com/a/17650101/454544
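
As a minimal sketch of why a forward declaration is enough, the following uses a hypothetical `DemoCache` class (a stand-in, not the real `rocksdb::Cache`) and hypothetical helpers `MakeOptions` and `NewDemoCache`. Code that only stores or copies the `shared_ptr` compiles against the incomplete type; only the code that creates the instance needs the full definition.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Forward declaration only: DemoCache is an incomplete type at this point.
// (DemoCache is a hypothetical stand-in for rocksdb::Cache.)
class DemoCache;

// Storing, copying, and destroying the shared_ptr works with the incomplete
// type, because the deleter was captured when the pointer was first created.
struct DemoOptions {
  std::shared_ptr<DemoCache> block_cache;
};

inline DemoOptions MakeOptions(std::shared_ptr<DemoCache> cache) {
  DemoOptions opts;
  opts.block_cache = std::move(cache);
  return opts;
}

inline long UseCountAfterCopy(const DemoOptions& opts) {
  std::shared_ptr<DemoCache> copy = opts.block_cache;  // copy is fine here
  return copy.use_count();
}

// The full definition is needed only where instances are created or used.
class DemoCache {
 public:
  explicit DemoCache(int capacity) : capacity_(capacity) {}
  int capacity() const { return capacity_; }

 private:
  int capacity_;
};

inline std::shared_ptr<DemoCache> NewDemoCache(int capacity) {
  return std::make_shared<DemoCache>(capacity);
}
```

In a real project the first half would live in a header that only forward-declares the class, and the second half in the header that advanced users include.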
      
      Also, the most common way to customize a Cache is by wrapping an existing implementation, so it makes sense to provide CacheWrapper in the public API. This was a cut-and-paste job except removing the implementation of Name() so that derived classes must provide it.
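
The wrapping pattern described above can be sketched roughly as follows. `MiniCache`, `FixedCache`, `MiniCacheWrapper`, and `CountingCache` are hypothetical names with a deliberately tiny interface; the real `Cache` and `CacheWrapper` have many more functions. The point illustrated is that the wrapper forwards calls to a target but leaves `Name()` pure virtual, so every derived class has to supply its own name.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature cache interface (not the real rocksdb::Cache API).
class MiniCache {
 public:
  virtual ~MiniCache() = default;
  virtual const char* Name() const = 0;
  virtual void SetCapacity(size_t capacity) = 0;
  virtual size_t GetCapacity() const = 0;
};

// A trivial concrete implementation to wrap.
class FixedCache : public MiniCache {
 public:
  explicit FixedCache(size_t capacity) : capacity_(capacity) {}
  const char* Name() const override { return "FixedCache"; }
  void SetCapacity(size_t capacity) override { capacity_ = capacity; }
  size_t GetCapacity() const override { return capacity_; }

 private:
  size_t capacity_;
};

// Forwards everything to the target, but keeps Name() pure virtual so that
// derived classes must provide it.
class MiniCacheWrapper : public MiniCache {
 public:
  explicit MiniCacheWrapper(std::shared_ptr<MiniCache> target)
      : target_(std::move(target)) {}
  const char* Name() const override = 0;
  void SetCapacity(size_t capacity) override { target_->SetCapacity(capacity); }
  size_t GetCapacity() const override { return target_->GetCapacity(); }

 protected:
  std::shared_ptr<MiniCache> target_;
};

// A customization: counts SetCapacity calls, then delegates.
class CountingCache : public MiniCacheWrapper {
 public:
  using MiniCacheWrapper::MiniCacheWrapper;
  const char* Name() const override { return "CountingCache"; }
  void SetCapacity(size_t capacity) override {
    ++set_capacity_calls_;
    MiniCacheWrapper::SetCapacity(capacity);
  }
  int set_capacity_calls() const { return set_capacity_calls_; }

 private:
  int set_capacity_calls_ = 0;
};
```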
      
      Intended follow-up: consolidate Release() into one function to reduce customization bugs / confusion
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11192
      
      Test Plan: `make check`
      
      Reviewed By: anand1976
      
      Differential Revision: D43055487
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 7b05492df35e0f30b581b4c24c579bc275b6d110
  2. 08 Feb, 2023 1 commit
  3. 28 Jan, 2023 1 commit
    • S
      Remove RocksDB LITE (#11147) · 4720ba43
      sdong committed
      Summary:
      We haven't been actively maintaining RocksDB LITE recently, and its size must have gone up significantly. We are removing support for it.
      
      Most of the changes were made with the following command:
      
      unifdef -m -UROCKSDB_LITE `git grep -l ROCKSDB_LITE | egrep '[.](cc|h)'`
      
      by Peter Dillinger. Other changes were applied manually to build scripts, CircleCI manifests, places where ROCKSDB_LITE is used in an expression, and the file db_stress_test_base.cc.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/11147
      
      Test Plan: See CI
      
      Reviewed By: pdillinger
      
      Differential Revision: D42796341
      
      fbshipit-source-id: 4920e15fc2060c2cd2221330a6d0e5e65d4b7fe2
  4. 26 Jan, 2023 1 commit
  5. 25 Jan, 2023 1 commit
  6. 12 Jan, 2023 1 commit
    • P
      Major Cache refactoring, CPU efficiency improvement (#10975) · 9f7801c5
      Peter Dillinger committed
      Summary:
      This is several refactorings bundled into one to avoid having to incrementally re-modify uses of Cache several times. Overall, there are breaking changes to the Cache class, and it becomes more of a low-level interface for implementing caches, especially block cache. New internal APIs make using Cache cleaner than before, and more insulated from block cache evolution. Hopefully, this is the last really big block cache refactoring, because it rather effectively decouples the implementations from the uses. This change also removes the EXPERIMENTAL designation on the SecondaryCache support in Cache. It seems reasonably mature at this point but is still subject to change/evolution (as I warn in the API docs for Cache).
      
      The high-level motivation for this refactoring is to minimize code duplication / compounding complexity in adding SecondaryCache support to HyperClockCache (in a later PR). Other benefits listed below.
      
      * static_cast lines of code +29 -35 (net removed 6)
      * reinterpret_cast lines of code +6 -32 (net removed 26)
      
      ## cache.h and secondary_cache.h
      * Always use CacheItemHelper with entries instead of just a Deleter. There are several motivations / justifications:
        * Simpler for implementations to deal with just one Insert and one Lookup.
        * Simpler and more efficient implementation because we don't have to track which entries are using helpers and which are using deleters
        * Gets rid of hack to classify cache entries by their deleter. Instead, the CacheItemHelper includes a CacheEntryRole. This simplifies a lot of code (cache_entry_roles.h almost eliminated). Fixes https://github.com/facebook/rocksdb/issues/9428.
        * Makes it trivial to adjust SecondaryCache behavior based on kind of block (e.g. don't re-compress filter blocks).
        * It is arguably less convenient for many direct users of Cache, but direct users of Cache are now rare with introduction of typed_cache.h (below).
        * I considered and rejected an alternative approach in which we reduce customizability by assuming each secondary cache compatible value starts with a Slice referencing the uncompressed block contents (already true or mostly true), but we apparently intend to stack secondary caches. Saving an entry from a compressed secondary to a lower tier requires custom handling offered by SaveToCallback, etc.
      * Make CreateCallback part of the helper and introduce CreateContext to work with it (alternative to https://github.com/facebook/rocksdb/issues/10562). This cleans up the interface while still allowing context to be provided for loading/parsing values into primary cache. This model works for async lookup in BlockBasedTable reader (reader owns a CreateContext) under the assumption that it always waits on secondary cache operations to finish. (Otherwise, the CreateContext could be destroyed while async operation depending on it continues.) This likely contributes most to the observed performance improvement because it saves an std::function backed by a heap allocation.
      * Use char* for serialized data, e.g. in SaveToCallback, where void* was confusingly used. (We use `char*` for serialized byte data all over RocksDB, with many advantages over `void*`. `memcpy` etc. are legacy APIs that should not be mimicked.)
      * Add a type alias Cache::ObjectPtr = void*, so that we can better indicate the intent of the void* when it is to be the object associated with a Cache entry. Related: started (but did not complete) a refactoring to move away from "value" of a cache entry toward "object" or "obj". (It is confusing to call Cache a key-value store (like DB) when it is really storing arbitrary in-memory objects, not byte strings.)
      * Remove unnecessary key param from DeleterFn. This is good for efficiency in HyperClockCache, which does not directly store the cache key in memory. (Alternative to https://github.com/facebook/rocksdb/issues/10774)
      * Add allocator to Cache DeleterFn. This is a kind of future-proofing change in case we get more serious about using the Cache allocator for memory tracked by the Cache. Right now, only the uncompressed block contents are allocated using the allocator, and a pointer to that allocator is saved as part of the cached object so that the deleter can use it. (See CacheAllocationPtr.) If in the future we are able to "flatten out" our Cache objects some more, it would be good not to have to track the allocator as part of each object.
      * Removes legacy `ApplyToAllCacheEntries` and changes `ApplyToAllEntries` signature for Deleter->CacheItemHelper change.
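
To make the helper idea in the list above more concrete, here is a rough sketch under simplifying assumptions. All names (`DemoItemHelper`, `DemoRole`, `DemoAllocator`, `GetDemoHelper`, `ReleaseEntry`) are hypothetical and much smaller than the real `CacheItemHelper`. It shows three of the points: the deleter takes no key, it receives an allocator, and the entry's role travels with a statically allocated helper instead of being inferred from the deleter.

```cpp
#include <cassert>
#include <string>

// Hypothetical role tag carried by the helper.
enum class DemoRole { kDataBlock, kFilterBlock, kMisc };

// Hypothetical allocator stand-in; it only counts frees for illustration.
struct DemoAllocator {
  int frees = 0;
};

using DemoObjectPtr = void*;  // mirrors the idea of Cache::ObjectPtr
// No key parameter; the allocator is passed in instead.
using DemoDeleterFn = void (*)(DemoObjectPtr obj, DemoAllocator* allocator);

struct DemoItemHelper {
  DemoDeleterFn del_cb;
  DemoRole role;
};

// One static helper per (type, role); entries store only a pointer to it,
// so no per-entry std::function or heap allocation is involved.
template <typename T, DemoRole kRole>
const DemoItemHelper* GetDemoHelper() {
  static const DemoItemHelper helper{
      [](DemoObjectPtr obj, DemoAllocator* allocator) {
        delete static_cast<T*>(obj);
        if (allocator != nullptr) {
          ++allocator->frees;
        }
      },
      kRole};
  return &helper;
}

struct DemoEntry {
  DemoObjectPtr obj;
  const DemoItemHelper* helper;
};

inline void ReleaseEntry(DemoEntry& entry, DemoAllocator* allocator) {
  entry.helper->del_cb(entry.obj, allocator);
  entry.obj = nullptr;
}
```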
      
      ## typed_cache.h
      Adds various "typed" interfaces to the Cache as internal APIs, so that most uses of Cache can use simple type safe code without casting and without explicit deleters, etc. Almost all of the non-test, non-glue code uses of Cache have been migrated. (Follow-up work: CompressedSecondaryCache deserves deeper attention to migrate.) This change expands RocksDB's internal usage of metaprogramming and SFINAE (https://en.cppreference.com/w/cpp/language/sfinae).
      
      The existing usages of Cache are divided up at a high level into these new interfaces. See updated existing uses of Cache for examples of how these are used.
      * PlaceholderCacheInterface - Used for making cache reservations, with entries that have a charge but no value.
      * BasicTypedCacheInterface<TValue> - Used for primary cache storage of objects of type TValue, which can be cleaned up with std::default_delete<TValue>. The role is provided by TValue::kCacheEntryRole or given in an optional template parameter.
      * FullTypedCacheInterface<TValue, TCreateContext> - Used for secondary cache compatible storage of objects of type TValue. In addition to BasicTypedCacheInterface constraints, we require TValue::ContentSlice() to return persistable data. This simplifies usage for the normal case of simple secondary cache compatibility (can give you a Slice to the data already in memory). In addition to TCreateContext performing the role of Cache::CreateContext, it is also expected to provide a factory function for creating TValue.
      * For each of these, there's a "Shared" version (e.g. FullTypedSharedCacheInterface) that holds a shared_ptr to the Cache, rather than assuming external ownership by holding only a raw `Cache*`.
      
      These interfaces introduce specific handle types for each interface instantiation, so that it's easy to see what kind of object is controlled by a handle. (Ultimately, this might not be worth the extra complexity, but it seems OK so far.)
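
The following is a minimal sketch of what a "typed" layer over an untyped cache can look like. `UntypedCache` and `BasicTypedInterface` are hypothetical and ignore charges, handles, roles, and eviction; the only thing illustrated is that casts and deleters are written once in the typed layer, so callers never touch `void*`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical untyped cache: stores void* plus a deleter per entry.
class UntypedCache {
 public:
  using ObjectPtr = void*;
  using DeleterFn = void (*)(ObjectPtr);

  UntypedCache() = default;
  UntypedCache(const UntypedCache&) = delete;
  UntypedCache& operator=(const UntypedCache&) = delete;
  ~UntypedCache() {
    for (auto& kv : map_) {
      kv.second.deleter(kv.second.obj);
    }
  }

  void Insert(const std::string& key, ObjectPtr obj, DeleterFn deleter) {
    auto it = map_.find(key);
    if (it != map_.end()) {
      it->second.deleter(it->second.obj);  // free the entry being replaced
      map_.erase(it);
    }
    map_[key] = Entry{obj, deleter};
  }

  ObjectPtr Lookup(const std::string& key) const {
    auto it = map_.find(key);
    return it == map_.end() ? nullptr : it->second.obj;
  }

 private:
  struct Entry {
    ObjectPtr obj;
    DeleterFn deleter;
  };
  std::unordered_map<std::string, Entry> map_;
};

// Typed view: the cast and the std::default_delete-based deleter live here.
template <typename TValue>
class BasicTypedInterface {
 public:
  explicit BasicTypedInterface(UntypedCache* cache) : cache_(cache) {}

  void Insert(const std::string& key, std::unique_ptr<TValue> value) {
    cache_->Insert(key, value.release(), [](UntypedCache::ObjectPtr p) {
      std::default_delete<TValue>()(static_cast<TValue*>(p));
    });
  }

  TValue* Lookup(const std::string& key) const {
    return static_cast<TValue*>(cache_->Lookup(key));
  }

 private:
  UntypedCache* cache_;  // not owned, like the non-"Shared" variants
};
```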
      
      Note: I attempted to make the cache 'charge' automatically inferred from the cache object type, such as by expecting an ApproximateMemoryUsage() function, but this is not so clean because there are cases where we need to compute the charge ahead of time and don't want to re-compute it.
      
      ## block_cache.h
      This header is essentially the replacement for the old block_like_traits.h. It includes various things to support block cache access with typed_cache.h for block-based table.
      
      ## block_based_table_reader.cc
      Before this change, accessing the block cache here was an awkward mix of static polymorphism (template TBlocklike) and switch-case on a dynamic BlockType value. This change mostly unifies on static polymorphism, relying on minor hacks in block_cache.h to distinguish variants of Block. We still check BlockType in some places (especially for stats, which could be improved in follow-up work) but at least the BlockType is a static constant from the template parameter. (No more awkward partial redundancy between static and dynamic info.) This likely contributes to the overall performance improvement, but hasn't been tested in isolation.
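
A small sketch of the "BlockType as a static constant from the template parameter" idea, with hypothetical types (`DemoDataBlock`, `DemoFilterBlock`, `DemoStats`, `RecordHit`). Because the block type is a compile-time constant of the template argument, the `switch` below can be resolved by the compiler per instantiation rather than at run time.

```cpp
#include <cassert>
#include <string>

enum class DemoBlockType { kData, kIndex, kFilter };

// Each "blocklike" type exposes its block type as a static constant.
struct DemoDataBlock {
  static constexpr DemoBlockType kBlockType = DemoBlockType::kData;
  std::string contents;
};

struct DemoFilterBlock {
  static constexpr DemoBlockType kBlockType = DemoBlockType::kFilter;
  std::string contents;
};

struct DemoStats {
  int data_hits = 0;
  int filter_hits = 0;
  int other_hits = 0;
};

// The switch operand is a compile-time constant for each instantiation,
// so there is no separate dynamic BlockType argument to keep in sync.
template <typename TBlocklike>
void RecordHit(DemoStats& stats) {
  switch (TBlocklike::kBlockType) {
    case DemoBlockType::kData:
      ++stats.data_hits;
      break;
    case DemoBlockType::kFilter:
      ++stats.filter_hits;
      break;
    default:
      ++stats.other_hits;
      break;
  }
}
```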
      
      The other key source of simplification here is a more unified system of creating block cache objects: for directly populating from primary cache and for promotion from secondary cache. Both use BlockCreateContext, for context and for factory functions.
      
      ## block_based_table_builder.cc, cache_dump_load_impl.cc
      Before this change, warming caches was super ugly code. Both of these source files had switch statements to basically transition from the dynamic BlockType world to the static TBlocklike world. None of that mess is needed anymore as there's a new, untyped WarmInCache function that handles all the details just as promotion from SecondaryCache would. (Fixes `TODO akanksha: Dedup below code` in block_based_table_builder.cc.)
      
      ## Everything else
      Mostly just updating Cache users to use new typed APIs when reasonably possible, or changed Cache APIs when not.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10975
      
      Test Plan:
      tests updated
      
      Performance test setup similar to https://github.com/facebook/rocksdb/issues/10626 (by cache size, LRUCache when not "hyper" for HyperClockCache):
      
      34MB 1thread base.hyper -> kops/s: 0.745 io_bytes/op: 2.52504e+06 miss_ratio: 0.140906 max_rss_mb: 76.4844
      34MB 1thread new.hyper -> kops/s: 0.751 io_bytes/op: 2.5123e+06 miss_ratio: 0.140161 max_rss_mb: 79.3594
      34MB 1thread base -> kops/s: 0.254 io_bytes/op: 1.36073e+07 miss_ratio: 0.918818 max_rss_mb: 45.9297
      34MB 1thread new -> kops/s: 0.252 io_bytes/op: 1.36157e+07 miss_ratio: 0.918999 max_rss_mb: 44.1523
      34MB 32thread base.hyper -> kops/s: 7.272 io_bytes/op: 2.88323e+06 miss_ratio: 0.162532 max_rss_mb: 516.602
      34MB 32thread new.hyper -> kops/s: 7.214 io_bytes/op: 2.99046e+06 miss_ratio: 0.168818 max_rss_mb: 518.293
      34MB 32thread base -> kops/s: 3.528 io_bytes/op: 1.35722e+07 miss_ratio: 0.914691 max_rss_mb: 264.926
      34MB 32thread new -> kops/s: 3.604 io_bytes/op: 1.35744e+07 miss_ratio: 0.915054 max_rss_mb: 264.488
      233MB 1thread base.hyper -> kops/s: 53.909 io_bytes/op: 2552.35 miss_ratio: 0.0440566 max_rss_mb: 241.984
      233MB 1thread new.hyper -> kops/s: 62.792 io_bytes/op: 2549.79 miss_ratio: 0.044043 max_rss_mb: 241.922
      233MB 1thread base -> kops/s: 1.197 io_bytes/op: 2.75173e+06 miss_ratio: 0.103093 max_rss_mb: 241.559
      233MB 1thread new -> kops/s: 1.199 io_bytes/op: 2.73723e+06 miss_ratio: 0.10305 max_rss_mb: 240.93
      233MB 32thread base.hyper -> kops/s: 1298.69 io_bytes/op: 2539.12 miss_ratio: 0.0440307 max_rss_mb: 371.418
      233MB 32thread new.hyper -> kops/s: 1421.35 io_bytes/op: 2538.75 miss_ratio: 0.0440307 max_rss_mb: 347.273
      233MB 32thread base -> kops/s: 9.693 io_bytes/op: 2.77304e+06 miss_ratio: 0.103745 max_rss_mb: 569.691
      233MB 32thread new -> kops/s: 9.75 io_bytes/op: 2.77559e+06 miss_ratio: 0.103798 max_rss_mb: 552.82
      1597MB 1thread base.hyper -> kops/s: 58.607 io_bytes/op: 1449.14 miss_ratio: 0.0249324 max_rss_mb: 1583.55
      1597MB 1thread new.hyper -> kops/s: 69.6 io_bytes/op: 1434.89 miss_ratio: 0.0247167 max_rss_mb: 1584.02
      1597MB 1thread base -> kops/s: 60.478 io_bytes/op: 1421.28 miss_ratio: 0.024452 max_rss_mb: 1589.45
      1597MB 1thread new -> kops/s: 63.973 io_bytes/op: 1416.07 miss_ratio: 0.0243766 max_rss_mb: 1589.24
      1597MB 32thread base.hyper -> kops/s: 1436.2 io_bytes/op: 1357.93 miss_ratio: 0.0235353 max_rss_mb: 1692.92
      1597MB 32thread new.hyper -> kops/s: 1605.03 io_bytes/op: 1358.04 miss_ratio: 0.023538 max_rss_mb: 1702.78
      1597MB 32thread base -> kops/s: 280.059 io_bytes/op: 1350.34 miss_ratio: 0.023289 max_rss_mb: 1675.36
      1597MB 32thread new -> kops/s: 283.125 io_bytes/op: 1351.05 miss_ratio: 0.0232797 max_rss_mb: 1703.83
      
      Almost uniformly improving over base revision, especially for hot paths with HyperClockCache, up to 12% higher throughput seen (1597MB, 32thread, hyper). The improvement for that is likely coming from much simplified code for providing context for secondary cache promotion (CreateCallback/CreateContext), and possibly from less branching in block_based_table_reader. And likely a small improvement from not reconstituting key for DeleterFn.
      
      Reviewed By: anand1976
      
      Differential Revision: D42417818
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f86bfdd584dce27c028b151ba56818ad14f7a432
  7. 22 Oct, 2022 1 commit
    • C
      Ignore max_compaction_bytes for compaction input that are within output key-range (#10835) · 333abe9c
      Changyu Bi committed
      Summary:
      When picking compaction input files, we sometimes stop picking a file that is fully included in the output key-range due to hitting max_compaction_bytes. Including these input files can potentially reduce WA at the expense of larger compactions. Larger compaction should be fine as files from input level are usually 10X smaller than files from output level. This PR adds a mutable CF option `ignore_max_compaction_bytes_for_input` that is enabled by default. We can remove this option once we are sure it is safe.
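
One simplified way to picture the behavior described above is sketched below. This is not the RocksDB picker: `DemoFile`, `PickInputFiles`, and the integer keys are hypothetical, and the real logic handles many more conditions. The sketch only shows the rule that an input file fully inside the output key range may be picked even when the size limit has been reached.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical input-level file with an integer key range and a size.
struct DemoFile {
  int smallest;
  int largest;
  uint64_t size;
};

// Returns how many leading files from `candidates` would be picked.
// `output_bytes` is the size already committed from the output level.
inline size_t PickInputFiles(const std::vector<DemoFile>& candidates,
                             int out_smallest, int out_largest,
                             uint64_t output_bytes,
                             uint64_t max_compaction_bytes,
                             bool ignore_limit_for_covered_inputs) {
  uint64_t total = output_bytes;
  size_t picked = 0;
  for (const DemoFile& f : candidates) {
    const bool covered =
        f.smallest >= out_smallest && f.largest <= out_largest;
    const bool fits = total + f.size <= max_compaction_bytes;
    // Stop unless the file fits, or it is covered and the limit is ignored.
    if (!fits && !(ignore_limit_for_covered_inputs && covered)) {
      break;
    }
    total += f.size;
    ++picked;
  }
  return picked;
}
```

With the flag off, picking stops as soon as the limit would be exceeded; with it on, covered files keep being added, which is the source of the potentially larger compactions mentioned above.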
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10835
      
      Test Plan:
      - CI, a unit test on max_compaction_bytes fails before turning this flag off.
      - Benchmark does not show much difference in WA: `./db_bench --benchmarks=fillrandom,waitforcompaction,stats,levelstats -max_background_jobs=12 -num=2000000000 -target_file_size_base=33554432 --write_buffer_size=33554432`
      ```
      main:
      ** Compaction Stats [default] **
      Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0      3/0   91.59 MB   0.8     70.9     0.0     70.9     200.8    129.9       0.0   1.5     25.2     71.2   2886.55           2463.45      9725    0.297   1093M   254K       0.0       0.0
        L1      9/0   248.03 MB   1.0    392.0   129.8    262.2     391.7    129.5       0.0   3.0     69.0     68.9   5821.71           5536.90       804    7.241   6029M  5814K       0.0       0.0
        L2     87/0    2.50 GB   1.0    537.0   128.5    408.5     533.8    125.2       0.7   4.2     69.5     69.1   7912.24           7323.70      4417    1.791   8299M    36M       0.0       0.0
        L3    836/0   24.99 GB   1.0    616.9   118.3    498.7     594.5     95.8       5.2   5.0     66.9     64.5   9442.38           8490.28      4204    2.246   9749M   306M       0.0       0.0
        L4   2355/0   62.95 GB   0.3     67.3    37.1     30.2      54.2     24.0      38.9   1.5     72.2     58.2    954.37            821.18       917    1.041   1076M   173M       0.0       0.0
       Sum   3290/0   90.77 GB   0.0   1684.2   413.7   1270.5    1775.0    504.5      44.9  13.7     63.8     67.3  27017.25          24635.52     20067    1.346     26G   522M       0.0       0.0
      
      Cumulative compaction: 1774.96 GB write, 154.29 MB/s write, 1684.19 GB read, 146.40 MB/s read, 27017.3 seconds
      
      This PR:
      ** Compaction Stats [default] **
      Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0      3/0   45.71 MB   0.8     72.9     0.0     72.9     202.8    129.9       0.0   1.6     25.4     70.7   2938.16           2510.36      9741    0.302   1124M   265K       0.0       0.0
        L1      8/0   234.54 MB   0.9    384.5   129.8    254.7     384.2    129.6       0.0   3.0     69.0     68.9   5708.08           5424.43       791    7.216   5913M  5753K       0.0       0.0
        L2     84/0    2.47 GB   1.0    543.1   128.6    414.5     539.9    125.4       0.7   4.2     69.6     69.2   7989.31           7403.13      4418    1.808   8393M    36M       0.0       0.0
        L3    839/0   24.96 GB   1.0    615.6   118.4    497.2     593.2     96.0       5.1   5.0     66.6     64.1   9471.23           8489.31      4193    2.259   9726M   306M       0.0       0.0
        L4   2360/0   63.04 GB   0.3     67.6    37.3     30.3      54.4     24.1      38.9   1.5     71.5     57.6    967.30            827.99       907    1.066   1080M   173M       0.0       0.0
       Sum   3294/0   90.75 GB   0.0   1683.8   414.2   1269.6    1774.5    504.9      44.8  13.7     63.7     67.1  27074.08          24655.22     20050    1.350     26G   522M       0.0       0.0
      
      Cumulative compaction: 1774.52 GB write, 157.09 MB/s write, 1683.77 GB read, 149.06 MB/s read, 27074.1 seconds
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D40518319
      
      Pulled By: cbi42
      
      fbshipit-source-id: f4ea614bc0ebefe007ffaf05bb9aec9a8ca25b60
  8. 19 Oct, 2022 1 commit
    • P
      Refactor ShardedCache for more sharing, static polymorphism (#10801) · 7555243b
      Peter Dillinger committed
      Summary:
      The motivations for this change include
      * Free up space in ClockHandle so that we can add data for secondary cache handling while still keeping within single cache line (64 byte) size.
        * This change frees up space by eliminating the need for the `hash` field by making the fixed-size key itself a hash, using a 128-bit bijective (lossless) hash.
      * Generally more customizability of ShardedCache (such as hashing) without worrying about virtual call overheads
        * ShardedCache now uses static polymorphism (template) instead of dynamic polymorphism (virtual overrides) for the CacheShard. No obvious performance benefit is seen from the change (as mostly expected; most calls to virtual functions in CacheShard could already be optimized to static calls), but offers more flexibility without incurring the runtime cost of adhering to a common interface (without type parameters or static callbacks).
        * You'll also notice less `reinterpret_cast`ing and other boilerplate in the Cache implementations, as this can go in ShardedCache.
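
As a rough illustration of static polymorphism for shards, the sketch below parameterizes a sharded container on its shard type. `DemoShardedCache` and `DemoMapShard` are hypothetical and far simpler than the real classes; the shard is a template parameter rather than a virtual base, so the calls into it are ordinary non-virtual calls and the shard interface can differ between implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// The shard type is a template parameter, so Insert/Lookup on the shard are
// resolved at compile time and the shard needs no virtual functions.
template <class Shard>
class DemoShardedCache {
 public:
  explicit DemoShardedCache(int num_shard_bits)
      : shard_mask_((uint64_t{1} << num_shard_bits) - 1),
        shards_(static_cast<size_t>(shard_mask_ + 1)) {
    assert(num_shard_bits >= 0 && num_shard_bits < 20);
  }

  void Insert(const std::string& key, int value) {
    const uint64_t hash = std::hash<std::string>()(key);
    shards_[hash & shard_mask_].Insert(key, hash, value);
  }

  bool Lookup(const std::string& key, int* value) const {
    const uint64_t hash = std::hash<std::string>()(key);
    return shards_[hash & shard_mask_].Lookup(key, hash, value);
  }

  size_t num_shards() const { return shards_.size(); }

 private:
  uint64_t shard_mask_;
  std::vector<Shard> shards_;
};

// One possible shard implementation: a plain hash map (hash is unused here).
class DemoMapShard {
 public:
  void Insert(const std::string& key, uint64_t /*hash*/, int value) {
    map_[key] = value;
  }
  bool Lookup(const std::string& key, uint64_t /*hash*/, int* value) const {
    auto it = map_.find(key);
    if (it == map_.end()) {
      return false;
    }
    *value = it->second;
    return true;
  }

 private:
  std::unordered_map<std::string, int> map_;
};
```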
      
      More detail:
      * Don't have LRUCacheShard maintain `std::shared_ptr<SecondaryCache>` copies (extra refcount) when LRUCache can be in charge of keeping a `shared_ptr`.
      * Renamed `capacity_mutex_` to `config_mutex_` to better represent the scope of what it guards.
      * Some preparation for 64-bit hash and indexing in LRUCache, but didn't include the full change because of slight performance regression.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10801
      
      Test Plan:
      Unit test updates were non-trivial because of major changes to the ClockCacheShard interface in handling of key vs. hash.
      
      Performance:
      Create with `TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=30000000 -disable_wal=1 -bloom_bits=16`
      
      Test with
      ```
      TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=readrandom[-X1000] -readonly -num=30000000 -bloom_bits=16 -cache_index_and_filter_blocks=1 -cache_size=610000000 -duration 20 -threads=16
      ```
      
      Before: `readrandom [AVG 150 runs] : 321147 (± 253) ops/sec`
      After: `readrandom [AVG 150 runs] : 321530 (± 326) ops/sec`
      
      So possibly ~0.1% improvement.
      
      And with `-cache_type=hyper_clock_cache`:
      Before: `readrandom [AVG 30 runs] : 614126 (± 7978) ops/sec`
      After: `readrandom [AVG 30 runs] : 645349 (± 8087) ops/sec`
      
      So roughly 5% improvement!
      
      Reviewed By: anand1976
      
      Differential Revision: D40252236
      
      Pulled By: pdillinger
      
      fbshipit-source-id: ff8fc70ef569585edc95bcbaaa0386f61355ae5b
  9. 18 Oct, 2022 1 commit
    • P
      Print stack traces on frozen tests in CI (#10828) · e466173d
      Peter Dillinger committed
      Summary:
      Instead of existing calls to ps from gnu_parallel, call a new wrapper that does ps, looks for unit-test-like processes, and uses pstack or gdb to print thread stack traces. Also, using `ps -wwf` instead of `ps -wf` ensures output is not cut off.
      
      For security, CircleCI runs with security restrictions on ptrace (/proc/sys/kernel/yama/ptrace_scope = 1), and this change adds a work-around to `InstallStackTraceHandler()` (only used by testing tools) to allow any process from the same user to debug it. (I've also touched >100 files to ensure all the unit tests call this function.)
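
On Linux, one way to relax the Yama restriction for the calling process is `prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, ...)`. The sketch below assumes that this is the kind of work-around meant here; the helper name `AllowAnyTracer` is hypothetical and is not taken from the RocksDB source. On other platforms, or on kernels without Yama, the function does nothing.

```cpp
#include <cassert>
#include <cerrno>
#if defined(__linux__)
#include <sys/prctl.h>
#endif

// Hypothetical helper: lets other processes attach a debugger (e.g. gdb or
// pstack) to this one even when ptrace_scope = 1.
// Returns true if the call succeeded or was not needed.
inline bool AllowAnyTracer() {
#if defined(__linux__) && defined(PR_SET_PTRACER) && \
    defined(PR_SET_PTRACER_ANY)
  if (prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0) == 0) {
    return true;
  }
  // EINVAL here typically means the kernel has no Yama support, in which
  // case there is no Yama restriction to relax.
  return errno == EINVAL;
#else
  return true;
#endif
}
```

A test binary would call this once early in `main()`, before any hang could occur.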
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10828
      
      Test Plan: local manual + temporary infinite loop in a unit test to observe in CircleCI
      
      Reviewed By: hx235
      
      Differential Revision: D40447634
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 718a4c4a5b54fa0f9af2d01a446162b45e5e84e1
  10. 08 Oct, 2022 1 commit
    • J
      Add option `preserve_internal_time_seconds` to preserve the time info (#10747) · c401f285
      Jay Zhuang committed
      Summary:
      Add option `preserve_internal_time_seconds` to preserve the internal
      time information.
      It's mostly for the migration of existing data to tiered storage
      (`preclude_last_level_data_seconds`). When the tiering feature has just
      been enabled, the existing data has no time information for deciding
      whether it is hot or cold. Enabling this option starts collecting and
      preserving the time information for new data.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10747
      
      Reviewed By: siying
      
      Differential Revision: D39910141
      
      Pulled By: siying
      
      fbshipit-source-id: 25c21638e37b1a7c44006f636b7d714fe7242138
  11. 30 Sep, 2022 1 commit
    • J
      Align compaction output file boundaries to the next level ones (#10655) · f3cc6663
      Jay Zhuang committed
      Summary:
      Try to align the compaction output file boundaries to the next level ones
      (grandparent level), to reduce the level compaction write-amplification.
      
      In level compaction, there is "wasted" data at the beginning and end of the
      output level files. Aligning the file boundaries can avoid such "wasted"
      compaction. With this PR, it tries to align the non-bottommost level file
      boundaries to its next level ones. It may cut a file when the file size is
      large enough (at least 50% of target_file_size) and not too large (2x
      target_file_size).
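
Read literally, the size rule above could be sketched as the following predicate. `ShouldCutOutputFile` and its parameters are hypothetical, and the actual implementation considers more than these two conditions; this only encodes "cut at a grandparent boundary once the file is at least half the target size, and always cut at twice the target size".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical predicate for deciding whether to finish the current
// compaction output file before adding the next key.
inline bool ShouldCutOutputFile(uint64_t current_size,
                                uint64_t target_file_size,
                                bool at_grandparent_boundary) {
  // Hard cap: never let a file grow beyond 2x the target size.
  if (current_size >= 2 * target_file_size) {
    return true;
  }
  // Align to the grandparent boundary only if the file is large enough.
  if (at_grandparent_boundary && current_size >= target_file_size / 2) {
    return true;
  }
  return false;
}
```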
      
      db_bench shows about 12.56% compaction reduction:
      ```
      TEST_TMPDIR=/data/dbbench2 ./db_bench --benchmarks=fillrandom,readrandom -max_background_jobs=12 -num=400000000 -target_file_size_base=33554432
      
      # baseline:
      Flush(GB): cumulative 25.882, interval 7.216
      Cumulative compaction: 285.90 GB write, 162.36 MB/s write, 269.68 GB read, 153.15 MB/s read, 2926.7 seconds
      
      # with this change:
      Flush(GB): cumulative 25.882, interval 7.753
      Cumulative compaction: 249.97 GB write, 141.96 MB/s write, 233.74 GB read, 132.74 MB/s read, 2534.9 seconds
      ```
      
      The compaction simulator shows a similar result (14% with 100G random data).
      As a side effect, with this PR, the SST file size can exceed the
      target_file_size, but is capped at 2x target_file_size. And there will be
      smaller files. Here are file size statistics when loading 100GB with the target
      file size 32MB:
      ```
                baseline      this_PR
      count  1.656000e+03  1.705000e+03
      mean   3.116062e+07  3.028076e+07
      std    7.145242e+06  8.046139e+06
      ```
      
      The feature is enabled by default. To revert to the old behavior, disable it
      with `AdvancedColumnFamilyOptions.level_compaction_dynamic_file_size = false`.
      
      Also includes https://github.com/facebook/rocksdb/issues/1963 to cut a file before a skippable grandparent file. This is for
      use cases such as a user adding 2 or more non-overlapping data ranges at the same
      time; it can reduce the overlap of the 2 datasets in the lower levels.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10655
      
      Reviewed By: cbi42
      
      Differential Revision: D39552321
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 640d15f159ab0cd973f2426cfc3af266fc8bdde2
  12. 14 Sep, 2022 1 commit
  13. 08 Sep, 2022 1 commit
    • B
      Avoid recompressing cold block in CompressedSecondaryCache (#10527) · d490bfcd
      Bo Wang committed
      Summary:
      **Summary:**
      When a block is first looked up (`Lookup`) in the secondary cache, we just insert a dummy block into the primary cache (charging the actual size of the block) and don't erase the block from the secondary cache. A standalone handle is returned from `Lookup`. Only if the block is hit again do we erase it from the secondary cache and add it to the primary cache.
      
      When a block is first evicted from the primary cache to the secondary cache, we just insert a dummy block (size 0) into the secondary cache. When the block is evicted again, it is treated as a hot block and is inserted into the secondary cache.
      
      **Implementation Details**
      Add a new state of LRUHandle: The handle is never inserted into the LRUCache (both hash table and LRU list) and it doesn't experience the above three states. The entry can be freed when refs becomes 0.  (refs >= 1 && in_cache == false && IS_STANDALONE == true)
      
      The behaviors of  `LRUCacheShard::Lookup()` are updated if the secondary_cache is CompressedSecondaryCache:
      1. If a handle is found in primary cache:
        1.1. If the handle's value is not nullptr, it is returned immediately.
        1.2. If the handle's value is nullptr, this means the handle is a dummy one. For a dummy handle, if it was retrieved from secondary cache, it may still exist in secondary cache.
          - 1.2.1. If no valid handle can be `Lookup` from secondary cache, return nullptr.
          - 1.2.2. If the handle from secondary cache is valid, erase it from the secondary cache and add it into the primary cache.
      2. If a handle is not found in primary cache:
        2.1. If no valid handle can be `Lookup` from secondary cache, return nullptr.
        2.2.  If the handle from secondary cache is valid, insert a dummy block in the primary cache (charging the actual size of the block)  and return a standalone handle.
      
      The behaviors of `LRUCacheShard::Promote()` are updated as follows:
      1. If `e->sec_handle` has value, one of the following steps can happen:
        1.1. Insert a dummy handle and return a standalone handle to caller when `secondary_cache_` is `CompressedSecondaryCache` and e is a standalone handle.
        1.2. Insert the item into the primary cache and return the handle to caller.
        1.3. Exception handling.
      2. If `e->sec_handle` has no value, mark the item as not in cache and charge the cache, as it's only metadata that'll shortly be released.
      
      The behavior of  `CompressedSecondaryCache::Insert()` is updated:
      1. If a block is evicted from the primary cache for the first time, a dummy item is inserted.
      2. If a dummy item is found for a block, the block is inserted into the secondary cache.
      
      The behavior of `CompressedSecondaryCache::Lookup()` is updated:
      1. If a handle is not found or it is a dummy item, a nullptr is returned.
      2. If `erase_handle` is true, the handle is erased.
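
The updated `Insert()` and `Lookup()` behaviors amount to a "second touch" admission policy, which can be sketched as below. `DemoSecondaryCache` is a hypothetical toy that stores strings in a map and has no capacity, compression, or handles; it only demonstrates the dummy-then-real sequence and the `erase_handle` flag.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical toy secondary cache illustrating second-touch admission.
class DemoSecondaryCache {
 public:
  // Returns true if the real value was stored, false if only a dummy
  // marker was recorded (first eviction of this key).
  bool Insert(const std::string& key, const std::string& value) {
    auto it = map_.find(key);
    if (it == map_.end()) {
      map_[key] = Entry{true, std::string()};  // first time: dummy, size 0
      return false;
    }
    it->second = Entry{false, value};  // seen before: treat as hot
    return true;
  }

  // Returns false for a missing key or a dummy item. When erase_handle is
  // true, a found entry is removed after its value is copied out.
  bool Lookup(const std::string& key, bool erase_handle, std::string* out) {
    auto it = map_.find(key);
    if (it == map_.end() || it->second.is_dummy) {
      return false;
    }
    *out = it->second.value;
    if (erase_handle) {
      map_.erase(it);
    }
    return true;
  }

 private:
  struct Entry {
    bool is_dummy;
    std::string value;
  };
  std::unordered_map<std::string, Entry> map_;
};
```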
      
      The behaviors of  `LRUCacheShard::Release()` are adjusted for the standalone handles.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10527
      
      Test Plan:
      1. stress tests.
      2. unit tests.
      3. CPU profiling for db_bench.
      
      Reviewed By: siying
      
      Differential Revision: D38747613
      
      Pulled By: gitbw95
      
      fbshipit-source-id: 74a1eba7e1957c9affb2bd2ae3e0194584fa6eca
  14. 02 Sep, 2022 1 commit
  15. 20 Aug, 2022 1 commit
    • A
      MultiGet async IO across multiple levels (#10535) · 35cdd3e7
      anand76 committed
      Summary:
      This PR exploits parallelism in MultiGet across levels. It applies only to the coroutine version of MultiGet. Previously, MultiGet file reads from SST files in the same level were parallelized. With this PR, MultiGet batches with keys distributed across multiple levels are read in parallel. This is accomplished by splitting the keys not present in a level (determined by bloom filtering) into a separate batch, and processing the new batch in parallel with the original batch.
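
The batch-splitting step can be pictured with the sketch below. It is only the partitioning, not the coroutine machinery: `DemoSplit` and `SplitBatchByFilter` are hypothetical, and an exact `std::unordered_set` stands in for the level's bloom filter (a real bloom filter can also return false positives).

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Result of splitting one MultiGet batch at a given level.
struct DemoSplit {
  std::vector<std::string> read_here;  // filter says the key may be here
  std::vector<std::string> try_next;   // filtered out: move on to next level
};

// Partitions the batch using the level's filter. The two resulting batches
// are independent, so they could then be processed in parallel.
inline DemoSplit SplitBatchByFilter(
    const std::vector<std::string>& batch,
    const std::unordered_set<std::string>& level_filter) {
  DemoSplit out;
  for (const std::string& key : batch) {
    if (level_filter.count(key) > 0) {
      out.read_here.push_back(key);
    } else {
      out.try_next.push_back(key);
    }
  }
  return out;
}
```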
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10535
      
      Test Plan:
      1. Ensure existing MultiGet unit tests pass, updating them as necessary
      2. New unit tests - TODO
      3. Run stress test - TODO
      
      No noticeable regression (<1%) without async IO -
      Without PR: `multireadrandom :       7.261 micros/op 1101724 ops/sec 60.007 seconds 66110936 operations;  571.6 MB/s (8168992 of 8168992 found)`
      With PR: `multireadrandom :       7.305 micros/op 1095167 ops/sec 60.007 seconds 65717936 operations;  568.2 MB/s (8271992 of 8271992 found)`
      
      For a fully cached DB, but with async IO option on, no regression observed (<1%) -
      Without PR: `multireadrandom :       5.201 micros/op 1538027 ops/sec 60.005 seconds 92288936 operations;  797.9 MB/s (11540992 of 11540992 found) `
      With PR: `multireadrandom :       5.249 micros/op 1524097 ops/sec 60.005 seconds 91452936 operations;  790.7 MB/s (11649992 of 11649992 found) `
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D38774009
      
      Pulled By: anand1976
      
      fbshipit-source-id: c955e259749f1c091590ade73105b3ee46cd0007
      35cdd3e7
  16. 13 Aug, 2022 1 commit
    • C
      Add memtable per key-value checksum (#10281) · fd165c86
      Changyu Bi committed
      Summary:
      Append per key-value checksum to internal key. These checksums are verified on read paths including Get, Iterator and during Flush. Get and Iterator will return `Corruption` status if there is a checksum verification failure. Flush will make DB become read-only upon memtable entry checksum verification failure.
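As a rough illustration of the verify-on-read idea, the sketch below attaches a truncated checksum to each entry and raises an error when verification fails. The helper names, the `CorruptionError` class, and the use of a truncated CRC32 are assumptions made for the example; the actual memtable protection code uses its own hash and entry encoding.

```python
import zlib

PROTECTION_BYTES = 2  # mirrors the 2-byte setting used in the benchmarks below


class CorruptionError(Exception):
    """Raised when a stored checksum does not match the entry (hypothetical)."""


def _checksum(key: bytes, value: bytes) -> bytes:
    # Truncated CRC32 as a stand-in hash; the key length is included so that
    # (key, value) boundaries are also covered.
    crc = zlib.crc32(len(key).to_bytes(4, "little") + key + value)
    return crc.to_bytes(4, "little")[:PROTECTION_BYTES]


def make_entry(key: bytes, value: bytes):
    """Build an entry with its per key-value checksum appended."""
    return (key, value, _checksum(key, value))


def read_entry(entry):
    """Verify the checksum before returning the value."""
    key, value, stored = entry
    if _checksum(key, value) != stored:
        raise CorruptionError(f"checksum mismatch for key {key!r}")
    return value
```

A shorter checksum costs less memory per entry but detects fewer corruptions, which is the trade-off behind making the number of protection bytes configurable.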
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10281
      
      Test Plan:
      - Added new unit test cases: `make check`
      - Benchmark on memtable insert
      ```
      TEST_TMPDIR=/dev/shm/memtable_write ./db_bench -benchmarks=fillseq -disable_wal=true -max_write_buffer_number=100 -num=10000000 -min_write_buffer_number_to_merge=100
      
      # avg over 10 runs
      Baseline: 1166936 ops/sec
      memtable 2 bytes kv checksum : 1.11674e+06 ops/sec (-4%)
      memtable 2 bytes kv checksum + write batch 8 bytes kv checksum: 1.08579e+06 ops/sec (-6.95%)
      write batch 8 bytes kv checksum: 1.17979e+06 ops/sec (+1.1%)
      ```
      -  Benchmark on only memtable read: ops/sec dropped 31% for `readseq` due to time spent on verifying checksum.
      ops/sec for `readrandom` dropped ~6.8%.
      ```
      # Readseq
      sudo TEST_TMPDIR=/dev/shm/memtable_read ./db_bench -benchmarks=fillseq,readseq"[-X20]" -disable_wal=true -max_write_buffer_number=100 -num=10000000 -min_write_buffer_number_to_merge=100
      
      readseq [AVG    20 runs] : 7432840 (± 212005) ops/sec;  822.3 (± 23.5) MB/sec
      readseq [MEDIAN 20 runs] : 7573878 ops/sec;  837.9 MB/sec
      
      With -memtable_protection_bytes_per_key=2:
      
      readseq [AVG    20 runs] : 5134607 (± 119596) ops/sec;  568.0 (± 13.2) MB/sec
      readseq [MEDIAN 20 runs] : 5232946 ops/sec;  578.9 MB/sec
      
      # Readrandom
      sudo TEST_TMPDIR=/dev/shm/memtable_read ./db_bench -benchmarks=fillrandom,readrandom"[-X10]" -disable_wal=true -max_write_buffer_number=100 -num=1000000 -min_write_buffer_number_to_merge=100
      readrandom [AVG    10 runs] : 140236 (± 3938) ops/sec;    9.8 (± 0.3) MB/sec
      readrandom [MEDIAN 10 runs] : 140545 ops/sec;    9.8 MB/sec
      
      With -memtable_protection_bytes_per_key=2:
      readrandom [AVG    10 runs] : 130632 (± 2738) ops/sec;    9.1 (± 0.2) MB/sec
      readrandom [MEDIAN 10 runs] : 130341 ops/sec;    9.1 MB/sec
      ```
      
      - Stress test: `python3 -u tools/db_crashtest.py whitebox --duration=1800`
      
      Reviewed By: ajkr
      
      Differential Revision: D37607896
      
      Pulled By: cbi42
      
      fbshipit-source-id: fdaefb475629d2471780d4a5f5bf81b44ee56113
      fd165c86
  17. 09 Aug, 2022 1 commit
  18. 20 Jul, 2022 1 commit
  19. 19 Jul, 2022 1 commit
    • A
      Make RateLimiter not Customizable (#10378) · 25cc564f
      Andrew Kryczka committed
      Summary:
      (PR created for informational/testing purposes only.)
      
      - Fixes lost dynamic updates to GenericRateLimiter bandwidth using `SetBytesPerSecond()`
      - Benefit over #10374 is eliminating race conditions with Configurable framework.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10378
      
      Reviewed By: pdillinger
      
      Differential Revision: D37914865
      
      fbshipit-source-id: d4f566d60ec9726d26932388c61671adf0ee0f30
      25cc564f
  20. 17 Jul, 2022 1 commit
  21. 15 Jul, 2022 1 commit
    • J
      Add seqno to time mapping (#10338) · a3acf2ef
      Jay Zhuang committed
      Summary:
      Add a seqno-to-time mapping, which will be used by tiered storage to
      preclude hot data from being compacted to the cold tier (the last level).
      Internally, a periodic_task is scheduled to record the
      current_seqno -> current_time pair at a certain cadence. When a memtable
      is flushed, the mapping information is stored in an SSTable property.
      During compaction, the mapping information is merged and used to get the
      approximate time of a sequence number, which determines whether a key
      was recently inserted. A key is precluded from the last level if it was
      inserted recently (within `preclude_last_level_data_seconds`).
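A minimal sketch of such a mapping, assuming samples are kept in two sorted lists and looked up by binary search. The class `SeqnoToTimeSketch` and its methods are hypothetical, and the handling of sequence numbers older than every sample is an assumption of this example.

```python
import bisect


class SeqnoToTimeSketch:
    """Toy seqno -> time mapping (hypothetical, not the RocksDB class)."""

    def __init__(self):
        self._seqnos = []
        self._times = []

    def record(self, seqno, unix_time):
        """Append a periodic sample; out-of-order samples are ignored."""
        if self._seqnos and seqno <= self._seqnos[-1]:
            return
        self._seqnos.append(seqno)
        self._times.append(unix_time)

    def approximate_time(self, seqno):
        """Time of the latest sample at or before `seqno`, or None."""
        i = bisect.bisect_right(self._seqnos, seqno)
        return self._times[i - 1] if i else None

    def is_recent(self, seqno, now, preclude_seconds):
        """True if the key is known to be younger than `preclude_seconds`."""
        t = self.approximate_time(seqno)
        if t is None:
            return False  # assumption: unknown age is treated as cold
        return now - t < preclude_seconds
```

Because the sample time is a lower bound on when the key was written, this version only reports a key as recent when that is certain; a real implementation may choose a different approximation.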
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10338
      
      Test Plan: CI
      
      Reviewed By: siying
      
      Differential Revision: D37810187
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 6953be7a18a99de8b1cb3b162d712f79c2b4899f
      a3acf2ef
  22. 24 Jun, 2022 1 commit
    • B
      Dynamically changeable `MemPurge` option (#10011) · 5879053f
      Baptiste Lemaire committed
      Summary:
      **Summary**
      Make the mempurge option flag a Mutable Column Family option flag. Therefore, the mempurge feature can be dynamically toggled.
      
      **Motivation**
      RocksDB users prefer having the ability to switch features on and off without having to close and reopen the DB. This is particularly important if the feature causes issues and needs to be turned off. Dynamically changing a DB option flag does not seem currently possible.
      Moreover, with this new change, the MemPurge feature can be toggled on or off independently between column families, which we see as a major improvement.
      
      **Content of this PR**
      This PR includes removal of the `experimental_mempurge_threshold` flag as a DB option flag, and its re-introduction as a `MutableCFOption` flag. I updated the code to handle dynamic changes of the flag (in particular inside the `FlushJob` file). Additionally, this PR includes a new test to demonstrate the capacity of the code to toggle the MemPurge feature on and off, as well as the addition in the `db_stress` module of 2 different mempurge threshold values (0.0 and 1.0) that can be randomly changed with the `set_option_one_in` flag. This is useful to stress test the dynamic changes.
      
      **Benchmarking**
      I will add numbers to prove that there is no performance impact within the next 12 hours.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10011
      
      Reviewed By: pdillinger
      
      Differential Revision: D36462357
      
      Pulled By: bjlemaire
      
      fbshipit-source-id: 5e3d63bdadf085c0572ecc2349e7dd9729ce1802
      5879053f
  23. 22 Jun, 2022 1 commit
    • Z
      Add basic kRoundRobin compaction policy (#10107) · 30141461
      zczhu committed
      Summary:
      Add `kRoundRobin` as a compaction priority. The implementation is as follows.
      
      - Define a cursor as the smallest internal key in the successor of the selected file. Add `vector<InternalKey> compact_cursor_` to `VersionStorageInfo`, where each element (`InternalKey`) of `compact_cursor_` represents a cursor. Under the round-robin compaction policy, we only need to select the first file (assuming files are sorted) whose smallest InternalKey is larger than or equal to the cursor. After a file is chosen, we create a new `Fsize` vector, `temp`, in which the selected file is placed at the first position. The next cursor is then updated to the smallest InternalKey in the successor of the selected file (the above logic is implemented in `SortFileByRoundRobin`).
      - After a compaction succeeds, typically in `InstallCompactionResults()`, we choose the next cursor for the input level and save it to `edit`. When calling `LogAndApply`, we save the next cursor with its level into a local variable and finally apply the change to `vstorage` in the `SaveTo` function.
      - Cursors are persisted pair by pair (<level, InternalKey>) in `EncodeTo` so that they can be reconstructed when the DB is reopened. An empty cursor is not encoded to the MANIFEST.
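The file-selection rule in the first bullet can be sketched as follows. The function `pick_round_robin`, the `(smallest_key, largest_key)` tuples, and the wrap-around when the cursor lies past every file are assumptions for illustration; this is not the logic of `SortFileByRoundRobin` itself.

```python
import bisect


def pick_round_robin(files, cursor):
    """Pick the next file to compact in one level.

    `files` is a list of hypothetical (smallest_key, largest_key) tuples,
    sorted by smallest_key. `cursor` is a key, or None for an empty cursor.
    Returns (index_of_selected_file, next_cursor).
    """
    if not files:
        return None, None
    smallest_keys = [f[0] for f in files]
    if cursor is None:
        idx = 0  # empty cursor: start from the first file
    else:
        # First file whose smallest key is >= the cursor.
        idx = bisect.bisect_left(smallest_keys, cursor)
        if idx == len(files):
            idx = 0  # assumption: wrap around past the last file
    # The next cursor is the smallest key of the selected file's successor,
    # or empty (None) when the selected file is the last one.
    successor = idx + 1
    next_cursor = smallest_keys[successor] if successor < len(files) else None
    return idx, next_cursor
```

Feeding each returned cursor back into the next call walks the level from the first file to the last and then starts over.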
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10107
      
      Test Plan: add unit test (`CompactionPriRoundRobin`) in `compaction_picker_test`, add `kRoundRobin` priority in `CompactionPriTest` from `db_compaction_test`, and add `PersistRoundRobinCompactCursor` in `db_compaction_test`
      
      Reviewed By: ajkr
      
      Differential Revision: D37316037
      
      Pulled By: littlepig2013
      
      fbshipit-source-id: 9f481748190ace416079139044e00df2968fb1ee
      30141461
  24. 17 Jun, 2022 1 commit
    • P
      Remove deprecated block-based filter (#10184) · 126c2237
      Peter Dillinger committed
      Summary:
      In https://github.com/facebook/rocksdb/issues/9535, release 7.0, we hid the old block-based filter from being created using
      the public API, because of its inefficiency. Although we normally maintain read compatibility
      on old DBs forever, filters are not required for reading a DB, only for optimizing read
      performance. Thus, it should be acceptable to remove this code and the substantial
      maintenance burden it carries as useful features are developed and validated (such
      as user timestamp).
      
      This change completely removes the code for reading and writing the old block-based
      filters, net removing about 1370 lines of code no longer needed. Options removed from
      testing / benchmarking tools. The prior existence is only evident in a couple of places:
      * `CacheEntryRole::kDeprecatedFilterBlock` - We can update this public API enum in
      a major release to minimize source code incompatibilities.
      * A warning is logged when an old table file is opened that used the old block-based
      filter. This is provided as a courtesy, and would be a pain to unit test, so manual testing
      should suffice. Unfortunately, sst_dump does not tell you whether a file uses
      block-based filter, and the structure of the code makes it very difficult to fix.
      * To detect that case, `kObsoleteFilterBlockPrefix` (renamed from `kFilterBlockPrefix`)
      for metaindex is maintained (for now).
      
      Other notes:
      * In some cases where numbers are associated with filter configurations, we have had to
      update the assigned numbers so that they all correspond to something that exists.
      * Fixed potential stat counting bug by assuming `filter_checked = false` for cases
      like `filter == nullptr` rather than assuming `filter_checked = true`
      * Removed obsolete `block_offset` and `prefix_extractor` parameters from several
      functions.
      * Removed some unnecessary checks `if (!table_prefix_extractor() && !prefix_extractor)`
      because the caller guarantees the prefix extractor exists and is compatible
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10184
      
      Test Plan:
      tests updated, manually test new warning in LOG using base version to
      generate a DB
      
      Reviewed By: riversand963
      
      Differential Revision: D37212647
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 06ee020d8de3b81260ffc36ad0c1202cbf463a80
      126c2237
  25. 15 Jun, 2022 2 commits
  26. 03 Jun, 2022 1 commit
    • G
      Make it possible to enable blob files starting from a certain LSM tree level (#10077) · e6432dfd
      Gang Liao committed
      Summary:
      Currently, if blob files are enabled (i.e. `enable_blob_files` is true), large values are extracted both during flush/recovery (when SST files are written into level 0 of the LSM tree) and during compaction into any LSM tree level. For certain use cases that have a mix of short-lived and long-lived values, it might make sense to support extracting large values only during compactions whose output level is greater than or equal to a specified LSM tree level (e.g. compactions into L1/L2/... or above). This could reduce the space amplification caused by large values that are turned into garbage shortly after being written at the price of some write amplification incurred by long-lived values whose extraction to blob files is delayed.
      
      In order to achieve this, we would like to do the following:
      - Add a new configuration option `blob_file_starting_level` (default: 0) to `AdvancedColumnFamilyOptions` (and `MutableCFOptions` and extend the related logic)
      - Instantiate `BlobFileBuilder` in `BuildTable` (used during flush and recovery, where the LSM tree level is L0) and `CompactionJob` iff `enable_blob_files` is set and the LSM tree level is `>= blob_file_starting_level`
      - Add unit tests for the new functionality, and add the new option to our stress tests (`db_stress` and `db_crashtest.py` )
      - Add the new option to our benchmarking tool `db_bench` and the BlobDB benchmark script `run_blob_bench.sh`
      - Add the new option to the `ldb` tool (see https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool)
      - Ideally extend the C and Java bindings with the new option
      - Update the BlobDB wiki to document the new option.
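The condition for instantiating a blob file builder, as stated in the list above, reduces to a small predicate. The function name `should_build_blob_files` is hypothetical; only the option names come from the commit message.

```python
def should_build_blob_files(enable_blob_files, output_level,
                            blob_file_starting_level=0):
    """Return True if large values should be extracted to blob files.

    Flush and recovery write to level 0, so they use output_level=0.
    """
    return enable_blob_files and output_level >= blob_file_starting_level


# With a starting level of 2, flushes (L0) and compactions into L1 keep
# values inline, while compactions into L2 and beyond extract them.
for level in range(4):
    print(level, should_build_blob_files(True, level, blob_file_starting_level=2))
```

With the default starting level of 0 the predicate is true for every level, which matches the behavior before the option existed.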
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10077
      
      Reviewed By: ltamasi
      
      Differential Revision: D36884156
      
      Pulled By: gangliao
      
      fbshipit-source-id: 942bab025f04633edca8564ed64791cb5e31627d
      e6432dfd
  27. 25 May, 2022 1 commit
    • C
      Support read rate-limiting in SequentialFileReader (#9973) · 8515bd50
      Changyu Bi committed
      Summary:
      Added rate limiter and read rate-limiting support to SequentialFileReader. I've updated call sites to SequentialFileReader::Read with appropriate IO priority (or left a TODO and specified IO_TOTAL for now).
      
      The PR is separated into four commits: the first one added the rate-limiting support, but with some fixes in the unit test, since the number of bytes requested from the rate limiter in SequentialFileReader is not accurate (there is overcharge at EOF). The second commit fixed this by allowing SequentialFileReader to check the file size and determine how many bytes are left in the file to read. The third commit added benchmark-related code. The fourth commit moved the logic of using the file size to avoid overcharging the rate limiter into the backup engine (the main user of SequentialFileReader).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9973
      
      Test Plan:
      - `make check`, backup_engine_test covers usage of SequentialFileReader with rate limiter.
      - Run db_bench to check if rate limiting is throttling as expected: Verified that reads and writes are together throttled at 2MB/s, and at 0.2MB chunks that are 100ms apart.
        - Set up: `./db_bench --benchmarks=fillrandom -db=/dev/shm/test_rocksdb`
        - Benchmark:
      ```
      strace -ttfe read,write ./db_bench --benchmarks=backup -db=/dev/shm/test_rocksdb --backup_rate_limit=2097152 --use_existing_db
      strace -ttfe read,write ./db_bench --benchmarks=restore -db=/dev/shm/test_rocksdb --restore_rate_limit=2097152 --use_existing_db
      ```
      - db bench on backup and restore to ensure no performance regression.
        - backup (avg over 50 runs): pre-change: 1.90443e+06 micros/op; post-change: 1.8993e+06 micros/op (improve by 0.2%)
        - restore (avg over 50 runs): pre-change: 1.79105e+06 micros/op; post-change: 1.78192e+06 micros/op (improve by 0.5%)
      
      ```
      # Set up
      ./db_bench --benchmarks=fillrandom -db=/tmp/test_rocksdb -num=10000000
      
      # benchmark
      TEST_TMPDIR=/tmp/test_rocksdb
      NUM_RUN=50
      for ((j=0;j<$NUM_RUN;j++))
      do
         ./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=backup -use_existing_db | egrep 'backup'
        # Restore
        #./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=restore -use_existing_db
      done > rate_limit.txt && awk -v NUM_RUN=$NUM_RUN '{sum+=$3;sum_sqrt+=$3^2}END{print sum/NUM_RUN, sqrt(sum_sqrt/NUM_RUN-(sum/NUM_RUN)^2)}' rate_limit.txt >> rate_limit_2.txt
      ```
      
      Reviewed By: hx235
      
      Differential Revision: D36327418
      
      Pulled By: cbi42
      
      fbshipit-source-id: e75d4307cff815945482df5ba630c1e88d064691
      8515bd50
  28. 21 May, 2022 2 commits
    • Y
      Fix a bug of not setting enforce_single_del_contracts (#10027) · f648915b
      Yanqin Jin committed
      Summary:
      Before this PR, BuildDBOptions() does not set a newly-added option, i.e.
      enforce_single_del_contracts, causing OPTIONS files to contain incorrect
      information.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10027
      
      Test Plan:
      make check
      Manually check OPTIONS file.
      
      Reviewed By: ltamasi
      
      Differential Revision: D36556125
      
      Pulled By: riversand963
      
      fbshipit-source-id: e1074715b22c328b68c19e9ad89aa5d67d864bb5
      f648915b
    • C
      Support using ZDICT_finalizeDictionary to generate zstd dictionary (#9857) · cc23b46d
      Changyu Bi committed
      Summary:
      An untrained dictionary is currently simply the concatenation of several samples. The ZSTD API, ZDICT_finalizeDictionary(), can improve such a dictionary's effectiveness at low cost. This PR changes how the dictionary is created: it calls the ZSTD ZDICT_finalizeDictionary() API instead of creating a raw content dictionary (when max_dict_buffer_bytes > 0), and passes in all buffered uncompressed data blocks as samples.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9857
      
      Test Plan:
      #### db_bench test for cpu/memory of compression+decompression and space saving on synthetic data:
      Set up: change the parameter [here](https://github.com/facebook/rocksdb/blob/fb9a167a55e0970b1ef6f67c1600c8d9c4c6114f/tools/db_bench_tool.cc#L1766) to 16384 to make synthetic data more compressible.
      ```
      # linked local ZSTD with version 1.5.2
      # DEBUG_LEVEL=0 ROCKSDB_NO_FBCODE=1 ROCKSDB_DISABLE_ZSTD=1  EXTRA_CXXFLAGS="-DZSTD_STATIC_LINKING_ONLY -DZSTD -I/data/users/changyubi/install/include/" EXTRA_LDFLAGS="-L/data/users/changyubi/install/lib/ -l:libzstd.a" make -j32 db_bench
      
      dict_bytes=16384
      train_bytes=1048576
      echo "========== No Dictionary =========="
      TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 2>&1 | grep elapsed
      du -hc /dev/shm/dbbench/*sst | grep total
      
      echo "========== Raw Content Dictionary =========="
      TEST_TMPDIR=/dev/shm ./db_bench_main -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench_main -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 2>&1 | grep elapsed
      du -hc /dev/shm/dbbench/*sst | grep total
      
      echo "========== FinalizeDictionary =========="
      TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 2>&1 | grep elapsed
      du -hc /dev/shm/dbbench/*sst | grep total
      
      echo "========== TrainDictionary =========="
      TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 2>&1 | grep elapsed
      du -hc /dev/shm/dbbench/*sst | grep total
      
      # Result: TrainDictionary is much better on space saving, but FinalizeDictionary seems to use less memory.
      # before compression data size: 1.2GB
      dict_bytes=16384
      max_dict_buffer_bytes =  1048576
                          space   cpu/memory
      No Dictionary       468M    14.93user 1.00system 0:15.92elapsed 100%CPU (0avgtext+0avgdata 23904maxresident)k
      Raw Dictionary      251M    15.81user 0.80system 0:16.56elapsed 100%CPU (0avgtext+0avgdata 156808maxresident)k
      FinalizeDictionary  236M    11.93user 0.64system 0:12.56elapsed 100%CPU (0avgtext+0avgdata 89548maxresident)k
      TrainDictionary     84M     7.29user 0.45system 0:07.75elapsed 100%CPU (0avgtext+0avgdata 97288maxresident)k
      ```
      
      #### Benchmark on 10 sample SST files for space saving and CPU time on compression:
      FinalizeDictionary is comparable to TrainDictionary in terms of space saving, and takes less time in compression.
      ```
      dict_bytes=16384
      train_bytes=1048576
      
      for sst_file in `ls ../temp/myrock-sst/`
      do
        echo "********** $sst_file **********"
        echo "========== No Dictionary =========="
        ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD
      
        echo "========== Raw Content Dictionary =========="
        ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes
      
        echo "========== FinalizeDictionary =========="
        ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes --compression_use_zstd_finalize_dict
      
        echo "========== TrainDictionary =========="
        ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes
      done
      
                               010240.sst (Size/Time) 011029.sst              013184.sst              021552.sst              185054.sst              185137.sst              191666.sst              7560381.sst             7604174.sst             7635312.sst
      No Dictionary           28165569 / 2614419      32899411 / 2976832      32977848 / 3055542      31966329 / 2004590      33614351 / 1755877      33429029 / 1717042      33611933 / 1776936      33634045 / 2771417      33789721 / 2205414      33592194 / 388254
      Raw Content Dictionary  28019950 / 2697961      33748665 / 3572422      33896373 / 3534701      26418431 / 2259658      28560825 / 1839168      28455030 / 1846039      28494319 / 1861349      32391599 / 3095649      33772142 / 2407843      33592230 / 474523
      FinalizeDictionary      27896012 / 2650029      33763886 / 3719427      33904283 / 3552793      26008225 / 2198033      28111872 / 1869530      28014374 / 1789771      28047706 / 1848300      32296254 / 3204027      33698698 / 2381468      33592344 / 517433
      TrainDictionary         28046089 / 2740037      33706480 / 3679019      33885741 / 3629351      25087123 / 2204558      27194353 / 1970207      27234229 / 1896811      27166710 / 1903119      32011041 / 3322315      32730692 / 2406146      33608631 / 570593
      ```
      
      #### Decompression/Read test:
      With FinalizeDictionary/TrainDictionary, some data structures used for decompression are stored in the dictionary, so they are expected to be faster in terms of decompression/reads.
      ```
      dict_bytes=16384
      train_bytes=1048576
      echo "No Dictionary"
      TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=0 > /dev/null 2>&1
      TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=0 2>&1 | grep MB/s
      
      echo "Raw Dictionary"
      TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes > /dev/null 2>&1
      TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd  -compression_max_dict_bytes=$dict_bytes 2>&1 | grep MB/s
      
      echo "FinalizeDict"
      TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false  > /dev/null 2>&1
      TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false 2>&1 | grep MB/s
      
      echo "Train Dictionary"
      TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes > /dev/null 2>&1
      TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes 2>&1 | grep MB/s
      
      No Dictionary
      readrandom   :      12.183 micros/op 82082 ops/sec 12.183 seconds 1000000 operations;    9.1 MB/s (1000000 of 1000000 found)
      Raw Dictionary
      readrandom   :      12.314 micros/op 81205 ops/sec 12.314 seconds 1000000 operations;    9.0 MB/s (1000000 of 1000000 found)
      FinalizeDict
      readrandom   :       9.787 micros/op 102180 ops/sec 9.787 seconds 1000000 operations;   11.3 MB/s (1000000 of 1000000 found)
      Train Dictionary
      readrandom   :       9.698 micros/op 103108 ops/sec 9.699 seconds 1000000 operations;   11.4 MB/s (1000000 of 1000000 found)
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D35720026
      
      Pulled By: cbi42
      
      fbshipit-source-id: 24d230fdff0fd28a1bb650658798f00dfcfb2a1f
      cc23b46d
  29. 20 May, 2022 1 commit
    • J
      Track SST unique id in MANIFEST and verify (#9990) · c6d326d3
      Jay Zhuang committed
      Summary:
      Start tracking the SST unique id in MANIFEST. It is verified against the
      SST properties to make sure the SST file has not been overwritten or
      misplaced. A DB option `try_verify_sst_unique_id` (default: false) is
      introduced to enable/disable the verification. When enabled, it opens
      all SST files during DB-open to read the unique_id from table
      properties, so it's recommended to use it with `max_open_files = -1`
      to pre-open the files.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9990
      
      Test Plan: unittests, format-compatible test, mini-crash
      
      Reviewed By: anand1976
      
      Differential Revision: D36381863
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 89ea2eb6b35ed3e80ead9c724eb096083eaba63f
      c6d326d3
  30. 18 May, 2022 1 commit
    • H
      Rewrite memory-charging feature's option API (#9926) · 3573558e
      Hui Xiao committed
      Summary:
      **Context:**
      Previous PRs https://github.com/facebook/rocksdb/pull/9748, https://github.com/facebook/rocksdb/pull/9073, https://github.com/facebook/rocksdb/pull/8428 each added a separate flag for a charged memory area. Such an API design does not scale as we charge more and more memory areas. Also, we foresee an opportunity to consolidate this feature with other cache usage related features such as `cache_index_and_filter_blocks` using `CacheEntryRole`.
      
      Therefore we decided to consolidate all these flags with `CacheUsageOptions cache_usage_options` and this PR serves as the first step by consolidating memory-charging related flags.
      
      **Summary:**
      - Replaced old API reference with new ones, including making `kCompressionDictionaryBuildingBuffer` opt-out and added a unit test for that
      - Added missing db bench/stress test for some memory charging features
      - Renamed related test suite to indicate they are under the same theme of memory charging
      - Refactored a commonly used mocked cache component in memory charging related tests to reduce code duplication
      - Replaced the phrases "memory tracking" / "cache reservation" (other than CacheReservationManager-related ones) with "memory charging" for standard description of this feature.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9926
      
      Test Plan:
      - New unit test for opt-out `kCompressionDictionaryBuildingBuffer` `TEST_F(ChargeCompressionDictionaryBuildingBufferTest, Basic)`
      - New unit test for option validation/sanitization `TEST_F(CacheUsageOptionsOverridesTest, SanitizeAndValidateOptions)`
      - CI
      - db bench (in case querying new options introduces regression) **+0.5% micros/op**: `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR  -charge_compression_dictionary_building_buffer=1(remove this for comparison)  -compression_max_dict_bytes=10000 -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 | egrep 'fillseq'`
      
      | # runs | (pre-PR) avg micros/op | std micros/op | (post-PR) micros/op | std micros/op | change (%) |
      | -- | -- | -- | -- | -- | -- |
      | 10 | 3.9711 | 0.264408 | 3.9914 | 0.254563 | 0.5111933721 |
      | 20 | 3.83905 | 0.0664488 | 3.8251 | 0.0695456 | **-0.3633711465** |
      | 40 | 3.86625 | 0.136669 | 3.8867 | 0.143765 | **0.5289363078** |
      
      - db_stress: `python3 tools/db_crashtest.py blackbox  -charge_compression_dictionary_building_buffer=1 -charge_filter_construction=1 -charge_table_reader=1 -cache_size=1` killed as normal
      
      Reviewed By: ajkr
      
      Differential Revision: D36054712
      
      Pulled By: hx235
      
      fbshipit-source-id: d406e90f5e0c5ea4dbcb585a484ad9302d4302af
      3573558e
  31. 17 May, 2022 2 commits
    • Y
      Add a temporary option for user to opt-out enforcement of SingleDelete contract (#9983) · 3f263ef5
      Yanqin Jin committed
      Summary:
      PR https://github.com/facebook/rocksdb/issues/9888 started to enforce the contract of single delete described in https://github.com/facebook/rocksdb/wiki/Single-Delete.
      
      For some existing use cases, it is desirable to have a transition period during which compaction will not fail
      if the contract is violated. Therefore, we add a temporary option `enforce_single_del_contracts` to allow
      applications to opt out of this new strict behavior. Once the transition completes, the flag can be set to `true` again.
      
      In a future release, the option will be removed.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9983
      
      Test Plan: make check
      
      Reviewed By: ajkr
      
      Differential Revision: D36333672
      
      Pulled By: riversand963
      
      fbshipit-source-id: dcb703ea0ed08076a1422f1bfb9914afe3c2caa2
      3f263ef5
    • M
      Added GetFactoryCount/Names/Types to ObjectRegistry (#9358) · 204a42ca
      mrambacher committed
      Summary:
      These methods allow for more thorough testing of the ObjectRegistry and Customizable infrastructure in a simpler manner.  With this change, the Customizable tests can now check what factories are registered and attempt to create each of them in a systematic fashion.
      
      With this change, I think all of the factories registered with the ObjectRegistry/CreateFromString are now tested via the customizable_test classes.
      
      Note that there were a few other minor changes.  There was a "posix://*" registration with the ObjectRegistry which was missed during the PatternEntry conversion -- these changes found that.  The nickname and default names for the FileSystem classes were also inverted.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9358
      
      Reviewed By: pdillinger
      
      Differential Revision: D33433542
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 9a32da74e6620745b4eeffb2712be70eeeadfa7e
      204a42ca
  32. 13 May, 2022 1 commit
    • M
      Option type info functions (#9411) · bfc6a8ee
      mrambacher committed
      Summary:
      Add methods to set the various functions (Parse, Serialize, Equals) on the OptionTypeInfo.  These methods reduce the number of constructors required for OptionTypeInfo and make the code a little clearer.
      
      Add functions to the OptionTypeInfo for Prepare and Validate.  These methods allow types other than Configurable and Customizable to have Prepare and Validate logic.  These methods could be used by an option to guarantee that its settings were in a range or that a value was initialized.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9411
      
      Reviewed By: pdillinger
      
      Differential Revision: D36174849
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 72517d8c6bab4723788a4c1a9e16590bff870125
      bfc6a8ee
  33. 07 May, 2022 1 commit
    • S
      Remove own ToString() (#9955) · 736a7b54
      sdong committed
      Summary:
      ToString() was created because some platforms did not support std::to_string(). However, we have already been using std::to_string() by mistake for 16 months (in db/db_info_dumper.cc). This commit just removes ToString().
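      
      As an illustration of what call sites look like once the custom wrapper is gone, here is a minimal sketch that relies only on the standard `std::to_string()`. The function `DescribeFileNumber` is a hypothetical example and is not taken from the RocksDB source:
      
      ```cpp
      #include <cstdint>
      #include <string>
      
      // Hypothetical call site: builds a log-style label for a file number
      // using the standard library directly instead of a project-specific
      // ToString() wrapper.
      inline std::string DescribeFileNumber(uint64_t number) {
        return "file #" + std::to_string(number);
      }
      ```
      
      `std::to_string()` has been part of the standard library since C++11, so no portability shim is needed on compilers that meet that baseline.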
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9955
      
      Test Plan: Watch CI tests
      
      Reviewed By: riversand963
      
      Differential Revision: D36176799
      
      fbshipit-source-id: bdb6dcd0e3a3ab96a1ac810f5d0188f684064471
      736a7b54
  34. 06 May, 2022 1 commit
    • S
      Use std::numeric_limits<> (#9954) · 49628c9a
      sdong committed
      Summary:
      Right now we still don't fully use std::numeric_limits but use a macro instead, mainly to support VS 2013. We now only support VS 2017 and up, so that is no longer a problem. The code comment claims that MinGW still needs the macro, but we don't have a CI running MinGW, so it's hard to validate. Since we now require C++17, it's hard to imagine that MinGW could still build RocksDB yet not support std::numeric_limits<>.
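      
      A minimal sketch of the pattern this change moves toward: taking integer limits from `std::numeric_limits<>` rather than from a hand-written macro. The names `kMaxSequence` and `SaturatingAdd` are hypothetical and chosen only for illustration; they are not the identifiers touched by this commit:
      
      ```cpp
      #include <cstdint>
      #include <limits>
      
      // Hypothetical sentinel defined from the standard library instead of a
      // platform-specific macro.
      constexpr uint64_t kMaxSequence = std::numeric_limits<uint64_t>::max();
      
      // Adds two unsigned values, clamping at the maximum instead of wrapping.
      constexpr uint64_t SaturatingAdd(uint64_t a, uint64_t b) {
        return (a > std::numeric_limits<uint64_t>::max() - b)
                   ? std::numeric_limits<uint64_t>::max()
                   : a + b;
      }
      
      // Checked at compile time: the limit is usable in constant expressions.
      static_assert(SaturatingAdd(kMaxSequence, 1) == kMaxSequence,
                    "addition must clamp at the maximum");
      ```
      
      Because `std::numeric_limits<T>::max()` is `constexpr`, it works in the same compile-time contexts where a macro constant would have been used.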
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9954
      
      Test Plan: See CI Runs.
      
      Reviewed By: riversand963
      
      Differential Revision: D36173954
      
      fbshipit-source-id: a35a73af17cdcae20e258cdef57fcf29a50b49e0
      49628c9a
  35. 19 April, 2022 1 commit
    • P
      Misc CI improvements / additions (#9859) · 1601433b
      Peter Dillinger committed
      Summary:
      * Add valgrind test to nightly CircleCI (in case it can catch something that
      ASAN/UBSAN does not)
      * Add clang13+asan+ubsan+folly test to nightly CircleCI, for broader testing
      * Consolidate many copies of ASAN_OPTIONS= while also allowing it to be
      inherited from parent environment rather than always overridden.
      * Move UBSAN exclusion from Makefile into options_settable_test.cc
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9859
      
      Test Plan: CI
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D35730903
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 6f5464034e8115f9a07f6f7aec1de9219ec2837c
      1601433b
  36. 16 April, 2022 1 commit
    • A
      Make initial auto readahead_size configurable (#9836) · 0c7f455f
      Akanksha Mahajan committed
      Summary:
      Make initial auto readahead_size configurable
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9836
      
      Test Plan:
      Added new unit test
      Ran regression:
      Without change:
      
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.0
      Date:       Thu Mar 17 13:11:34 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  483618.390 micros/op 2 ops/sec;  338.9 MB/s (249 of 249 found)
      ```
      
      With this change:
      ```
       ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Set seed to 1649895440554504 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.2
      Date:       Wed Apr 13 17:17:20 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      ... finished 100 ops
      seekrandom   :  476892.488 micros/op 2 ops/sec;  344.6 MB/s (252 of 252 found)
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D35632815
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: c8057a88f9294c9d03b1d434b03affe02f74d796
      0c7f455f
  37. 12 April, 2022 1 commit
    • G
      Prevent double caching in the compressed secondary cache (#9747) · f241d082
      gitbw95 committed
      Summary:
      ###  **Summary:**
      When both LRU Cache and CompressedSecondaryCache are configured together, some data blocks may be double cached.
      
      **Changes include:**
      1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusion.
      2. Update SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle has been erased from the secondary cache. The caller can then decide whether to call SetIsInSecondaryCache().
      3. Rename LRUSecondaryCache to CompressedSecondaryCache.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747
      
      Test Plan:
      **Test Scripts:**
      1. Populate a DB. The on-disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB.
      ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1
      
      2. Overwrite it to a stable state:
      ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1
      
      3. Run read tests with different cache settings:
      
      T1:
      ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000  --statistics -db=/db_bench_1
      
      T2:
      ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1
      
      T3:
      ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1
      
      T4:
      ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1
      
      **Before this PR**
      | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate |
      |------------|-------------------------------------|----------------|
      |520 MB | 0 MB | 85.5% |
      |320 MB | 400 MB | 96.2% |
      |520 MB | 400 MB | 98.3% |
      |20 MB | 500 MB | 98.8% |
      
      **After this PR**
      | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate |
      |------------|-------------------------------------|----------------|
      |520 MB | 0 MB | 85.5% |
      |320 MB | 400 MB | 99.9% |
      |520 MB | 400 MB | 99.9% |
      |20 MB | 500 MB | 99.2% |
      
      Reviewed By: anand1976
      
      Differential Revision: D35117499
      
      Pulled By: gitbw95
      
      fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
      f241d082