1. 11 Feb, 2021: 5 commits
    • Add prefetching (batched MultiGet) for experimental Ribbon filter (#7889) · e4f1e64c
      Peter Dillinger committed
      Summary:
      Adds support for prefetching data in Ribbon queries,
      which especially optimizes batched Ribbon queries for MultiGet
      (~222ns/key to ~97ns/key) but also single key queries on cold memory
      (~333ns to ~226ns) because many queries span more than one cache line.
      
      This required some refactoring of the query algorithm, and there
      does not appear to be a noticeable regression in "hot memory" query
      times (perhaps from 48ns to 50ns).
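      A minimal sketch of the batched-prefetch idea follows. It assumes a toy filter layout; `ToyFilter`, `StartFor`, `ProbeAt` and `MayMatchBatch` are hypothetical names, and the probe is not the Ribbon query algorithm. The first pass computes every probe address and issues prefetches. The second pass runs the probes, so cache misses for different keys overlap instead of being paid serially. `__builtin_prefetch` is a GCC/Clang builtin.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // Toy stand-in for a filter whose probes read two adjacent words that may
      // sit in different cache lines. NOT the Ribbon algorithm; only the
      // memory-access pattern matters here.
      struct ToyFilter {
        std::vector<uint64_t> words;  // assumed to hold at least two words

        // Hypothetical mapping from a key hash to the first word a probe reads.
        size_t StartFor(uint64_t h) const { return h % (words.size() - 1); }

        // Hypothetical probe: reads words[start] and words[start + 1].
        bool ProbeAt(size_t start, uint64_t h) const {
          uint64_t x = words[start] ^ words[start + 1];
          return (x & 0xFF) == (h >> 56);
        }

        // Single-key query: the probe stalls on any cache miss.
        bool MayMatch(uint64_t h) const { return ProbeAt(StartFor(h), h); }

        // Batched query. Pass 1 computes every probe address and issues
        // prefetches; pass 2 runs the probes on (hopefully) cached lines.
        std::vector<bool> MayMatchBatch(const std::vector<uint64_t>& hashes) const {
          std::vector<size_t> starts(hashes.size());
          for (size_t i = 0; i < hashes.size(); ++i) {
            starts[i] = StartFor(hashes[i]);
      #if defined(__GNUC__) || defined(__clang__)
            __builtin_prefetch(&words[starts[i]]);
            __builtin_prefetch(&words[starts[i] + 1]);
      #endif
          }
          std::vector<bool> results(hashes.size());
          for (size_t i = 0; i < hashes.size(); ++i) {
            results[i] = ProbeAt(starts[i], hashes[i]);
          }
          return results;
        }
      };
      ```

      Both words a probe touches are prefetched because they can fall on different cache lines. This matches the note above that many queries span more than one cache line. The batched results are identical to the single-key ones; only the timing changes.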
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7889
      
      Test Plan:
      existing unit tests, plus performance validation with
      filter_bench:
      
      Each data point is the best of two runs. I saturated the machine
      CPUs with other filter_bench runs in the background.
      
      Before:
      
          $ ./filter_bench -impl=3 -m_keys_total_max=200 -average_keys_per_filter=100000 -m_queries=50
          WARNING: Assertions are enabled; benchmarks unnecessarily slow
          Building...
          Build avg ns/key: 125.86
          Number of filters: 1993
          Total size (MB): 168.166
          Reported total allocated memory (MB): 183.211
          Reported internal fragmentation: 8.94626%
          Bits/key stored: 7.05341
          Prelim FP rate %: 0.951827
          ----------------------------
          Mixed inside/outside queries...
            Single filter net ns/op: 48.0111
            Batched, prepared net ns/op: 222.384
            Batched, unprepared net ns/op: 343.908
            Skewed 50% in 1% net ns/op: 252.916
            Skewed 80% in 20% net ns/op: 320.579
            Random filter net ns/op: 332.957
      
      After:
      
          $ ./filter_bench -impl=3 -m_keys_total_max=200 -average_keys_per_filter=100000 -m_queries=50
          WARNING: Assertions are enabled; benchmarks unnecessarily slow
          Building...
          Build avg ns/key: 128.117
          Number of filters: 1993
          Total size (MB): 168.166
          Reported total allocated memory (MB): 183.211
          Reported internal fragmentation: 8.94626%
          Bits/key stored: 7.05341
          Prelim FP rate %: 0.951827
          ----------------------------
          Mixed inside/outside queries...
            Single filter net ns/op: 49.8812
            Batched, prepared net ns/op: 97.1514
            Batched, unprepared net ns/op: 222.025
            Skewed 50% in 1% net ns/op: 197.48
            Skewed 80% in 20% net ns/op: 212.457
            Random filter net ns/op: 226.464
      
      Bloom comparison, for reference:
      
          $ ./filter_bench -impl=2 -m_keys_total_max=200 -average_keys_per_filter=100000 -m_queries=50
          WARNING: Assertions are enabled; benchmarks unnecessarily slow
          Building...
          Build avg ns/key: 35.3042
          Number of filters: 1993
          Total size (MB): 238.488
          Reported total allocated memory (MB): 262.875
          Reported internal fragmentation: 10.2255%
          Bits/key stored: 10.0029
          Prelim FP rate %: 0.965327
          ----------------------------
          Mixed inside/outside queries...
            Single filter net ns/op: 9.09931
            Batched, prepared net ns/op: 34.21
            Batched, unprepared net ns/op: 88.8564
            Skewed 50% in 1% net ns/op: 139.75
            Skewed 80% in 20% net ns/op: 181.264
            Random filter net ns/op: 173.88
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D26378710
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 058428967c55ed763698284cd3b4bbe3351b6e69
    • db_bench: dump cpu info for Mac. (#7932) · 14fbb43f
      David CARLIER committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7932
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D26316480
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 3e002e49fcb7f60bc9270550a6b3e182fe197551
    • Build a full RocksDB on M1 macs (#7943) · 7ebde3da
      Xavier Deguillard committed
      Summary:
      With M1 Macs now available, RocksDB may be built on them to produce artifacts that are not intended for iOS. In that case a full (non-lite) RocksDB is needed.

      It is not clear to me why the ROCKSDB_LITE cmake option isn't used for iOS consumers. I am sending this pull request to foster discussion and to find a path forward to a full RocksDB build on M1.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7943
      
      Test Plan:
      Applied the following patch:
      ```
       diff --git a/fbcode/opensource/fbcode_builder/manifests/rocksdb b/fbcode/opensource/fbcode_builder/manifests/rocksdb
       --- a/fbcode/opensource/fbcode_builder/manifests/rocksdb
      +++ b/fbcode/opensource/fbcode_builder/manifests/rocksdb
      @@ -2,8 +2,8 @@
       name = rocksdb
      
       [download]
      -url = https://github.com/facebook/rocksdb/archive/v6.8.1.tar.gz
      -sha256 = ca192a06ed3bcb9f09060add7e9d0daee1ae7a8705a3d5ecbe41867c5e2796a2
      +url = https://github.com/xavierd/rocksdb/archive/master.zip
      +sha256 = f93f3f92df66a8401659e35398749d5910b92bd9c14b8354a35ea8852865c422
      
       [dependencies]
       lz4
      @@ -11,7 +11,7 @@
      
       [build]
       builder = cmake
      -subdir = rocksdb-6.8.1
      +subdir = rocksdb-master
      
       [cmake.defines]
       WITH_SNAPPY=ON
      ```
      
      I then ran `getdeps build eden` on an M1 MacBook. The build used to fail at link time because some RocksDB symbols could not be found. It now fails for a different reason (x86_64 Rust symbols).
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D26324049
      
      Pulled By: xavierd
      
      fbshipit-source-id: 12d86f3395709c4c323f440844e3ae65672aef2d
    • Use actual url instead of tinyurl.com (#7950) · 170dffac
      Yanqin Jin committed
      Summary:
      Following an offline discussion, we use the actual URL of clang-format-diff.py instead of a tinyurl.com link, and add a note.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7950
      
      Reviewed By: pdillinger
      
      Differential Revision: D26370822
      
      Pulled By: riversand963
      
      fbshipit-source-id: 7508e23c002d56d5c1649090438ef5f8ff2cdbe7
    • Makefile support to statically link external plugin code (#7918) · c16d5a4f
      Andrew Kryczka committed
      Summary:
      Added support for detecting plugins linked in the "plugin/" directory and building them from our Makefile in a standardized way. See "plugin/README.md" for details. An example of a plugin that can be built in this way can be found in https://github.com/ajkr/dedupfs.
      
      There will be more to do in terms of making this process more convenient and adding support for CMake.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7918
      
      Test Plan: Tested with my own plugin (https://github.com/ajkr/dedupfs). I have also heard that this patch worked with ZenFS.
      
      Reviewed By: pdillinger
      
      Differential Revision: D26189969
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6624d4357d0ffbaedb42f0d12a3fcb737c78f758
  2. 10 Feb, 2021: 2 commits
  3. 09 Feb, 2021: 2 commits
  4. 07 Feb, 2021: 1 commit
  5. 06 Feb, 2021: 5 commits
  6. 05 Feb, 2021: 1 commit
  7. 04 Feb, 2021: 1 commit
  8. 03 Feb, 2021: 2 commits
    • Add the integrated BlobDB to the stress/crash tests (#7900) · 0288bdbc
      Levi Tamasi committed
      Summary:
      The patch adds support for the options related to the new BlobDB implementation
      to `db_stress`, including support for dynamically adjusting them using `SetOptions`
      when `set_options_one_in` and a new flag `allow_setting_blob_options_dynamically`
      are specified. (The latter is used to prevent the options from being enabled when
      incompatible features are in use.)
      
      The patch also updates the `db_stress` help messages of the existing stacked BlobDB
      related options to clarify that they pertain to the old implementation. In addition, it
      adds the new BlobDB to the crash test script. In order to prevent a combinatorial explosion
      of jobs and still perform whitebox/blackbox testing (including under ASAN/TSAN/UBSAN),
      and to also test BlobDB in conjunction with atomic flush and transactions, the script sets
      the BlobDB options in 10% of normal/`cf_consistency`/`txn` crash test runs.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7900
      
      Test Plan: Ran `make check` and `db_stress`/`db_crashtest.py` with various options.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D26094913
      
      Pulled By: ltamasi
      
      fbshipit-source-id: c2ef3391a05e43a9687f24e297df05f4a5584814
    • Return Status::OK for unimplemented write batch handler in trace analyzer (#7910) · 108e6b63
      Zhichao Cao committed
      Summary:
      Unimplemented handlers return Status::InvalidArgument(), which caused issues when using the trace analyzer on write batch records. This change overrides them to return Status::OK() instead.
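      The sketch below illustrates the pattern under stated assumptions. `Status`, `Handler`, `AnalyzerHandler`, `OnPut`, `OnBeginPrepare`, `Rec` and `Replay` are hypothetical simplified stand-ins, not RocksDB's actual `WriteBatch::Handler` API. The base class's default callbacks report an error, and a replay loop stops at the first error. The analyzer-style subclass returns OK for record types it does not care about.

      ```cpp
      #include <cassert>
      #include <string>
      #include <vector>

      // Minimal hypothetical stand-in for rocksdb::Status.
      struct Status {
        bool is_ok = true;
        static Status OK() { return Status{}; }
        static Status InvalidArgument() {
          Status s;
          s.is_ok = false;
          return s;
        }
        bool ok() const { return is_ok; }
      };

      // Hypothetical handler base: any callback a subclass does not override
      // reports InvalidArgument.
      class Handler {
       public:
        virtual ~Handler() = default;
        virtual Status OnPut(const std::string&, const std::string&) {
          return Status::InvalidArgument();
        }
        virtual Status OnBeginPrepare() { return Status::InvalidArgument(); }
      };

      // Analyzer-style handler: counts puts and returns OK for the record
      // types it does not analyze, so replay is not cut short.
      class AnalyzerHandler : public Handler {
       public:
        int puts = 0;
        Status OnPut(const std::string&, const std::string&) override {
          ++puts;
          return Status::OK();
        }
        Status OnBeginPrepare() override { return Status::OK(); }
      };

      // Hypothetical record kinds and a replay loop that stops at the first error.
      enum class Rec { kPut, kBeginPrepare };

      Status Replay(Handler& h, const std::vector<Rec>& recs) {
        for (Rec r : recs) {
          Status s = (r == Rec::kPut) ? h.OnPut("k", "v") : h.OnBeginPrepare();
          if (!s.ok()) return s;
        }
        return Status::OK();
      }
      ```

      With the base handler, the first unhandled record aborts the replay. With the override, every record is visited.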
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7910
      
      Test Plan: Tested with a real trace; `make check`
      
      Reviewed By: siying
      
      Differential Revision: D26154327
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: bcdefd4891f839b2e89e4c079f9f430245f482fb
  9. 02 Feb, 2021: 3 commits
  10. 30 Jan, 2021: 2 commits
    • Fix a SingleDelete related optimization for blob indexes (#7904) · e5311a8e
      Levi Tamasi committed
      Summary:
      There is a small `SingleDelete` related optimization in the
      `CompactionIterator` code: when a `SingleDelete`-`Put` pair is preserved
      solely for the purposes of transaction conflict checking, the value
      itself gets cleared. (This is referred to as "optimization 3" in the
      `CompactionIterator` code.) Though the rest of the code got updated to
      support `SingleDelete`'ing blob indexes, this chunk was apparently
      missed, resulting in an assertion failure (or `ROCKS_LOG_FATAL` in release
      builds) when triggered. Note: in addition to clearing the value, we also
      need to update the type of the KV to regular value when dealing with
      blob indexes here.
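      A simplified sketch of the fix follows, assuming a reduced set of value types. `ValueType`, `Entry` and `ClearValueForConflictCheckOnly` are hypothetical names, not RocksDB's internal definitions. When the `Put` half of a preserved `SingleDelete`-`Put` pair only serves transaction conflict checking, its payload is dropped. A blob index is also downgraded to a regular value, presumably because an empty blob index could not be decoded.

      ```cpp
      #include <cassert>
      #include <string>

      // Hypothetical, reduced value types; the real enum has many more members.
      enum class ValueType { kValue, kBlobIndex, kSingleDelete };

      struct Entry {
        std::string key;
        ValueType type;
        std::string value;
      };

      // Drops the payload of a Put kept only for conflict checking. For a blob
      // index, the type is also switched to a regular value so that the entry
      // stays well-formed once its (now empty) payload is gone.
      void ClearValueForConflictCheckOnly(Entry* e) {
        assert(e->type == ValueType::kValue || e->type == ValueType::kBlobIndex);
        e->value.clear();
        if (e->type == ValueType::kBlobIndex) {
          e->type = ValueType::kValue;
        }
      }
      ```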
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7904
      
      Test Plan: `make check`
      
      Reviewed By: ajkr
      
      Differential Revision: D26118009
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 6bf78043d20265e2b15c2e1ab8865025040c42ae
    • Integrity protection for live updates to WriteBatch (#7748) · 78ee8564
      Andrew Kryczka committed
      Summary:
      This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.).
      
      The foundation classes (`ProtectionInfo.*`) embed the coverage info in their type, and provide `Protect.*()` and `Strip.*()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer.
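      The sketch below illustrates the XOR-of-seeded-hashes construction. It makes several assumptions: `SeededHash` is an FNV-1a variant chosen for illustration (RocksDB uses its own hash functions), and `ProtectKVC` and its three covered fields are hypothetical. The integer field is hashed through its raw in-memory bytes, so the result is endian-dependent. Instantiating with a narrower unsigned type keeps only the low-order bytes of the 64-bit hash.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <string>

      // Hypothetical seeded 64-bit hash (FNV-1a with the seed folded into the
      // offset basis). RocksDB uses its own hash functions, not this one.
      uint64_t SeededHash(const void* data, size_t n, uint64_t seed) {
        uint64_t h = 14695981039346656037ULL ^ (seed * 0x9E3779B97F4A7C15ULL);
        const unsigned char* p = static_cast<const unsigned char*>(data);
        for (size_t i = 0; i < n; ++i) {
          h ^= p[i];
          h *= 1099511628211ULL;
        }
        return h;
      }

      // Hypothetical protection info covering key, value and a 32-bit CF ID,
      // templated on the storage type T. Each field uses its own seed and the
      // hashes are XOR'd. The integer is hashed through its raw bytes, so the
      // result is endian-dependent. Converting to a narrower unsigned T keeps
      // the low-order bytes, i.e. the most significant bytes are truncated.
      template <typename T>
      T ProtectKVC(const std::string& key, const std::string& value, uint32_t cf_id) {
        uint64_t h = SeededHash(key.data(), key.size(), /*seed=*/1) ^
                     SeededHash(value.data(), value.size(), /*seed=*/2) ^
                     SeededHash(&cf_id, sizeof(cf_id), /*seed=*/3);
        return static_cast<T>(h);
      }
      ```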
      
      When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer.
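      Because coverage is combined with XOR, switching from KVOTC to KVOTS coverage needs only the two changed fields: XOR-ing a field's hash in a second time removes it, and XOR-ing a new hash adds it. The sketch below shows this under stated assumptions. The seeds, `SeededHash`, `HStr`, `HInt`, `ProtectKVOTC`, `ProtectKVOTS` and `StripCProtectS` are hypothetical names and do not reproduce RocksDB's `ProtectionInfo.*` classes.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <string>

      // Hypothetical seeded hash (FNV-1a variant), as a stand-in only.
      uint64_t SeededHash(const void* data, size_t n, uint64_t seed) {
        uint64_t h = 14695981039346656037ULL ^ (seed * 0x9E3779B97F4A7C15ULL);
        const unsigned char* p = static_cast<const unsigned char*>(data);
        for (size_t i = 0; i < n; ++i) {
          h ^= p[i];
          h *= 1099511628211ULL;
        }
        return h;
      }

      // Hypothetical independent seed per covered field.
      enum Seed : uint64_t { kKeySeed = 1, kValueSeed, kOpSeed, kTsSeed, kCfSeed, kSeqSeed };

      uint64_t HStr(const std::string& s, uint64_t seed) {
        return SeededHash(s.data(), s.size(), seed);
      }
      template <typename Int>
      uint64_t HInt(Int v, uint64_t seed) {
        return SeededHash(&v, sizeof(v), seed);  // raw bytes: endian-dependent
      }

      // Coverage while the entry sits in the WriteBatch.
      uint64_t ProtectKVOTC(const std::string& k, const std::string& v, uint8_t op,
                            const std::string& ts, uint32_t cf_id) {
        return HStr(k, kKeySeed) ^ HStr(v, kValueSeed) ^ HInt(op, kOpSeed) ^
               HStr(ts, kTsSeed) ^ HInt(cf_id, kCfSeed);
      }

      // Coverage a verifier recomputes from the memtable encoding, which holds
      // the sequence number but not the CF ID.
      uint64_t ProtectKVOTS(const std::string& k, const std::string& v, uint8_t op,
                            const std::string& ts, uint64_t seqno) {
        return HStr(k, kKeySeed) ^ HStr(v, kValueSeed) ^ HInt(op, kOpSeed) ^
               HStr(ts, kTsSeed) ^ HInt(seqno, kSeqSeed);
      }

      // Strip CF ID coverage and add sequence number coverage in one step.
      uint64_t StripCProtectS(uint64_t kvotc, uint32_t cf_id, uint64_t seqno) {
        return kvotc ^ HInt(cf_id, kCfSeed) ^ HInt(seqno, kSeqSeed);
      }
      ```

      In this sketch, a byte flipped in the value between the batch and the memtable would make the transformed info disagree with the recomputed one.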
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748
      
      Test Plan:
      - an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught
      - add to stress/crash test to verify it works in a variety of configs/operations without intentional corruption
      - [deferred] unit tests for `ProtectionInfo.*` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc.
      
      Reviewed By: pdillinger
      
      Differential Revision: D25754492
      
      Pulled By: ajkr
      
      fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866
  11. 29 Jan, 2021: 2 commits
    • Remove Legacy and Custom FileWrapper classes from header files (#7851) · 4a09d632
      mrambacher committed
      Summary:
      Removed the uses of the Legacy FileWrapper classes from the source code.  The wrappers were creating an additional layer of indirection/wrapping, as the Env already has a FileSystem.
      
      Moved the Custom FileWrapper classes into the CustomEnv, as these classes are really for the private use of the CustomEnv class.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7851
      
      Reviewed By: anand1976
      
      Differential Revision: D26114816
      
      Pulled By: mrambacher
      
      fbshipit-source-id: db32840e58d969d3a0fa6c25aaf13d6dcdc74150
    • Make builds reproducible (#7866) · 0a9a05ae
      mrambacher committed
      Summary:
      Closes https://github.com/facebook/rocksdb/issues/7035
      
      Changed how build_version.cc was generated:
      - Included the GIT tag/branch in the build_version file
      - Changed the "Build Date" to be:
          - If the GIT branch is "clean" (no changes), the date of the last git commit
          - If the branch is not clean, the current date
      - Added APIs to access the "build information", rather than accessing the strings directly.
      
      The build_version.cc file is now regenerated whenever the library objects are rebuilt.
      
      Verified that the built files remain the same size across builds for a "clean build", and that sst_dump --version reports the same information.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7866
      
      Reviewed By: pdillinger
      
      Differential Revision: D26086565
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 6fcbe47f6033989d5cf26a0ccb6dfdd9dd239d7f
  12. 28 Jan, 2021: 4 commits
  13. 27 Jan, 2021: 5 commits
  14. 26 Jan, 2021: 3 commits
    • Add a SystemClock class to capture the time functions of an Env (#7858) · 12f11373
      mrambacher committed
      Summary:
      Introduces a SystemClock class into RocksDB and uses it. This class contains the time-related functions of an Env, and these functions can be redirected from the Env to the SystemClock.
      
      Many of the places that used an Env (Timer, PerfStepTimer, RepeatableThread, RateLimiter, WriteController) for time-related functions have been changed to use SystemClock instead.  There are likely more places that can be changed, but this is a start to show what can/should be done.  Over time it would be nice to migrate most (if not all) of the uses of the time functions from the Env to the SystemClock.
      
      There are several Env classes that implement these functions.  Most of these have not yet been converted to SystemClock implementations; that will come in a subsequent PR.  It would be good to unify many of the Mock Timer implementations, so that they behave and are tested consistently (some override Sleep, some use a MockSleep, etc.).
      
      Additionally, this change will allow new methods to be introduced to the SystemClock (like https://github.com/facebook/rocksdb/issues/7101 WaitFor) in a consistent manner across a smaller number of classes.
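      The sketch below shows the separation described above under stated assumptions. `Clock`, `RealClock`, `MockClock` and `ElapsedTimer` are hypothetical names with simplified signatures, not RocksDB's actual `SystemClock` interface. A component that only needs time depends on a small clock interface instead of a whole Env. A single shared mock advances virtual time on sleep, which is the kind of unification suggested for the Mock Timer implementations.

      ```cpp
      #include <cassert>
      #include <chrono>
      #include <cstdint>
      #include <thread>

      // Hypothetical minimal clock interface in the spirit of SystemClock; the
      // real RocksDB class has a different and larger set of methods.
      class Clock {
       public:
        virtual ~Clock() = default;
        virtual uint64_t NowMicros() = 0;
        virtual void SleepForMicroseconds(uint64_t micros) = 0;
      };

      // Production clock backed by std::chrono's monotonic clock.
      class RealClock : public Clock {
       public:
        uint64_t NowMicros() override {
          auto since_epoch = std::chrono::steady_clock::now().time_since_epoch();
          return static_cast<uint64_t>(
              std::chrono::duration_cast<std::chrono::microseconds>(since_epoch).count());
        }
        void SleepForMicroseconds(uint64_t micros) override {
          std::this_thread::sleep_for(std::chrono::microseconds(micros));
        }
      };

      // One mock every test can share: sleeping only advances virtual time,
      // so tests are fast and deterministic.
      class MockClock : public Clock {
       public:
        uint64_t NowMicros() override { return now_; }
        void SleepForMicroseconds(uint64_t micros) override { now_ += micros; }

       private:
        uint64_t now_ = 0;
      };

      // A time-only component depends on Clock rather than on an Env.
      class ElapsedTimer {
       public:
        explicit ElapsedTimer(Clock* clock) : clock_(clock), start_(clock->NowMicros()) {}
        uint64_t ElapsedMicros() const { return clock_->NowMicros() - start_; }

       private:
        Clock* clock_;
        uint64_t start_;
      };
      ```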
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7858
      
      Reviewed By: pdillinger
      
      Differential Revision: D26006406
      
      Pulled By: mrambacher
      
      fbshipit-source-id: ed10a8abbdab7ff2e23d69d85bd25b3e7e899e90
    • In IOTracing, add filename with each operation in trace file. (#7885) · 1d226018
      Akanksha Mahajan committed
      Summary:
      1. In IOTracing, add the filename to each IOTrace record. The filename is stored in the file object (the tracing wrappers).
      2. Change the logic that decides which additional information (file_size, length, offset, etc.) is stored with each operation, since this differs between operations. This will make it simple to add new per-operation information in the future.

      Logic: IOTraceRecord now has an io_op_data field whose bits indicate which additional information is included in the record. The values of enum IOTraceOp give the bit positions. For example, if length and offset need to be stored (IOTraceOp::kIOLen is 1 and IOTraceOp::kIOOffset is 2), bits 1 and 2 (counting from the rightmost bit, starting at 0) are set, and io_op_data contains binary 110.
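      A sketch of the bit-position scheme follows. Only `kIOLen == 1` and `kIOOffset == 2` come from the description above; `kIOFileSize == 0` is an assumption. `SetOp`, `HasOp` and `FieldsToDecode` are hypothetical helpers, not RocksDB functions.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <vector>

      // Bit positions for optional per-record fields. kIOLen == 1 and
      // kIOOffset == 2 follow the description; kIOFileSize == 0 is assumed.
      enum IOTraceOp : int { kIOFileSize = 0, kIOLen = 1, kIOOffset = 2 };

      // Marks that the field for `op` is present in the record.
      uint64_t SetOp(uint64_t io_op_data, IOTraceOp op) {
        return io_op_data | (uint64_t{1} << op);
      }

      // Checks whether the field for `op` is present.
      bool HasOp(uint64_t io_op_data, IOTraceOp op) {
        return ((io_op_data >> op) & 1) != 0;
      }

      // Hypothetical reader: lists the optional fields to decode, in bit order.
      std::vector<std::string> FieldsToDecode(uint64_t io_op_data) {
        std::vector<std::string> fields;
        if (HasOp(io_op_data, kIOFileSize)) fields.push_back("file_size");
        if (HasOp(io_op_data, kIOLen)) fields.push_back("length");
        if (HasOp(io_op_data, kIOOffset)) fields.push_back("offset");
        return fields;
      }
      ```

      Setting `kIOLen` and `kIOOffset` yields binary 110 (decimal 6). Adding a new optional field then only needs a new enum value and one more check in the reader.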
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7885
      
      Test Plan: Updated io_tracer_test and verified the trace file manually.
      
      Reviewed By: anand1976
      
      Differential Revision: D25982353
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: ebfc5539cc0e231d7794a6b42b73f5403e360b22
    • Do not explicitly flush blob files when using the integrated BlobDB (#7892) · 431e8afb
      Levi Tamasi committed
      Summary:
      In the original stacked BlobDB implementation, which writes blobs to blob files
      immediately and treats blob files as logs, it makes sense to flush the file after
      writing each blob to protect against process crashes; however, in the integrated
      implementation, which builds blob files in the background jobs, this unnecessarily
      reduces performance. This patch fixes this by simply adding a `do_flush` flag to
      `BlobLogWriter`, which is set to `true` by the stacked implementation and to `false`
      by the new code. Note: the change itself is trivial but the tests needed some work;
      since in the new implementation, blobs are now buffered, adding a blob to
      `BlobFileBuilder` is no longer guaranteed to result in an actual I/O. Therefore, we can
      no longer rely on `FaultInjectionTestEnv` when testing failure cases; instead, we
      manipulate the return values of I/O methods directly using `SyncPoint`s.
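      The sketch below illustrates the `do_flush` switch under stated assumptions. `CountingFile` and `BlobWriterSketch` are hypothetical stand-ins, not the real `BlobLogWriter` or file classes, and the single flush in `Finish()` is only this sketch's simplification. The stacked-style writer flushes after every blob. The integrated-style writer leaves data buffered, so adding a blob does not by itself cause I/O.

      ```cpp
      #include <cassert>
      #include <string>

      // Hypothetical writable file that buffers appends and counts flushes.
      struct CountingFile {
        std::string buffer;
        std::string persisted;
        int flushes = 0;
        void Append(const std::string& data) { buffer += data; }
        void Flush() {
          persisted += buffer;
          buffer.clear();
          ++flushes;
        }
      };

      // Hypothetical blob writer with a do_flush switch: true mimics the stacked
      // implementation (blob file used as a log), false the integrated one.
      class BlobWriterSketch {
       public:
        BlobWriterSketch(CountingFile* file, bool do_flush)
            : file_(file), do_flush_(do_flush) {}
        void AddRecord(const std::string& blob) {
          file_->Append(blob);
          if (do_flush_) file_->Flush();
        }
        // In this sketch, buffered data is flushed once when the file is done.
        void Finish() { file_->Flush(); }

       private:
        CountingFile* file_;
        bool do_flush_;
      };
      ```

      Because buffered adds do not reach the file, a fault injected at the file level would not fire on `AddRecord`. That is why the tests above switch to manipulating I/O return values directly.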
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7892
      
      Test Plan: `make check`
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D26022814
      
      Pulled By: ltamasi
      
      fbshipit-source-id: b3dce419f312137fa70d84cdd9b908fd5d60d8cd
  15. 22 Jan, 2021: 2 commits
    • Update HISTORY.md for PR 7888 (#7890) · 19076c95
      Levi Tamasi committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7890
      
      Reviewed By: ajkr
      
      Differential Revision: D26005509
      
      Pulled By: ltamasi
      
      fbshipit-source-id: e7eb732180d447900788d0e3a17dfd1c3f1e708a
    • MergeHelper::FilterMerge() calling ElapsedNanosSafe() upon exit even … (#7867) · 12a8be1d
      Matthew Von-Maszewski committed
      Summary:
      …when unused.  Causes many calls to clock_gettime, impacting performance.
      
      I was looking for something else with the Linux "perf" command when I spotted heavy usage of clock_gettime during a compaction.  Our product heavily uses rocksdb::Options::merge_operator.  MergeHelper::FilterMerge() properly tests whether timing is enabled upon entry, but not on exit.  This patch fixes the exit.
      
      Note: the entry test also checks "nullptr!=stats_".  This check is redundant with code within ShouldReportDetailedTime(), so I omitted it in my change.
      
      merge_test.cc is updated with a test that shows the failure before the merge_helper.cc change and the fix after it.
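      The fix can be sketched as follows, under stated assumptions. `CountingClock`, `ShouldTime` (playing the role of ShouldReportDetailedTime()) and `FilterMergeSketch` are hypothetical names, not RocksDB code. The same guard that protects the clock read on entry also protects the read on exit, so the clock is never read when detailed timing is disabled. As the note says, the null-stats check already lives inside the guard.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Hypothetical clock that counts how often it is read.
      struct CountingClock {
        int reads = 0;
        uint64_t NowNanos() {
          ++reads;
          return 1000ULL * static_cast<uint64_t>(reads);
        }
      };

      // Hypothetical guard playing the role of ShouldReportDetailedTime(): it
      // already rejects a null stats pointer, so callers need no separate check.
      bool ShouldTime(const void* stats, bool detailed_timing_enabled) {
        return stats != nullptr && detailed_timing_enabled;
      }

      // The guard decides both the entry and the exit clock reads. Returns the
      // measured time, or 0 when timing is off.
      uint64_t FilterMergeSketch(CountingClock* clock, const void* stats,
                                 bool detailed_timing_enabled) {
        const bool timed = ShouldTime(stats, detailed_timing_enabled);
        const uint64_t start = timed ? clock->NowNanos() : 0;
        // ... run the compaction filter on the merge operand here ...
        return timed ? clock->NowNanos() - start : 0;
      }
      ```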
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7867
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D25960175
      
      Pulled By: ajkr
      
      fbshipit-source-id: 56e66d7eb6ae5eae89c8e0d5a262bd2905a226b6