1. 30 Jan, 2021 1 commit
    • Integrity protection for live updates to WriteBatch (#7748) · 78ee8564
      Andrew Kryczka committed
      Summary:
      This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.).
      
      The foundation classes (`ProtectionInfo.*`) embed the coverage info in their type, and provide `Protect.*()` and `Strip.*()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer.
      
      When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer.
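
The XOR-of-seeded-hashes scheme described above can be sketched as follows. This is a minimal illustration, not the PR's code: `SeededHash`, the seed values, and the helper names `ProtectKVOT`, `ProtectKVOTC`, `StripC`, and `ProtectS` are all assumptions chosen for this example. It shows why coverage can be dropped or added cheaply: XOR is its own inverse, so stripping the CF ID and adding the sequence number never requires rehashing the key or value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical seeded 64-bit hash (an FNV-1a variant). This is an assumption
// for illustration, not the hash function RocksDB uses.
inline uint64_t SeededHash(const std::string& data, uint64_t seed) {
  uint64_t h = 14695981039346656037ULL ^ seed;
  for (unsigned char c : data) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Arbitrary, distinct seeds: one per covered field.
constexpr uint64_t kSeedK = 1, kSeedV = 2, kSeedO = 3, kSeedT = 4,
                   kSeedC = 5, kSeedS = 6;

// Raw, unadjusted bytes of an integer, so the result is endian-dependent.
inline std::string RawBytes(uint64_t v) {
  return std::string(reinterpret_cast<const char*>(&v), sizeof(v));
}

// Coverage of key, value, optype, and timestamp.
inline uint64_t ProtectKVOT(const std::string& key, const std::string& value,
                            uint8_t op_type, const std::string& ts) {
  return SeededHash(key, kSeedK) ^ SeededHash(value, kSeedV) ^
         SeededHash(RawBytes(op_type), kSeedO) ^ SeededHash(ts, kSeedT);
}

// Adds CF ID coverage, as held per entry in the batch.
inline uint64_t ProtectKVOTC(const std::string& key, const std::string& value,
                             uint8_t op_type, const std::string& ts,
                             uint32_t cf_id) {
  return ProtectKVOT(key, value, op_type, ts) ^
         SeededHash(RawBytes(cf_id), kSeedC);
}

// Drops CF ID coverage by XOR-ing the same hash a second time.
inline uint64_t StripC(uint64_t kvotc, uint32_t cf_id) {
  return kvotc ^ SeededHash(RawBytes(cf_id), kSeedC);
}

// Adds sequence number coverage once the sequence number is known.
inline uint64_t ProtectS(uint64_t kvot, uint64_t seqno) {
  return kvot ^ SeededHash(RawBytes(seqno), kSeedS);
}
```

Under these assumptions, `ProtectS(StripC(kvotc, cf_id), seqno)` equals the value recomputed from the encoded entry only if no covered field changed in between.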
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748
      
      Test Plan:
      - an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught
      - add to stress/crash test to verify it works in a variety of configs/operations without intentional corruption
      - [deferred] unit tests for `ProtectionInfo.*` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc.
      
      Reviewed By: pdillinger
      
      Differential Revision: D25754492
      
      Pulled By: ajkr
      
      fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866
  2. 29 Jan, 2021 1 commit
    • Make builds reproducible (#7866) · 0a9a05ae
      mrambacher committed
      Summary:
      Closes https://github.com/facebook/rocksdb/issues/7035
      
      Changed how build_version.cc was generated:
      - Included the GIT tag/branch in the build_version file
      - Changed the "Build Date" to be:
        - If the GIT branch is "clean" (no changes), the date of the last git commit
        - If the branch is not clean, the current date
      - Added APIs to access the "build information", rather than accessing the strings directly.
      
      The build_version.cc file is now regenerated whenever the library objects are rebuilt.
      
      Verified that the built files remain the same size across builds on a "clean build" and that the same information is reported by `sst_dump --version`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7866
      
      Reviewed By: pdillinger
      
      Differential Revision: D26086565
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 6fcbe47f6033989d5cf26a0ccb6dfdd9dd239d7f
  3. 26 Jan, 2021 1 commit
    • Add a SystemClock class to capture the time functions of an Env (#7858) · 12f11373
      mrambacher committed
      Summary:
      Introduces a SystemClock class to RocksDB and starts using it. This class contains the time-related functions of an Env, and these functions can be redirected from the Env to the SystemClock.
      
      Many of the places that used an Env (Timer, PerfStepTimer, RepeatableThread, RateLimiter, WriteController) for time-related functions have been changed to use SystemClock instead.  There are likely more places that can be changed, but this is a start to show what can/should be done.  Over time it would be nice to migrate most (if not all) of the uses of the time functions from the Env to the SystemClock.
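
The benefit of depending on a clock rather than on a whole Env can be sketched with a small stand-in. The `Clock`, `MockClock`, and `StepTimer` classes below are hypothetical names for this example and do not reproduce RocksDB's `SystemClock` interface; the point is only that a timer-like component needs nothing beyond the time functions, which makes it easy to test with a mock.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical stand-in for a clock abstraction (not RocksDB's SystemClock).
class Clock {
 public:
  virtual ~Clock() = default;
  virtual uint64_t NowMicros() = 0;
  virtual void SleepForMicroseconds(uint64_t micros) = 0;
};

// A mock clock whose time only advances when something "sleeps".
class MockClock : public Clock {
 public:
  uint64_t NowMicros() override { return now_micros_; }
  void SleepForMicroseconds(uint64_t micros) override { now_micros_ += micros; }

 private:
  uint64_t now_micros_ = 0;
};

// A component that depends only on the clock, not on a full environment.
class StepTimer {
 public:
  explicit StepTimer(std::shared_ptr<Clock> clock)
      : clock_(std::move(clock)), start_(clock_->NowMicros()) {}

  uint64_t ElapsedMicros() const { return clock_->NowMicros() - start_; }

 private:
  std::shared_ptr<Clock> clock_;  // declared first, so initialized first
  uint64_t start_;
};
```

With the mock, a test can advance time deterministically instead of sleeping for real.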
      
      There are several Env classes that implement these functions.  Most of these have not been converted yet to SystemClock implementations; that will come in a subsequent PR.  It would be good to unify many of the Mock Timer implementations, so that they behave similarly and can be tested similarly (some override Sleep, some use a MockSleep, etc).
      
      Additionally, this change will allow new methods to be introduced to the SystemClock (like https://github.com/facebook/rocksdb/issues/7101 WaitFor) in a consistent manner across a smaller number of classes.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7858
      
      Reviewed By: pdillinger
      
      Differential Revision: D26006406
      
      Pulled By: mrambacher
      
      fbshipit-source-id: ed10a8abbdab7ff2e23d69d85bd25b3e7e899e90
  4. 22 Jan, 2021 1 commit
  5. 21 Jan, 2021 1 commit
  6. 23 Dec, 2020 1 commit
    • Range Locking: Implementation of range locking (#7506) · daab7603
      Sergei Petrunia committed
      Summary:
      Range Locking - an implementation based on the locktree library
      
      - Add a RangeTreeLockManager and RangeTreeLockTracker which implement
        range locking using the locktree library.
      - Point locks are handled as locks on single-point ranges.
      - Add a unit test: range_locking_test
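
The idea of treating a point lock as a lock on a single-point range can be sketched as below. This toy table is an assumption for illustration only: it is not the locktree library, it ignores lock owners and shared locks, and the names `KeyRange`, `PointRange`, and `ToyRangeLockTable` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// An inclusive key range [start, end], compared bytewise.
struct KeyRange {
  std::string start;
  std::string end;
};

// A point lock is represented as the degenerate range [key, key].
inline KeyRange PointRange(const std::string& key) { return {key, key}; }

inline bool Overlaps(const KeyRange& a, const KeyRange& b) {
  return a.start <= b.end && b.start <= a.end;
}

// Toy exclusive lock table: a request fails if it overlaps any held range.
class ToyRangeLockTable {
 public:
  bool TryLock(const KeyRange& range) {
    for (const auto& held : held_) {
      if (Overlaps(held, range)) return false;
    }
    held_.push_back(range);
    return true;
  }

 private:
  std::vector<KeyRange> held_;
};
```

Because point locks go through the same overlap check, one code path handles both kinds of request.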
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7506
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D25320703
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: f86347384b42ba2b0257d67eca0f45f806b69da7
  7. 22 Dec, 2020 1 commit
  8. 10 Dec, 2020 1 commit
  9. 12 Nov, 2020 1 commit
    • Create a Customizable class to load classes and configurations (#6590) · c442f680
      mrambacher committed
      Summary:
      The Customizable class is an extension of the Configurable class and allows instances to be created by a name/ID.  Classes that extend Customizable can define their Type (e.g. "TableFactory", "Cache") and a method to instantiate them (TableFactory::CreateFromString).  Customizable objects can be registered with the ObjectRegistry and created dynamically.
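
Creating objects by name through a registry can be sketched in a few lines. Everything here (`ToyCache`, `ToyLRUCache`, `ToyRegistry`, and its methods) is a hypothetical stand-in and not the `Customizable` or `ObjectRegistry` API; it only illustrates the register-then-create-by-ID pattern.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base type whose instances can be created by name.
struct ToyCache {
  virtual ~ToyCache() = default;
  virtual std::string Name() const = 0;
};

struct ToyLRUCache : ToyCache {
  std::string Name() const override { return "ToyLRUCache"; }
};

// Toy registry mapping an ID to a factory function.
class ToyRegistry {
 public:
  using Factory = std::function<std::unique_ptr<ToyCache>()>;

  void Register(const std::string& id, Factory factory) {
    factories_[id] = std::move(factory);
  }

  // Returns nullptr when no factory is registered under `id`.
  std::unique_ptr<ToyCache> CreateFromString(const std::string& id) const {
    auto it = factories_.find(id);
    if (it == factories_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};
```

A caller that only knows a string from a config file can then obtain the matching implementation without naming its concrete type.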
      
      Future PRs will make more types of objects extend Customizable.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6590
      
      Reviewed By: cheng-chang
      
      Differential Revision: D24841553
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: d0c2132bd932e971cbfe2c908ca2e5db30c5e155
  10. 26 Oct, 2020 1 commit
    • Ribbon: initial (general) algorithms and basic unit test (#7491) · 25d54c79
      Peter Dillinger committed
      Summary:
      This is intended as the first commit toward a near-optimal alternative to static Bloom filters for SSTs. Stephan Walzer and I have agreed upon the name "Ribbon" for a PHSF based on his linear system construction in "Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications" ("SGauss") and my much faster "on the fly" algorithm for gaussian elimination (or for this linear system, "banding"), which can be faster than peeling while also more compact and flexible. See util/ribbon_alg.h for more detailed introduction and background. RIBBON = Rapid Incremental Boolean Banding ON-the-fly
      
      This commit just adds generic (templatized) core algorithms and a basic unit test showing some features, including the ability to construct structures within 2.5% space overhead vs. information theoretic lower bound. (Compare to cache-local Bloom filter's ~50% space overhead -> ~30% reduction anticipated.) This commit does not include the storage scheme necessary to make queries fast, especially for filter queries, nor fractional "result bits", but there is some description already and those implementations will come soon. Nor does this commit add FilterPolicy support, for use in SST files, but that will also come soon.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7491
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24517954
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 0119ee597e250d7e0edd38ada2ba50d755606fa7
  11. 20 Oct, 2020 1 commit
    • Abstract out LockManager interface (#7532) · 0ea7db76
      Cheng Chang committed
      Summary:
      In order to be able to introduce more locking protocols, we need to abstract out the locking subsystem in TransactionDB into a set of interfaces.
      
      PR https://github.com/facebook/rocksdb/pull/7013 introduces interface `LockTracker`. This PR is a follow up to take the first step to abstract out a `LockManager` interface.
      
      Further modifications to the interface may be needed when introducing the first implementation of range lock. But the idea here is to put the range lock implementation based on range tree under the `utilities/transactions/lock/range/range_tree`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7532
      
      Test Plan: point_lock_manager_test
      
      Reviewed By: ajkr
      
      Differential Revision: D24238731
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 2a9458cd8b3fb008d9529dbc4d3b28c24631f463
  12. 16 Oct, 2020 1 commit
    • Introduce BlobFileCache and add support for blob files to Get() (#7540) · e8cb32ed
      Levi Tamasi committed
      Summary:
      The patch adds blob file support to the `Get` API by extending `Version` so that
      whenever a blob reference is read from a file, the blob is retrieved from the corresponding
      blob file and passed back to the caller. (This is assuming the blob reference is valid
      and the blob file is actually part of the given `Version`.) It also introduces a cache
      of `BlobFileReader`s called `BlobFileCache` that enables sharing `BlobFileReader`s
      between callers. `BlobFileCache` uses the same backing cache as `TableCache`, so
      `max_open_files` (if specified) limits the total number of open (table + blob) files.
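
A capacity-limited cache that lets callers share readers can be sketched as follows. This is a simplified assumption, not `BlobFileCache`: the toy owns its own LRU list instead of sharing a backing cache with a table cache, and `ToyBlobReader` and `ToyReaderCache` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical reader; a real one would hold an open file.
class ToyBlobReader {
 public:
  explicit ToyBlobReader(uint64_t file_number) : file_number_(file_number) {}
  uint64_t file_number() const { return file_number_; }

 private:
  uint64_t file_number_;
};

// Toy LRU cache of readers keyed by file number.
class ToyReaderCache {
 public:
  explicit ToyReaderCache(size_t capacity) : capacity_(capacity) {
    assert(capacity_ > 0);
  }

  std::shared_ptr<ToyBlobReader> GetReader(uint64_t file_number) {
    auto it = index_.find(file_number);
    if (it != index_.end()) {
      // Hit: move the entry to the front and share the existing reader.
      lru_.splice(lru_.begin(), lru_, it->second);
      return it->second->second;
    }
    if (lru_.size() >= capacity_) {
      // Evict the least recently used entry to respect the capacity.
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
    lru_.emplace_front(file_number,
                       std::make_shared<ToyBlobReader>(file_number));
    index_[file_number] = lru_.begin();
    ++opens_;
    return lru_.front().second;
  }

  size_t opens() const { return opens_; }

 private:
  using Entry = std::pair<uint64_t, std::shared_ptr<ToyBlobReader>>;
  size_t capacity_;
  size_t opens_ = 0;
  std::list<Entry> lru_;
  std::unordered_map<uint64_t, std::list<Entry>::iterator> index_;
};
```

Two lookups for the same file return the same reader object, and the capacity plays the role that `max_open_files` plays in the description above.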
      
      TODO: proactively open/cache blob files and pin the cache handles of the readers in the
      metadata objects similarly to what `VersionBuilder::LoadTableHandlers` does for
      table files.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7540
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D24260219
      
      Pulled By: ltamasi
      
      fbshipit-source-id: a8a2a4f11d3d04d6082201b52184bc4d7b0857ba
  13. 08 Oct, 2020 2 commits
    • Clean up BlobLogReader and rename it to BlobLogSequentialReader (#7517) · 1f84611e
      Levi Tamasi committed
      Summary:
      The patch does some cleanup in and around the legacy `BlobLogReader` class:
      * It renames the class to `BlobLogSequentialReader` to emphasize that it is for
      sequentially iterating through blobs in a blob file, as opposed to doing random
      point reads using `BlobIndex`es (which is `BlobFileReader`'s jurisdiction).
      * It removes some dead code from the old BlobDB implementation that references
      `BlobLogReader` (namely the method `BlobFile::OpenRandomAccessReader`).
      * It cleans up some `#include`s and forward declarations.
      * It fixes some incorrect/outdated comments related to the reader class.
      * It adds a few assertions to the `Read` methods of the class.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7517
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D24172611
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 43e2ae1eba5c3dd30c1070cb00f217edc45bd64f
    • Introduce a blob file reader class (#7461) · 22655a39
      Levi Tamasi committed
      Summary:
      The patch adds a class called `BlobFileReader` that can be used to retrieve blobs
      using the information available in blob references (e.g. blob file number, offset, and
      size). This will come in handy when implementing blob support for `Get`, `MultiGet`,
      and iterators, and also for compaction/garbage collection.
      
      When a `BlobFileReader` object is created (using the factory method `Create`),
      it first checks whether the specified file is potentially valid by comparing the file
      size against the combined size of the blob file header and footer (files smaller than
      the threshold are considered malformed). Then, it opens the file, and reads and verifies
      the header and footer. The verification involves magic number/CRC checks
      as well as checking for unexpected header/footer fields, e.g. incorrect column family ID
      or TTL blob files.
      
      Blobs can be retrieved using `GetBlob`. `GetBlob` validates the offset and compression
      type passed by the caller (because of the presence of the header and footer, the
      specified offset cannot be too close to the start/end of the file; also, the compression type
      has to match the one in the blob file header), and retrieves and potentially verifies and
      uncompresses the blob. In particular, when `ReadOptions::verify_checksums` is set,
      `BlobFileReader` reads the blob record header as well (as opposed to just the blob itself)
      and verifies the key/value size, the key itself, as well as the CRC of the blob record header
      and the key/value pair.
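
The size and offset checks described above can be sketched as two small predicates. The header and footer sizes used here are assumed values for illustration, and `IsPlausibleFileSize` and `IsValidBlobRange` are hypothetical helpers rather than functions of `BlobFileReader`; a real check would also account for the per-record header and key.

```cpp
#include <cassert>
#include <cstdint>

// Assumed sizes for illustration; consult the blob log format for real values.
constexpr uint64_t kToyHeaderSize = 30;
constexpr uint64_t kToyFooterSize = 32;

// A file smaller than header + footer cannot be a well-formed blob file.
inline bool IsPlausibleFileSize(uint64_t file_size) {
  return file_size >= kToyHeaderSize + kToyFooterSize;
}

// A blob must lie entirely between the header and the footer.
inline bool IsValidBlobRange(uint64_t offset, uint64_t blob_size,
                             uint64_t file_size) {
  if (!IsPlausibleFileSize(file_size)) return false;
  if (offset < kToyHeaderSize) return false;  // too close to the start
  const uint64_t limit = file_size - kToyFooterSize;
  // Written as a subtraction so that offset + blob_size cannot overflow.
  return offset <= limit && blob_size <= limit - offset;
}
```

Rejecting bad offsets up front means a corrupted blob reference fails fast instead of reading header or footer bytes as blob data.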
      
      In addition, the patch exposes the compression type from `BlobIndex` (both using an
      accessor and via `DebugString`), and adds a blob file read latency histogram to
      `InternalStats` that can be used with `BlobFileReader`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7461
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D23999219
      
      Pulled By: ltamasi
      
      fbshipit-source-id: deb6b1160d251258b308d5156e2ec063c3e12e5e
  14. 02 Oct, 2020 2 commits
    • Periodically flush info log out of application buffer (#7488) · 1e009097
      Andrew Kryczka committed
      Summary:
      This PR schedules a background thread (shared across all DB instances)
      to flush info log every ten seconds. This improves debuggability in case
      of RocksDB hanging since it ensures the log messages leading up to the hang
      will eventually become visible in the log.
      
      The bulk of this PR is moving monitoring/stats_dump_scheduler* to db/periodic_work_scheduler*
      and making the corresponding name changes since now the scheduler handles info
      log flushing, not just stats dumping.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7488
      
      Reviewed By: riversand963
      
      Differential Revision: D24065165
      
      Pulled By: ajkr
      
      fbshipit-source-id: 339c47a0ff43b79fdbd055fbd9fefbb6f9d8d3b5
    • Introduce options.check_flush_compaction_key_order (#7467) · 75081755
      sdong committed
      Summary:
      Introduce a new option, options.check_flush_compaction_key_order, set to true by default, which checks the key order of flush and compaction output and fails the operation if the order is violated.
      Also did a minor refactor of the hash checking code, which consolidates the hashing logic into a validation class, where the key ordering logic is added.
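
A key-order check of this kind can be sketched as a small validator that remembers the previous key. `ToyKeyOrderChecker` is a hypothetical class for this example; it compares plain strings bytewise, whereas the real check would use the column family's comparator on internal keys.

```cpp
#include <cassert>
#include <string>

// Toy validator: reports a violation when a key is smaller than the previous
// one. Bytewise comparison is an assumption made for this sketch.
class ToyKeyOrderChecker {
 public:
  // Returns true if `key` keeps the output in non-decreasing order.
  bool Add(const std::string& key) {
    if (has_prev_ && key < prev_) return false;
    prev_ = key;
    has_prev_ = true;
    return true;
  }

 private:
  bool has_prev_ = false;
  std::string prev_;
};
```

A flush or compaction loop would call `Add` for each output key and abort with an error status on the first `false`.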
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7467
      
      Test Plan: Add unit tests to validate the check can catch reordering in flush and compaction, and can be properly disabled.
      
      Reviewed By: riversand963
      
      Differential Revision: D24010683
      
      fbshipit-source-id: 8dd6292d2cda8006054e9ded7cfa4bf405f0527c
  15. 26 Sep, 2020 1 commit
  16. 24 Sep, 2020 2 commits
    • Add IO Tracer Parser (#7333) · 98ac6b64
      Akanksha Mahajan committed
      Summary:
      Implement a parsing tool, io_tracer_parser, that takes an IO trace file (binary file) via the command line argument --io_trace_file and an output file via --output_file, and dumps the IO trace records to the output file in human-readable form.
      
      Also added unit test cases that generate IO trace records and call io_tracer_parser to parse those records.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7333
      
      Test Plan:
      make check -j64,
       Add unit test cases.
      
      Reviewed By: anand1976
      
      Differential Revision: D23772360
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 9c20519c189362e6663352d08863326f3e496271
    • Fix/minimize mock_time_env.h dependencies (#7426) · ac1734d0
      Peter Dillinger committed
      Summary:
      (a) own copy of kMicrosInSecond
      (b) out-of-line sync point code
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7426
      
      Test Plan: FB internal
      
      Reviewed By: ajkr
      
      Differential Revision: D23861363
      
      Pulled By: pdillinger
      
      fbshipit-source-id: de6b1621dca2f7391c5ff72bad04a7613dc27527
  17. 15 Sep, 2020 1 commit
  18. 10 Sep, 2020 1 commit
    • tests need linked with third_party libs (#7351) · f1e99b36
      Jay Zhuang committed
      Summary:
      To fix the cmake build with third_party libs, like:
      `mkdir build && cd build && cmake .. -DWITH_SNAPPY=1 && make`
      
      Error:
      ```
      Undefined symbols for architecture x86_64:
        "snappy::RawCompress(char const*, unsigned long, char*, unsigned long*)"
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7351
      
      Reviewed By: pdillinger
      
      Differential Revision: D23553705
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 19b45c6763c7256107583e8af4c01d370ca06128
  19. 03 Sep, 2020 1 commit
  20. 28 Aug, 2020 2 commits
    • Add buffer prefetch support for non directIO usecase (#7312) · c2485f2d
      Jay Zhuang committed
      Summary:
      A new file interface `SupportPrefetch()` is added. When the user overrides it to return `false`, an internal prefetch buffer will be used for readahead. This is useful for non-direct I/O when the FS does not have readahead support.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7312
      
      Reviewed By: anand1976
      
      Differential Revision: D23329847
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 71cd4ce6f4a820840294e4e6aec111ab76175527
    • Add a blob file builder class that can be used in background jobs (#7306) · 50439606
      Levi Tamasi committed
      Summary:
      The patch adds a class called `BlobFileBuilder` that can be used to build
      and cut blob files in background jobs (flushes/compactions). The class
      enforces a value size threshold (`min_blob_size`; smaller blobs will be inlined
      in the LSM tree itself), and supports specifying a blob file size limit (`blob_file_size`),
      as well as compression (`blob_compression_type`) and checksums for blob files.
      It also keeps track of the generated blob files and their associated `BlobFileAddition`
      metadata, which can be applied as part of the background job's `VersionEdit`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7306
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D23298817
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 38f35d81dab1ba81f15236240612ec173d7f21b5
  21. 18 Aug, 2020 1 commit
    • db_bench should be linked with thirdparty libs (#7264) · c073b7fa
      Jay Zhuang committed
      Summary:
      `db_bench` is not linked with thirdparty libs in cmake, even when `-DWITH_*`
      is specified.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7264
      
      Test Plan:
      `$ mkdir build; cd build; cmake .. -DWITH_SNAPPY=1; make db_bench; ./db_bench`
      `$ cmake .. -DWITH_SNAPPY=1 -DWITH_LZ4; make db_bench; ./db_bench -compression_type=lz4`
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D23165077
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 9c6fead31c41664a5c75ecd6469f47402fcb7d62
  22. 15 Aug, 2020 1 commit
    • Introduce a global StatsDumpScheduler for stats dumping (#7223) · 69760b4d
      Jay Zhuang committed
      Summary:
      Have a global StatsDumpScheduler for all DB instance stats dumping, including `DumpStats()` and `PersistStats()`. Before this, there were two dedicated threads for every DB instance, one for `DumpStats()` and one for `PersistStats()`, which could create lots of threads if there are hundreds of DB instances.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7223
      
      Reviewed By: riversand963
      
      Differential Revision: D23056737
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 0faa2311142a73433ebb3317361db7cbf43faeba
  23. 13 Aug, 2020 1 commit
    • Store FileSystemPtr object that contains FileSystem ptr (#7180) · 1f9f630b
      Akanksha Mahajan committed
      Summary:
      As part of the IOTracing project, this PR
          1. Caches a "FileSystemPtr" object (a wrapper class that returns a file system pointer depending on whether tracing is enabled) instead of a "FileSystem" pointer.
          2. Creates the FileSystemPtr object from a FileSystem pointer and an IOTracer
          pointer.
          3. Creates the IOTracer shared_ptr in DBImpl and passes it to different classes through their constructors.
          4. When tracing is enabled through DB::StartIOTrace, FileSystemPtr
          returns a FileSystemTracingWrapper pointer for tracing purposes, and when
          it is disabled the underlying FileSystem pointer is returned.
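
The wrapper behaviour in points 1 to 4 can be sketched as below. All class names are hypothetical stand-ins prefixed with `Toy`, and the "file system" only reports a name; the sketch shows how one cached handle can resolve to either the tracing wrapper or the underlying object depending on a shared tracer flag.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct ToyFileSystem {
  virtual ~ToyFileSystem() = default;
  virtual std::string Name() const { return "base"; }
};

// Wraps a target file system; a real wrapper would record each IO call.
struct ToyTracingFileSystem : ToyFileSystem {
  explicit ToyTracingFileSystem(std::shared_ptr<ToyFileSystem> target)
      : target_(std::move(target)) {}
  std::string Name() const override {
    return "tracing(" + target_->Name() + ")";
  }
  std::shared_ptr<ToyFileSystem> target_;
};

// Shared tracer state, toggled by the start/end tracing calls.
struct ToyIOTracer {
  std::atomic<bool> enabled{false};
};

// Handle that picks the wrapper or the raw file system on every access.
class ToyFileSystemPtr {
 public:
  ToyFileSystemPtr(std::shared_ptr<ToyFileSystem> fs,
                   std::shared_ptr<ToyIOTracer> tracer)
      : fs_(std::move(fs)),
        tracer_(std::move(tracer)),
        tracing_fs_(std::make_shared<ToyTracingFileSystem>(fs_)) {}

  ToyFileSystem* get() const {
    if (tracer_ && tracer_->enabled.load()) return tracing_fs_.get();
    return fs_.get();
  }
  ToyFileSystem* operator->() const { return get(); }

 private:
  std::shared_ptr<ToyFileSystem> fs_;
  std::shared_ptr<ToyIOTracer> tracer_;
  std::shared_ptr<ToyTracingFileSystem> tracing_fs_;
};
```

Because the decision is made at access time, classes that cached the handle at construction pick up tracing as soon as it is switched on.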
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7180
      
      Test Plan:
      make check -j64
                      COMPILE_WITH_TSAN=1 make check -j64
      
      Reviewed By: anand1976
      
      Differential Revision: D22987117
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 6073617e4c2d5bc363914f3a1f55ae3b0a58fbf1
  24. 11 Aug, 2020 1 commit
    • Fix cmake build on MacOS (#7205) · 5444942f
      Yuhong Guo committed
      Summary:
      1. `std::random_shuffle` is deprecated and now we can use `std::shuffle`
      ```
      /rocksdb/db/prefix_test.cc:590:12: error: 'random_shuffle<std::__1::__wrap_iter<unsigned long long *> >'
            is deprecated [-Werror,-Wdeprecated-declarations]
            std::random_shuffle(prefixes.begin(), prefixes.end());
                 ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/algorithm:2982:1: note:
            'random_shuffle<std::__1::__wrap_iter<unsigned long long *> >' has been explicitly marked deprecated here
      _LIBCPP_DEPRECATED_IN_CXX14 void
      ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__config:1107:39: note: expanded from macro
            '_LIBCPP_DEPRECATED_IN_CXX14'
      #  define _LIBCPP_DEPRECATED_IN_CXX14 _LIBCPP_DEPRECATED
                                            ^
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__config:1090:48: note: expanded from macro
            '_LIBCPP_DEPRECATED'
      #    define _LIBCPP_DEPRECATED __attribute__ ((deprecated))
      ```
      2. `c_test` link error with `-DROCKSDB_BUILD_SHARED=OFF`:
      ```
      [  7%] Linking CXX executable c_test
      ld: library not found for -lrocksdb-shared
      clang: error: linker command failed with exit code 1 (use -v to see invocation)
      make[5]: *** [c_test] Error 1
      make[4]: *** [CMakeFiles/c_test.dir/all] Error 2
      make[4]: *** Waiting for unfinished jobs....
      ```
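
For item 1, the usual replacement for `std::random_shuffle` is `std::shuffle` with an explicit random engine. The helper below is a minimal sketch; the function name `ShufflePrefixes` and the choice of `std::mt19937` with a caller-provided seed are assumptions, not necessarily what the fix in `prefix_test.cc` uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Shuffles in place using an explicitly seeded engine, which std::shuffle
// requires and std::random_shuffle did not.
inline void ShufflePrefixes(std::vector<uint64_t>& prefixes, uint32_t seed) {
  std::mt19937 rng(seed);
  std::shuffle(prefixes.begin(), prefixes.end(), rng);
}
```

Passing the engine explicitly also makes a test reproducible when the seed is fixed.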
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7205
      
      Reviewed By: ajkr
      
      Differential Revision: D23030641
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f270e50fc0b824ca1a0876ec5c65d33f55a72dd0
  25. 07 Aug, 2020 1 commit
    • Replace tracked_keys with a new LockTracker interface in TransactionDB (#7013) · 71c7e493
      Cheng Chang committed
      Summary:
      We're going to support more locking protocols such as range lock in transaction.
      
      However, in current design, `TransactionBase` has a member `tracked_keys` which assumes that point lock (lock a single key) is used, and is used in snapshot checking (isolation protocol). When using range lock, we may use read committed instead of snapshot checking as the isolation protocol.
      
      The most significant usage scenarios of `tracked_keys` are:
      1. A pessimistic transaction uses it to track the locked keys, and unlocks these keys on commit or rollback.
      2. An optimistic transaction does not lock keys upfront; it only tracks the lock intentions in `tracked_keys`, and does write conflict checking on commit.
      3. Each `SavePoint` tracks the keys that are locked since the `SavePoint`; `RollbackToSavePoint` or `PopSavePoint` relies on both the tracked keys in `SavePoint`s and `tracked_keys`.
      
      Based on these scenarios, if we can abstract out a `LockTracker` interface to hold a set of tracked locks (can be keys or key ranges), and have methods that can be composed together to implement the scenarios, then `tracked_keys` can be an internal data structure of one implementation of `LockTracker`. See `utilities/transactions/lock/lock_tracker.h` for the detailed interface design, and `utilities/transactions/lock/point_lock_tracker.cc` for the implementation.
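
A point-lock tracker with composable operations can be sketched as follows. `ToyPointLockTracker` and its methods are hypothetical and much simpler than the interface in `lock_tracker.h`: it stores plain key sets per column family and omits read/write counts and sequence numbers. The `Subtract` operation illustrates the save point scenario, where keys tracked since a save point are removed from the transaction's tracker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Toy tracker of point locks, grouped by column family ID.
class ToyPointLockTracker {
 public:
  void Track(uint32_t cf_id, const std::string& key) {
    tracked_[cf_id].insert(key);
  }

  bool IsTracked(uint32_t cf_id, const std::string& key) const {
    auto it = tracked_.find(cf_id);
    return it != tracked_.end() && it->second.count(key) > 0;
  }

  // Removes every lock that `other` tracks, e.g. on rollback to a save point.
  void Subtract(const ToyPointLockTracker& other) {
    for (const auto& cf_and_keys : other.tracked_) {
      auto it = tracked_.find(cf_and_keys.first);
      if (it == tracked_.end()) continue;
      for (const auto& key : cf_and_keys.second) it->second.erase(key);
      if (it->second.empty()) tracked_.erase(it);
    }
  }

  size_t Count() const {
    size_t n = 0;
    for (const auto& cf_and_keys : tracked_) n += cf_and_keys.second.size();
    return n;
  }

 private:
  std::map<uint32_t, std::set<std::string>> tracked_;
};
```

A range-based tracker could expose the same operations while storing key ranges internally, which is the flexibility the interface is meant to provide.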
      
      In the future, a `RangeLockTracker` can be implemented to track range locks without affecting other components.
      
      After this PR, a clean interface for lock manager should be possible, and then ideally, we can have pluggable locking protocols.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7013
      
      Test Plan: Run `transaction_test` and `optimistic_transaction_test`.
      
      Reviewed By: ajkr
      
      Differential Revision: D22163706
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: f2860577b5334e31dd2994f5bc6d7c40d502b1b4
  26. 06 Aug, 2020 1 commit
    • Define WAL related classes to be used in VersionEdit and VersionSet (#7164) · cd48ecaa
      Cheng Chang committed
      Summary:
      `WalAddition`, `WalDeletion` are defined in `wal_version.h` and used in `VersionEdit`.
      `WalAddition` is used to represent events of creating a new WAL (no size, just log number), or closing a WAL (with size).
      `WalDeletion` is used to represent events of deleting or archiving a WAL, it means the WAL is no longer alive (won't be replayed during recovery).
      
      `WalSet` is the set of alive WALs kept in `VersionSet`.
      
      1. Why use `WalDeletion` instead of relying on `MinLogNumber` to identify outdated WALs
      
      On recovery, we can compute `MinLogNumber()` based on the log numbers kept in MANIFEST, any log with number < MinLogNumber can be ignored. So it seems that we don't need to persist `WalDeletion` to MANIFEST, since we can ignore the WALs based on MinLogNumber.
      
      But the `MinLogNumber()` is actually a lower bound, it does not exactly mean that logs starting from MinLogNumber must exist. This is because in a corner case, when a column family is empty and never flushed, its log number is set to the largest log number, but not persisted in MANIFEST. So let's say there are 2 column families, when creating the DB, the first WAL has log number 1, so it's persisted to MANIFEST for both column families. Then CF 0 is empty and never flushed, CF 1 is updated and flushed, so a new WAL with log number 2 is created and persisted to MANIFEST for CF 1. But CF 0's log number in MANIFEST is still 1. So on recovery, MinLogNumber is 1, but since log 1 only contains data for CF 1, and CF 1 is flushed, log 1 might have already been deleted from disk.
      
      We can make `MinLogNumber()` be the exact minimum log number that must exist, by persisting the most recent log number for empty column families that are not flushed. But if there are N such column families, then every time a new WAL is created, we need to add N records to MANIFEST.
      
      In the current design, a record is persisted to MANIFEST only when a WAL is created, closed, or deleted/archived, so the number of WAL related records is bounded by 3x the number of WALs.
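
The corner case from point 1 can be replayed with a toy model. `MinLogNumber` here is a free function over a map of per-column-family log numbers and `ToyWalSet` is a hypothetical set of alive WALs; neither is the code from `wal_version.h`. The example shows that the minimum log number is only a lower bound, while an explicit set of alive WALs is exact.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <set>

// Minimum log number over all column families, as recorded in the manifest.
inline uint64_t MinLogNumber(const std::map<uint32_t, uint64_t>& cf_log_numbers) {
  assert(!cf_log_numbers.empty());
  uint64_t min_log = std::numeric_limits<uint64_t>::max();
  for (const auto& cf_and_log : cf_log_numbers) {
    min_log = std::min(min_log, cf_and_log.second);
  }
  return min_log;
}

// Toy set of alive WALs, updated by explicit addition and deletion events.
class ToyWalSet {
 public:
  void AddWal(uint64_t log_number) { alive_.insert(log_number); }
  void DeleteWal(uint64_t log_number) { alive_.erase(log_number); }
  bool IsAlive(uint64_t log_number) const { return alive_.count(log_number) > 0; }

 private:
  std::set<uint64_t> alive_;
};
```

With CF 0 still at log number 1 and CF 1 at log number 2, `MinLogNumber` returns 1 even after WAL 1 has been deleted, whereas the WAL set correctly reports that only WAL 2 is alive.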
      
      2. Why keep `WalSet` in `VersionSet` instead of applying the `VersionEdit`s to `VersionStorageInfo`
      
      `VersionEdit`s are originally designed to track the addition and deletion of SST files. The SST files are related to column families, each column family has a list of `Version`s, and each `Version` keeps the set of active SST files in `VersionStorageInfo`.
      
      But WALs are a DB-level concept; they are not bound to specific column families. So logically it does not make sense to store WALs in a column family's `Version`s.
      Also, `Version`'s purpose is to keep reference to SST / blob files, so that they are not deleted until there is no version referencing them. But a WAL is deleted regardless of version references.
      So we keep the WALs in `VersionSet`  for the purpose of writing out the DB state's snapshot when creating new MANIFESTs.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7164
      
      Test Plan:
      make version_edit_test && ./version_edit_test
      make wal_edit_test && ./wal_edit_test
      
      Reviewed By: ltamasi
      
      Differential Revision: D22677936
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 5a3b6890140e572ffd79eb37e6e4c3c32361a859
  27. 05 Aug, 2020 1 commit
    • Add support to start and end IOTracing through DB APIs (#7203) · 493f425e
      Akanksha Mahajan committed
      Summary:
      1. Add support to start io tracing through DB::StartIOTrace(Env*, const TraceOptions&, std::unique_ptr<TraceWriter>&&) and end tracing through DB::EndIOTrace(). This doesn't trace DB::Open.
      
      User side code:
      
      // Open DB
      DB::Open(options, dbname, &db);
      
      // Start tracing
      db->StartIOTrace(env, trace_opt, std::move(trace_writer));
      
      // Perform operations
      
      // End tracing
      db->EndIOTrace();
      
      2. Fix the build errors for Windows.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7203
      
      Test Plan: make check -j64
      
      Reviewed By: anand1976
      
      Differential Revision: D22901947
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: e59c0b785a802168e6f1aa028d99c224a35cb30c
  28. 25 Jul, 2020 1 commit
    • T
      SST Partitioner interface that allows to split SST files (#6957) · cd4592c2
      Tomas Kolda committed
      Summary:
      SST Partitioner interface that allows splitting SST files during compactions.
      
      It instructs compaction to create a new file when needed. When tables are defined through well-defined key prefixes, it is useful to define partitioning as well, so that promoting an SST file does not cover a huge key space on the next level (in the worst case, the complete key space).
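
As a rough illustration of prefix-based partitioning, the sketch below starts a new output file whenever the key prefix changes. It is written under stated assumptions: `ShouldCutFile`, `PartitionKeys`, and the fixed `prefix_len` are hypothetical names chosen for this example and are not the interface added by the PR.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: request a cut when the first `prefix_len` bytes differ.
inline bool ShouldCutFile(const std::string& prev_key,
                          const std::string& cur_key,
                          std::size_t prefix_len) {
  return prev_key.compare(0, prefix_len, cur_key, 0, prefix_len) != 0;
}

// Splits already-sorted keys into output "files", one per distinct prefix.
inline std::vector<std::vector<std::string>> PartitionKeys(
    const std::vector<std::string>& sorted_keys, std::size_t prefix_len) {
  std::vector<std::vector<std::string>> files;
  for (const auto& key : sorted_keys) {
    if (files.empty() ||
        ShouldCutFile(files.back().back(), key, prefix_len)) {
      files.emplace_back();  // begin a new output file at the prefix boundary
    }
    files.back().push_back(key);
  }
  return files;
}
```

Because no output file spans two prefixes, moving one file to the next level only overlaps the key range of its own prefix.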
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6957
      
      Reviewed By: ajkr
      
      Differential Revision: D22461239
      
      fbshipit-source-id: 9ce07bba08b3ba89c2d45630520368f704d1316e
  29. 10 Jul, 2020 1 commit
    • M
      More Makefile Cleanup (#7097) · c7c7b07f
      mrambacher committed
      Summary:
      Cleans up some of the dependencies on test code in the Makefile while building tools:
      - Moves the test::RandomString, DBBaseTest::RandomString into Random
      - Moves the test::RandomHumanReadableString into Random
      - Moves the DestroyDir method into file_utils
      - Moves the SetupSyncPointsToMockDirectIO into sync_point.
      - Moves the FaultInjection Env and FS classes under env
      
      These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies.  By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated.
      
      Tested both release and debug builds via Make and CMake for both static and shared libraries.
      
      More work remains to clean up how the tools are built and remove some unnecessary dependencies.  There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097
      
      Reviewed By: riversand963
      
      Differential Revision: D22463160
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2
  30. 09 Jul, 2020 1 commit
  31. 26 Jun, 2020 2 commits
    • D
      freebsd: malloc_usable_size check malloc_np.h (#7009) · ce332f8c
      Daniel Black committed
      Summary:
      Per https://www.unix.com/man-page/freebsd/3/malloc_usable_size/
      malloc_usable_size is declared in malloc_np.h, as it is a non-standard API.
      
      Without the patch, detection simply fails. From ./CMakeFiles/CMakeError.log:
      
      In file included from /home/dan/build-rocksdb/CMakeFiles/CMakeTmp/CheckSymbolExists.cxx:2:
      /usr/include/malloc.h:3:2: error: "<malloc.h> has been replaced by <stdlib.h>"
       ^
      /home/dan/build-rocksdb/CMakeFiles/CMakeTmp/CheckSymbolExists.cxx:8:19: error: use of undeclared identifier 'malloc_usable_size'
        return ((int*)(&malloc_usable_size))[argc];
                        ^
      2 errors generated.
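
The header selection that the detection needs can be sketched in source form as follows. This is a minimal sketch, assuming that non-FreeBSD builds use a glibc-style `<malloc.h>`; `UsableSize` is a hypothetical wrapper added only for illustration.

```cpp
#include <cstddef>
#include <cstdlib>

#if defined(__FreeBSD__)
#include <malloc_np.h>  // FreeBSD declares malloc_usable_size here
#else
#include <malloc.h>     // assumption: glibc-style platform
#endif

// Hypothetical wrapper: reports how many bytes the allocator actually
// reserved for `p`, which may be more than the size that was requested.
inline std::size_t UsableSize(void* p) { return malloc_usable_size(p); }
```
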
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7009
      
      Reviewed By: riversand963
      
      Differential Revision: D22176093
      
      Pulled By: ajkr
      
      fbshipit-source-id: da980f3d343b6d9b0c70d7827c6df495f3fb1ade
    • D
      gflags: freebsd include path + links (#7011) · 741b9ba9
      Daniel Black committed
      Summary:
      The include path from find_package(gflags) needed to be included to
      compile.
      
      Because gflags was included in THIRDPARTY_LIBS as a PRIVATE library
      of ROCKSDB_{SHARED|STATIC}_LIB, its functions aren't accessible to
      the tools and utilities that use gflags directly.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7011
      
      Reviewed By: siying
      
      Differential Revision: D22176303
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0a94523fc69e82d8f686bc0b43dc3eafc51ad84f
  32. 25 Jun, 2020 1 commit
    • Z
      Add a new option for BackupEngine to store table files under shared_checksum using DB session id in the backup filenames (#6997) · be41c61f
      Zitan Chen committed
      
      Summary:
      `BackupableDBOptions::new_naming_for_backup_files` is added. This option is false by default. When it is true, backup table filenames under directory shared_checksum are of the form `<file_number>_<crc32c>_<db_session_id>.sst`.
      
      Note that when this option is true, it comes into effect only when both `share_files_with_checksum` and `share_table_files` are true.
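
The naming rule and its precondition can be sketched as below. This is an illustration under stated assumptions: `ToyBackupOptions` and `NewStyleBackupName` are hypothetical, the decimal rendering of the checksum is an assumption, and an empty result merely signals that the new naming is not in effect (the source does not say which name is used in that case).

```cpp
#include <cstdint>
#include <string>

// Hypothetical mirror of the three options named above. Only
// new_naming_for_backup_files is stated to default to false; the other
// two defaults here are placeholders.
struct ToyBackupOptions {
  bool share_table_files = false;
  bool share_files_with_checksum = false;
  bool new_naming_for_backup_files = false;
};

// Returns <file_number>_<crc32c>_<db_session_id>.sst when the new naming
// is in effect, or an empty string otherwise.
// Assumption: the checksum is rendered as a decimal number.
inline std::string NewStyleBackupName(const ToyBackupOptions& opts,
                                      uint64_t file_number, uint32_t crc32c,
                                      const std::string& db_session_id) {
  const bool in_effect = opts.new_naming_for_backup_files &&
                         opts.share_files_with_checksum &&
                         opts.share_table_files;
  if (!in_effect) return std::string();
  return std::to_string(file_number) + "_" + std::to_string(crc32c) + "_" +
         db_session_id + ".sst";
}
```
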
      
      Three new test cases are added.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6997
      
      Test Plan: Passed make check.
      
      Reviewed By: ajkr
      
      Differential Revision: D22098895
      
      Pulled By: gg814
      
      fbshipit-source-id: a1d9145e7fe562d71cde7ac995e17cb24fd42e76
  33. 19 Jun, 2020 2 commits
  34. 13 Jun, 2020 1 commit
    • S
      Reduce test coverage in older VS versions (#6966) · 7e2ac0c3
      sdong committed
      Summary:
      With AppVeyor we run the same set of tests for older versions of VS as for the latest version. This causes extra hangs, which we don't plan to investigate. Instead, minimize the tests run there. The full tests on Windows are already covered in CircleCI.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6966
      
      Test Plan: Watch appveyor runs.
      
      Reviewed By: pdillinger
      
      Differential Revision: D22025383
      
      fbshipit-source-id: 079dff9e8213bc750a47f4add90fdbf18de9d737