1. 13 11月, 2020 3 次提交
    • Y
      Add full_history_ts_low_ to CompactionJob (#7657) · cf9d8e45
      Yanqin Jin 提交于
      Summary:
      https://github.com/facebook/rocksdb/issues/7556 enables `CompactionIterator` to perform garbage collection during compaction according
      to a lower bound (user-defined) timestamp `full_history_ts_low_`.
      
      This PR adds a data member `full_history_ts_low_` of type `std::string` to `CompactionJob`, and
      `full_history_ts_low_` does not change during compaction. `CompactionJob` will pass a pointer to this
      data member to the `CompactionIterator` used during compaction.
      
      Also refactored compaction_job_test.cc to re-use some existing code, which is actually the majority of this PR.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7657
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D24913803
      
      Pulled By: riversand963
      
      fbshipit-source-id: 11ad5329ddac365667152e7b3b02f84182c0ca8e
      cf9d8e45
    • L
      Clean up CompactionProxy (#7662) · 0dc437d6
      Levi Tamasi 提交于
      Summary:
      `CompactionProxy` is currently both a concrete class used for actual `Compaction`s
      and a base class that `FakeCompaction` (which is used in `compaction_iterator_test`)
      is derived from. This is bad from an OO design standpoint, and also results in
      `FakeCompaction` containing an (uninitialized and unused) `Compaction*` member.
      The patch fixes this by making `CompactionProxy` a pure interface and introducing
      a separate concrete class `RealCompaction` for non-test/non-fake compactions. It
      also removes an unused parameter from the virtual method `level`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7662
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D24907680
      
      Pulled By: ltamasi
      
      fbshipit-source-id: c100ecb1beef4b0ada35e799116c5bda71719ee7
      0dc437d6
    • Y
      Update HISTORY.md for PR6069 (#7663) · 2400cd69
      Yanqin Jin 提交于
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7663
      
      Reviewed By: ajkr
      
      Differential Revision: D24913081
      
      Pulled By: riversand963
      
      fbshipit-source-id: 704f427812f2b4f92e16d6cbc93be64d730d1cf9
      2400cd69
  2. 12 11月, 2020 3 次提交
    • A
      Always apply bottommost_compression_opts when enabled (#7633) · ec346da9
      Andrew Kryczka 提交于
      Summary:
      Previously, even when `bottommost_compression_opts`'s `enabled` flag was set, it only took effect when
      `bottommost_compression` was also set to something other than `kDisableCompressionOption`.
      This wasn't documented and, if we kept the old behavior, it'd make
      things complicated like the migration instructions in https://github.com/facebook/rocksdb/issues/7619. We can
      simplify the API by making `bottommost_compression_opts` always take
      effect when its `enabled` flag is set.
      
      Fixes https://github.com/facebook/rocksdb/issues/7631.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7633
      
      Reviewed By: ltamasi
      
      Differential Revision: D24710358
      
      Pulled By: ajkr
      
      fbshipit-source-id: bbbdf9c1b53c63a4239d902cc3f5a11da1874647
      ec346da9
    • M
      Create a Customizable class to load classes and configurations (#6590) · c442f680
      mrambacher 提交于
      Summary:
      The Customizable class is an extension of the Configurable class and allows instances to be created by a name/ID.  Classes that extend customizable can define their Type (e.g. "TableFactory", "Cache") and  a method to instantiate them (TableFactory::CreateFromString).  Customizable objects can be registered with the ObjectRegistry and created dynamically.
      
      Future PRs will make more types of objects extend Customizable.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6590
      
      Reviewed By: cheng-chang
      
      Differential Revision: D24841553
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: d0c2132bd932e971cbfe2c908ca2e5db30c5e155
      c442f680
    • Y
      Refactor with VersionEditHandler (#6581) · 8b6b6aeb
      Yanqin Jin 提交于
      Summary:
      Added a few classes in the same class hierarchy to remove code duplication and
      refactor the logic of reading and processing MANIFEST files.
      
      New classes are as follows.
      ```
      class VersionEditHandlerBase;
      class ListColumnFamiliesHandler : VersionEditHandlerBase;
      class FileChecksumRetriever : VersionEditHandlerBase;
      class DumpManifestHandler : VersionEditHandler;
      ```
      Classes that already existed before this PR are as follows.
      ```
      class VersionEditHandler : VersionEditHandlerBase;
      ```
      
      With these classes, refactored functions: `VersionSet::Recover()`,
      `VersionSet::ListColumnFamilies()`, `VersionSet::DumpManifest()`,
      `GetFileChecksumFromManifest()`.
      
      Test Plan (devserver):
      ```
      make check
      COMPILE_WITH_ASAN=1 make check
      ```
      These refactored code, especially recovery-related logic, will be tested intensively by
      all existing unit tests and stress tests. For example, run
      ```
      make crash_test
      ```
      Verified 3 successful runs on devserver.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6581
      
      Reviewed By: ajkr
      
      Differential Revision: D20616217
      
      Pulled By: riversand963
      
      fbshipit-source-id: 048c7743aa4be2623ccd0cc3e61c0027e604e78b
      8b6b6aeb
  3. 11 11月, 2020 5 次提交
    • P
      Use NPHash64 in more places (#7632) · c57f9144
      Peter Dillinger 提交于
      Summary:
      Since the hashes should not be persisted in output_validator
      nor mock_env.
      
      Also updated NPHash64 to use 64-bit seed, and comments.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7632
      
      Test Plan:
      make check, and new build setting that enables modification
      to NPHash64, to check for behavior depending on specific values. Added
      that setting to one of the CircleCI configurations.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24833780
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 02a57652ccf1ac105fbca79e77875bb7bf7c071f
      c57f9144
    • Y
      Report if unpinnable value encountered during backward iteration (#7618) · bcba3723
      Yanqin Jin 提交于
      Summary:
      There is an undocumented behavior about a certain combination of options and operations.
      - inplace_update_support = true, and
      - call `SeekForPrev()`, `SeekToLast()`, and/or `Prev()` on unflushed data.
      
      We should stop the backward iteration and report an error of `Status::NotSupported`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7618
      
      Test Plan: make check
      
      Reviewed By: pdillinger
      
      Differential Revision: D24769619
      
      Pulled By: riversand963
      
      fbshipit-source-id: 81d199fa55ed4739ab10e719cc345a992238ccbb
      bcba3723
    • J
      Fix a seek issue with prefix extractor and timestamp (#7644) · 18aee7db
      Jay Zhuang 提交于
      Summary:
      During seek, prefix compare should not include timestamp.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7644
      
      Test Plan: added unittest
      
      Reviewed By: riversand963
      
      Differential Revision: D24772066
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 3982655a8bf8da256a738e8497b73b3d9bdac92e
      18aee7db
    • H
      fix read_amp_bytes_per_bit field size (#7651) · 16d103d3
      Huisheng Liu 提交于
      Summary:
      The field in BlockBasedTableOptions is 4 bytes:
        // Default: 0 (disabled)
        uint32_t read_amp_bytes_per_bit = 0;
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7651
      
      Reviewed By: ltamasi
      
      Differential Revision: D24844994
      
      Pulled By: riversand963
      
      fbshipit-source-id: e2695e55532256ef8996dd6939cad06987a80293
      16d103d3
    • A
      Fix crash test to run in DEBUG_LEVEL=0 mode in tmpfs (#7643) · 20260514
      Akanksha Mahajan 提交于
      Summary:
      crash tests donot run in DEBUG_MODE=0 on tmpfs when
      use_direct_reads/use_direct_io_for_flush_and_compaction is set randomly because
      direct I/O is not supported on tmpfs and tests exit.
      
      Fix: Sanitize direct I/O read options in DEBUG_LEVEL=0 so that crash
      tests can run in tmpfs. When mmap_reads is set, direct I/O reads options are
      unset so we can sanitize direct I/O reads options in case of tmpfs as well.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7643
      
      Test Plan:
      1. export DEBUG_LEVEL=0; export TEST_TMPDIR="/dev/shm";
                 export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0";
                 make crash_test -j64
                 2. In DEBUG_LEVEL=1 mode:  make crash_test -j64
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24766550
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 021720b2343c12c72004f84b26147625d3991d9e
      20260514
  4. 10 11月, 2020 2 次提交
    • Y
      Fix a bug in compaction iterator with timestamp (#7645) · 9f1c84ca
      Yanqin Jin 提交于
      Summary:
      https://github.com/facebook/rocksdb/issues/7556 introduced support for compaction iterator to perform timestamp-aware garbage collection.
      However, there was a bug. The comparison between `ikey_.user_key` and `current_user_key_` should happen
      before `key_ = current_key_.SetInternalKey(key_, &ikey_);` (line 336 of compaction_iterator.cc).
      Otherwise, after this line, `current_key_` is always the same as `ikey_.user_key`.
      
      This PR also re-arranged the order of some data members because some of them are state variables of `CompactionIterator` while others are inputs from callers.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7645
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D24845028
      
      Pulled By: riversand963
      
      fbshipit-source-id: c7e79914832701462b86867e8463cd463b6c0c25
      9f1c84ca
    • C
      Track WAL in MANIFEST: Track deleted WALs in MANIFEST after recovering from the WALs (#7649) · c3911f1a
      Cheng Chang 提交于
      Summary:
      After replaying the WALs, the memtables are flushed synchronously to L0 instead of being flushed in background. Currently, we only track WAL obsoletion events in the code path of background flush jobs. This PR tracks these events in RecoverLogFiles.
      
      After this change, we can enable `track_and_verify_wal_in_manifest` in `db_stress`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7649
      
      Test Plan: `python tools/db_crashtest.py whitebox`
      
      Reviewed By: riversand963
      
      Differential Revision: D24824501
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 207129f7b845c50b333680ce6818a68a2fad54b9
      c3911f1a
  5. 08 11月, 2020 2 次提交
    • C
      Fix a recovery corner case (#7621) · 5e794b08
      Cheng Chang 提交于
      Summary:
      Consider the following sequence of events:
      
      1. Db flushed an SST with file number N, appended to MANIFEST, and tried to sync the MANIFEST.
      2. Syncing MANIFEST failed and db crashed.
      3. Db tried to recover with this MANIFEST. In the meantime, no entry about the newly-flushed SST was found in the MANIFEST. Therefore, RocksDB replayed WAL and tried to flush to an SST file reusing the same file number N. This failed because file system does not support overwrite. Then Db deleted this file.
      4. Db crashed again.
      5. Db tried to recover. When db read the MANIFEST, there was an entry referencing N.sst. This could happen probably because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed.
      
      It is possible that N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in step 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of MANIFEST was appended and visible after step 5.
      
      After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621
      
      Test Plan:
      1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery.
      2. a new unit test is added in db_basic_test to simulate step 3.
      
      Reviewed By: riversand963
      
      Differential Revision: D24668144
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b
      5e794b08
    • P
      Ribbon: major re-work of hashing, seeds, and more (#7635) · 8b8a2e9f
      Peter Dillinger 提交于
      Summary:
      * Fully optimized StandardHasher, in terms of efficiently generating Start, CoeffRow, and ResultRow from a stock hash value, with sufficient independence between them to have no measurably degraded behavior. (Degraded behavior would be an FP rate higher than explainable by 2^-b and, if using a 32-bit stock hash function, expected stock hash collisions.) Details in code comments.
      * Our standard 64-bit and 32-bit hash functions do not exhibit sufficient independence on sequential seeds (for one Ribbon construction attempt to have independent probability from the next). I have worked around this in the Ribbon code by "pre-mixing" "ordinal seeds," sequentially tried and appropriate for storage in persisted metadata, into "raw seeds," ready for application and appropriate for in-memory storage. This way the pre-mixing step (though fast) is only applied on loading or configuring the structure, not on each query or banding add.
      * Fix a subtle flaw in which backtracking not clearing ResultRow data could lead to elevated FP rate on keys that were backtracked on and should (for generality) exhibit the same FP rate as novel keys.
      * Added a basic test for PhsfQuery and construction algorithms (map or "retrieval structure" rather than set or filter), and made a few trivial related fixes.
      * Better random configuration generation in unit tests
      * Some other minor cleanup / clarification / etc.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7635
      
      Test Plan: unit tests included
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24738978
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f9d03599d9e2ca3e30e9d3e7d81cd936b56f76f0
      8b8a2e9f
  6. 07 11月, 2020 6 次提交
  7. 05 11月, 2020 3 次提交
    • C
      Simplify a test case in Java ReadOnlyTest (#7608) · 1f627210
      cheng-chang 提交于
      Summary:
      The original test nests a lot of `try` blocks. This PR flattens these blocks into independent blocks, so that each `try` block closes the DB before opening the next DB instance.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7608
      
      Test Plan: watch the existing java tests to pass
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D24611621
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: d486c5d37ac25d4b860d739ef2cdd58e6064d42d
      1f627210
    • X
      Update clang-format-diff.py (#7609) · c9c9709a
      Xie Yanbo 提交于
      Summary:
      `llvm-mirror/clang` is archived. Get the `clang-format-diff.py` file from the active source.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7609
      
      Reviewed By: ajkr
      
      Differential Revision: D24711608
      
      Pulled By: pdillinger
      
      fbshipit-source-id: b115d8765ff23fbb8190290a170de21565daba84
      c9c9709a
    • Y
      Compute NeedCompact() after table builder Finish() (#7627) · b6d8e367
      Yanqin Jin 提交于
      Summary:
      In `BuildTable()`, we call `builder->Finish()` before evaluating `builder->NeedCompact()`.
      However, we call `builder->NeedCompact()` before `builder->Finish()` in compaction job. This can be wrong because the table properties collectors may rely on the success of `Finish()` to provide correct result for `NeedCompact()`.
      
      Test plan (on devserver):
      make check
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7627
      
      Reviewed By: ajkr
      
      Differential Revision: D24728741
      
      Pulled By: riversand963
      
      fbshipit-source-id: 5a0dce244e14eb1106c4f87021e6bebca82b486e
      b6d8e367
  8. 04 11月, 2020 5 次提交
    • Y
      Add API to verify whole sst file checksum (#7578) · fde0cd7c
      Yanqin Jin 提交于
      Summary:
      Existing API `VerifyChecksum()` allows application to verify sst files' block checksums.
      Since whole file, user-specified checksum is tracked in MANIFEST, we can expose a new
      API to verify sst files' file checksums.
      
      ```
      // Compute table file checksums if applicable and compare with MANIFEST.
      // Returns OK if no file has mismatching whole-file checksum.
      Status DB::VerifyFileChecksums(const ReadOptions& /*read_options*/);
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7578
      
      Test Plan: make check
      
      Reviewed By: pdillinger
      
      Differential Revision: D24436783
      
      Pulled By: riversand963
      
      fbshipit-source-id: 52b51519b842f2b3c4e3351998a97c86cbec85b3
      fde0cd7c
    • A
      Add "max_write_buffer_size_to_maintain" to crash test (#7634) · 06a92fcf
      Akanksha Mahajan 提交于
      Summary:
      Add "max_write_buffer_size_to_maintain" to crash test
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7634
      
      Test Plan: make crash_test -j64
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D24710401
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 89e0412aaa56b2ef5a75603971b82f4b0b494ab7
      06a92fcf
    • P
      Ribbon: InterleavedSolutionStorage (#7598) · 746909ce
      Peter Dillinger 提交于
      Summary:
      The core algorithms for InterleavedSolutionStorage and the
      implementation SerializableInterleavedSolution make Ribbon fast for
      filter queries. Example output from new unit test:
      
          Simple      outside query, hot, incl hashing, ns/key: 117.796
          Interleaved outside query, hot, incl hashing, ns/key: 42.2655
          Bloom       outside query, hot, incl hashing, ns/key: 24.0071
      
      Also includes misc cleanup of previous Ribbon code and comments.
      
      Some TODOs and FIXMEs remain for futher work / investigation.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7598
      
      Test Plan: unit tests included (integration work and tests coming later)
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24559209
      
      Pulled By: pdillinger
      
      fbshipit-source-id: fea483cd354ba782aea3e806f2bc96e183d59441
      746909ce
    • Y
      Avoid skipping a test in db_wal_test (#7628) · 0b94468b
      Yanqin Jin 提交于
      Summary:
      Recent test report shows that some tests have been skipped.
      
      For DBWALTest that inherits from DBTestBase, the following will always be
      true, since `env_` is an instance of `SpecialEnv`, not `Env::Default()`. Thus the test
      will always be skipped.
      
      ```
      if (options.env != Env::Default()) {
        ROCKSDB_GTEST_SKIP("Test requires default environment");
        return;
      }
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7628
      
      Test Plan:
      ./db_wal_test --gtest_filter=DBWALTest.TruncateLastLogAfterRecoverWithoutFlush
      MEM_ENV=1 ./db_wal_test --gtest_filter=DBWALTest.TruncateLastLogAfterRecoverWithoutFlush
      make check
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24693006
      
      Pulled By: riversand963
      
      fbshipit-source-id: 7f2a772492a0f11bff17bbf5e9f493e9e9a1c125
      0b94468b
    • J
      Fix MultiGet unable to query timestamp data issue (#7589) · 881e0dcc
      Jay Zhuang 提交于
      Summary:
      The filter query key should not contain timestamp. The timestamp is
      stripped for Get(), but not MultiGet().
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7589
      
      Reviewed By: riversand963
      
      Differential Revision: D24494661
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: fc5ff40f9d683a89a760c6ff0ab3aed05a70c317
      881e0dcc
  9. 03 11月, 2020 2 次提交
    • Y
      Avoid skipping a test in db_test2 (#7629) · c992eb11
      Yanqin Jin 提交于
      Summary:
      Test report shows that this test has been skipped recently due to
      a condition that will never meet. `env_` is not equal to
      `Env::Default()` for DBTest2 that inherits from DBTestBase.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7629
      
      Test Plan:
      make check
      ./db_test2 --gtest_filter=DBTest2.PinnableSliceAndMmapReads
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24693317
      
      Pulled By: riversand963
      
      fbshipit-source-id: b1bbd5c1e05a6fa57c1de0d74462b69e3c2d5215
      c992eb11
    • A
      Expand effect of dictionary settings in `ColumnFamilyOptions::compression_opts` (#7619) · 1adbceb5
      Andrew Kryczka 提交于
      Summary:
      In dictionary compression's initial implementation, in order to save CPU overhead, we only enabled it
      for bottom level under the assumption that the vast majority of data is
      stored there. At that time, there was no
      such thing as `ColumnFamilyOptions::bottommost_compression_opts`, so we just
      hardcoded disabling dictionary compression in flush and compactions to
      non-bottommost level. Now, we have users who generate all their files
      through flush and are considering using dictionary compression.
      
      To support such a use case, this PR expands the scope of `ColumnFamilyOptions::compression_opts` to
      additionally include flushed files and files generated by compaction to
      a non-bottommost level. Users can still get the old behavior by moving
      their dictionary settings to `ColumnFamilyOptions::bottommost_compression_opts`
      and explicitly enabling both that and `ColumnFamilyOptions::bottommost_compression`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7619
      
      Reviewed By: ltamasi
      
      Differential Revision: D24665610
      
      Pulled By: ajkr
      
      fbshipit-source-id: 656b90bce1033fe21c71e09af931ef5bde3e464c
      1adbceb5
  10. 31 10月, 2020 1 次提交
  11. 30 10月, 2020 3 次提交
  12. 29 10月, 2020 5 次提交