1. 11 Nov, 2020 4 commits
    • Y
      Report if unpinnable value encountered during backward iteration (#7618) · bcba3723
      Yanqin Jin committed
      Summary:
      A certain combination of options and operations has undocumented behavior:
      - `inplace_update_support = true`, and
      - `SeekForPrev()`, `SeekToLast()`, and/or `Prev()` is called on unflushed data.
      
      In that case, we should stop the backward iteration and report a `Status::NotSupported` error.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7618
      
      Test Plan: make check
      
      Reviewed By: pdillinger
      
      Differential Revision: D24769619
      
      Pulled By: riversand963
      
      fbshipit-source-id: 81d199fa55ed4739ab10e719cc345a992238ccbb
    • J
      Fix a seek issue with prefix extractor and timestamp (#7644) · 18aee7db
      Jay Zhuang committed
      Summary:
      During seek, prefix compare should not include timestamp.
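
A minimal sketch of the idea, not the RocksDB implementation: assuming the timestamp is a fixed-size suffix of the user key, it is stripped before the prefixes are compared. `StripTimestamp`, `PrefixOf`, and `SamePrefix` are hypothetical helpers, and the fixed-length prefix stands in for whatever prefix extractor is configured.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: drop a fixed-size timestamp suffix from a user key.
std::string StripTimestamp(const std::string& key, std::size_t ts_size) {
  assert(key.size() >= ts_size);
  return key.substr(0, key.size() - ts_size);
}

// Stand-in for a fixed-length prefix extractor.
std::string PrefixOf(const std::string& key, std::size_t prefix_len) {
  return key.substr(0, prefix_len);
}

// Compare prefixes on the timestamp-free part of both keys.
bool SamePrefix(const std::string& a, const std::string& b,
                std::size_t ts_size, std::size_t prefix_len) {
  return PrefixOf(StripTimestamp(a, ts_size), prefix_len) ==
         PrefixOf(StripTimestamp(b, ts_size), prefix_len);
}
```

With a short key such as `ab` followed by an 8-byte timestamp, a 4-byte prefix taken from the full key would include timestamp bytes, which is the kind of mismatch the fix avoids.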
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7644
      
      Test Plan: added unittest
      
      Reviewed By: riversand963
      
      Differential Revision: D24772066
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 3982655a8bf8da256a738e8497b73b3d9bdac92e
    • H
      fix read_amp_bytes_per_bit field size (#7651) · 16d103d3
      Huisheng Liu committed
      Summary:
      The field in BlockBasedTableOptions is 4 bytes:
        // Default: 0 (disabled)
        uint32_t read_amp_bytes_per_bit = 0;
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7651
      
      Reviewed By: ltamasi
      
      Differential Revision: D24844994
      
      Pulled By: riversand963
      
      fbshipit-source-id: e2695e55532256ef8996dd6939cad06987a80293
    • A
      Fix crash test to run in DEBUG_LEVEL=0 mode in tmpfs (#7643) · 20260514
      Akanksha Mahajan committed
      Summary:
      Crash tests do not run in DEBUG_LEVEL=0 mode on tmpfs when
      use_direct_reads/use_direct_io_for_flush_and_compaction is set randomly, because
      direct I/O is not supported on tmpfs and the tests exit.
      
      Fix: Sanitize the direct I/O read options in DEBUG_LEVEL=0 so that crash
      tests can run in tmpfs. The direct I/O read options are already unset when
      mmap_reads is set, so we can sanitize them in the same way for tmpfs.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7643
      
      Test Plan:
      1. export DEBUG_LEVEL=0; export TEST_TMPDIR="/dev/shm";
         export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0";
         make crash_test -j64
      2. In DEBUG_LEVEL=1 mode: make crash_test -j64
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24766550
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 021720b2343c12c72004f84b26147625d3991d9e
  2. 10 Nov, 2020 2 commits
    • Y
      Fix a bug in compaction iterator with timestamp (#7645) · 9f1c84ca
      Yanqin Jin committed
      Summary:
      https://github.com/facebook/rocksdb/issues/7556 introduced support for compaction iterator to perform timestamp-aware garbage collection.
      However, there was a bug. The comparison between `ikey_.user_key` and `current_user_key_` should happen
      before `key_ = current_key_.SetInternalKey(key_, &ikey_);` (line 336 of compaction_iterator.cc).
      Otherwise, after this line, `current_key_` is always the same as `ikey_.user_key`.
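
The ordering problem can be illustrated with a simplified stand-in (this is not the real `CompactionIterator`; `KeyTracker` and its members are hypothetical). The check for a new user key has to run before the tracked key is overwritten, otherwise it compares the key with itself.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the iterator state; not the real CompactionIterator.
struct KeyTracker {
  std::string current_user_key;
  bool has_current = false;

  // Returns true if `user_key` differs from the previously seen user key.
  bool Advance(const std::string& user_key) {
    bool is_new = !has_current || user_key != current_user_key;  // compare first
    current_user_key = user_key;                                 // then update
    has_current = true;
    return is_new;
  }

  // Buggy order, for contrast: update first, then compare.
  bool AdvanceBuggy(const std::string& user_key) {
    current_user_key = user_key;
    has_current = true;
    return user_key != current_user_key;  // always false
  }
};
```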
      
      This PR also re-arranged the order of some data members because some of them are state variables of `CompactionIterator` while others are inputs from callers.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7645
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D24845028
      
      Pulled By: riversand963
      
      fbshipit-source-id: c7e79914832701462b86867e8463cd463b6c0c25
    • C
      Track WAL in MANIFEST: Track deleted WALs in MANIFEST after recovering from the WALs (#7649) · c3911f1a
      Cheng Chang committed
      Summary:
      After replaying the WALs, the memtables are flushed synchronously to L0 instead of being flushed in background. Currently, we only track WAL obsoletion events in the code path of background flush jobs. This PR tracks these events in RecoverLogFiles.
      
      After this change, we can enable `track_and_verify_wal_in_manifest` in `db_stress`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7649
      
      Test Plan: `python tools/db_crashtest.py whitebox`
      
      Reviewed By: riversand963
      
      Differential Revision: D24824501
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 207129f7b845c50b333680ce6818a68a2fad54b9
  3. 08 Nov, 2020 2 commits
    • C
      Fix a recovery corner case (#7621) · 5e794b08
      Cheng Chang committed
      Summary:
      Consider the following sequence of events:
      
      1. Db flushed an SST with file number N, appended to MANIFEST, and tried to sync the MANIFEST.
      2. Syncing MANIFEST failed and db crashed.
      3. Db tried to recover with this MANIFEST. In the meantime, no entry about the newly-flushed SST was found in the MANIFEST. Therefore, RocksDB replayed WAL and tried to flush to an SST file reusing the same file number N. This failed because file system does not support overwrite. Then Db deleted this file.
      4. Db crashed again.
      5. Db tried to recover. When db read the MANIFEST, there was an entry referencing N.sst. This could happen probably because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed.
      
      It is possible that N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in step 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of MANIFEST was appended and visible after step 5.
      
      After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621
      
      Test Plan:
      1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery.
      2. a new unit test is added in db_basic_test to simulate step 3.
      
      Reviewed By: riversand963
      
      Differential Revision: D24668144
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b
    • P
      Ribbon: major re-work of hashing, seeds, and more (#7635) · 8b8a2e9f
      Peter Dillinger committed
      Summary:
      * Fully optimized StandardHasher, in terms of efficiently generating Start, CoeffRow, and ResultRow from a stock hash value, with sufficient independence between them to have no measurably degraded behavior. (Degraded behavior would be an FP rate higher than explainable by 2^-b and, if using a 32-bit stock hash function, expected stock hash collisions.) Details in code comments.
      * Our standard 64-bit and 32-bit hash functions do not exhibit sufficient independence on sequential seeds (for one Ribbon construction attempt to have independent probability from the next). I have worked around this in the Ribbon code by "pre-mixing" "ordinal seeds," sequentially tried and appropriate for storage in persisted metadata, into "raw seeds," ready for application and appropriate for in-memory storage. This way the pre-mixing step (though fast) is only applied on loading or configuring the structure, not on each query or banding add.
      * Fix a subtle flaw in which backtracking not clearing ResultRow data could lead to elevated FP rate on keys that were backtracked on and should (for generality) exhibit the same FP rate as novel keys.
      * Added a basic test for PhsfQuery and construction algorithms (map or "retrieval structure" rather than set or filter), and made a few trivial related fixes.
      * Better random configuration generation in unit tests
      * Some other minor cleanup / clarification / etc.
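
The "pre-mixing" item above can be sketched as follows. This is only an illustration under stated assumptions: `PreMix` uses the SplitMix64 output function as a stand-in mixer, which is not claimed to be what the Ribbon code uses, and `SeededHasher` with its members is hypothetical. The point shown is that the small ordinal seed is what would be persisted, while the mixed raw seed is computed once per seed change and reused on every query.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in mixer (SplitMix64 output function), NOT the Ribbon code's mixing.
inline uint64_t PreMix(uint64_t ordinal_seed) {
  uint64_t z = ordinal_seed + 0x9e3779b97f4a7c15ULL;
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

// Hypothetical hasher: keeps both seeds, mixing only when the seed changes.
class SeededHasher {
 public:
  void SetOrdinalSeed(uint32_t ordinal) {
    ordinal_seed_ = ordinal;
    raw_seed_ = PreMix(ordinal);
  }
  // The ordinal seed is the value suited to persisted metadata.
  uint32_t ordinal_seed() const { return ordinal_seed_; }
  // Per-query work uses only the already-mixed raw seed.
  uint64_t Hash(uint64_t stock_hash) const {
    return (stock_hash ^ raw_seed_) * 0x9e3779b97f4a7c15ULL;
  }

 private:
  uint32_t ordinal_seed_ = 0;
  uint64_t raw_seed_ = PreMix(0);
};
```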
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7635
      
      Test Plan: unit tests included
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24738978
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f9d03599d9e2ca3e30e9d3e7d81cd936b56f76f0
  4. 07 Nov, 2020 6 commits
  5. 05 Nov, 2020 3 commits
    • C
      Simplify a test case in Java ReadOnlyTest (#7608) · 1f627210
      cheng-chang committed
      Summary:
      The original test nests a lot of `try` blocks. This PR flattens these blocks into independent blocks, so that each `try` block closes the DB before opening the next DB instance.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7608
      
      Test Plan: watch the existing Java tests pass
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D24611621
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: d486c5d37ac25d4b860d739ef2cdd58e6064d42d
    • X
      Update clang-format-diff.py (#7609) · c9c9709a
      Xie Yanbo committed
      Summary:
      `llvm-mirror/clang` is archived. Get the `clang-format-diff.py` file from the active source.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7609
      
      Reviewed By: ajkr
      
      Differential Revision: D24711608
      
      Pulled By: pdillinger
      
      fbshipit-source-id: b115d8765ff23fbb8190290a170de21565daba84
    • Y
      Compute NeedCompact() after table builder Finish() (#7627) · b6d8e367
      Yanqin Jin committed
      Summary:
      In `BuildTable()`, we call `builder->Finish()` before evaluating `builder->NeedCompact()`.
      However, we call `builder->NeedCompact()` before `builder->Finish()` in compaction job. This can be wrong because the table properties collectors may rely on the success of `Finish()` to provide correct result for `NeedCompact()`.
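
A small sketch of why the call order matters, using a hypothetical `SketchTableBuilder` rather than the real table builder. The deletion-ratio rule is invented for illustration. The compaction hint is only finalized inside `Finish()`, so querying it earlier gives a stale answer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical builder whose compaction hint is finalized in Finish().
class SketchTableBuilder {
 public:
  void Add(bool is_deletion) {
    ++entries_;
    if (is_deletion) ++deletions_;
  }
  // Finalizes collected properties; only now is the hint meaningful.
  bool Finish() {
    need_compact_ = entries_ > 0 && deletions_ * 2 > entries_;
    finished_ = true;
    return true;
  }
  bool NeedCompact() const { return finished_ && need_compact_; }

 private:
  std::size_t entries_ = 0;
  std::size_t deletions_ = 0;
  bool finished_ = false;
  bool need_compact_ = false;
};
```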
      
      Test plan (on devserver):
      make check
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7627
      
      Reviewed By: ajkr
      
      Differential Revision: D24728741
      
      Pulled By: riversand963
      
      fbshipit-source-id: 5a0dce244e14eb1106c4f87021e6bebca82b486e
  6. 04 Nov, 2020 5 commits
    • Y
      Add API to verify whole sst file checksum (#7578) · fde0cd7c
      Yanqin Jin committed
      Summary:
      Existing API `VerifyChecksum()` allows application to verify sst files' block checksums.
      Since whole file, user-specified checksum is tracked in MANIFEST, we can expose a new
      API to verify sst files' file checksums.
      
      ```
      // Compute table file checksums if applicable and compare with MANIFEST.
      // Returns OK if no file has mismatching whole-file checksum.
      Status DB::VerifyFileChecksums(const ReadOptions& /*read_options*/);
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7578
      
      Test Plan: make check
      
      Reviewed By: pdillinger
      
      Differential Revision: D24436783
      
      Pulled By: riversand963
      
      fbshipit-source-id: 52b51519b842f2b3c4e3351998a97c86cbec85b3
    • A
      Add "max_write_buffer_size_to_maintain" to crash test (#7634) · 06a92fcf
      Akanksha Mahajan committed
      Summary:
      Add "max_write_buffer_size_to_maintain" to crash test
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7634
      
      Test Plan: make crash_test -j64
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D24710401
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 89e0412aaa56b2ef5a75603971b82f4b0b494ab7
    • P
      Ribbon: InterleavedSolutionStorage (#7598) · 746909ce
      Peter Dillinger committed
      Summary:
      The core algorithms for InterleavedSolutionStorage and the
      implementation SerializableInterleavedSolution make Ribbon fast for
      filter queries. Example output from new unit test:
      
          Simple      outside query, hot, incl hashing, ns/key: 117.796
          Interleaved outside query, hot, incl hashing, ns/key: 42.2655
          Bloom       outside query, hot, incl hashing, ns/key: 24.0071
      
      Also includes misc cleanup of previous Ribbon code and comments.
      
      Some TODOs and FIXMEs remain for further work / investigation.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7598
      
      Test Plan: unit tests included (integration work and tests coming later)
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24559209
      
      Pulled By: pdillinger
      
      fbshipit-source-id: fea483cd354ba782aea3e806f2bc96e183d59441
    • Y
      Avoid skipping a test in db_wal_test (#7628) · 0b94468b
      Yanqin Jin committed
      Summary:
      Recent test report shows that some tests have been skipped.
      
      For DBWALTest that inherits from DBTestBase, the following will always be
      true, since `env_` is an instance of `SpecialEnv`, not `Env::Default()`. Thus the test
      will always be skipped.
      
      ```
      if (options.env != Env::Default()) {
        ROCKSDB_GTEST_SKIP("Test requires default environment");
        return;
      }
      ```
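
Why the condition above is always true can be shown with minimal stand-ins (`SketchEnv`, `SketchEnvWrapper`, and `WouldSkip` are hypothetical and are not the RocksDB classes). The check compares pointers, so an environment that wraps the default one is still a different object.

```cpp
#include <cassert>

// Minimal stand-ins for an environment and a wrapper around it.
struct SketchEnv {
  virtual ~SketchEnv() {}
  static SketchEnv* Default() {
    static SketchEnv instance;
    return &instance;
  }
};

struct SketchEnvWrapper : SketchEnv {
  explicit SketchEnvWrapper(SketchEnv* t) : target(t) {}
  SketchEnv* target;
};

// The skip condition compares pointers, so a wrapper never equals the default.
bool WouldSkip(const SketchEnv* env) { return env != SketchEnv::Default(); }
```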
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7628
      
      Test Plan:
      ./db_wal_test --gtest_filter=DBWALTest.TruncateLastLogAfterRecoverWithoutFlush
      MEM_ENV=1 ./db_wal_test --gtest_filter=DBWALTest.TruncateLastLogAfterRecoverWithoutFlush
      make check
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24693006
      
      Pulled By: riversand963
      
      fbshipit-source-id: 7f2a772492a0f11bff17bbf5e9f493e9e9a1c125
    • J
      Fix MultiGet unable to query timestamp data issue (#7589) · 881e0dcc
      Jay Zhuang committed
      Summary:
      The filter query key should not contain timestamp. The timestamp is
      stripped for Get(), but not MultiGet().
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7589
      
      Reviewed By: riversand963
      
      Differential Revision: D24494661
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: fc5ff40f9d683a89a760c6ff0ab3aed05a70c317
  7. 03 Nov, 2020 2 commits
    • Y
      Avoid skipping a test in db_test2 (#7629) · c992eb11
      Yanqin Jin committed
      Summary:
      Test report shows that this test has been skipped recently due to
      a condition that can never be met. `env_` is not equal to
      `Env::Default()` for DBTest2 that inherits from DBTestBase.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7629
      
      Test Plan:
      make check
      ./db_test2 --gtest_filter=DBTest2.PinnableSliceAndMmapReads
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24693317
      
      Pulled By: riversand963
      
      fbshipit-source-id: b1bbd5c1e05a6fa57c1de0d74462b69e3c2d5215
    • A
      Expand effect of dictionary settings in `ColumnFamilyOptions::compression_opts` (#7619) · 1adbceb5
      Andrew Kryczka committed
      Summary:
      In dictionary compression's initial implementation, in order to save CPU overhead, we only enabled it
      for bottom level under the assumption that the vast majority of data is
      stored there. At that time, there was no
      such thing as `ColumnFamilyOptions::bottommost_compression_opts`, so we just
      hardcoded disabling dictionary compression in flush and compactions to
      non-bottommost level. Now, we have users who generate all their files
      through flush and are considering using dictionary compression.
      
      To support such a use case, this PR expands the scope of `ColumnFamilyOptions::compression_opts` to
      additionally include flushed files and files generated by compaction to
      a non-bottommost level. Users can still get the old behavior by moving
      their dictionary settings to `ColumnFamilyOptions::bottommost_compression_opts`
      and explicitly enabling both that and `ColumnFamilyOptions::bottommost_compression`.
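
The scope change described above can be modeled roughly as below. This is a simplified sketch, not RocksDB's selection code: the `Sketch*` structs are hypothetical, the `max_dict_bytes` and `enabled` fields are assumed stand-ins for the dictionary settings, and the compression type itself (which the old-behavior recipe also requires setting) is left out.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the two option groups mentioned above.
struct SketchCompressionOpts {
  uint32_t max_dict_bytes = 0;  // 0 means no dictionary
  bool enabled = false;
};

struct SketchCFOptions {
  SketchCompressionOpts compression_opts;
  SketchCompressionOpts bottommost_compression_opts;
};

// Dictionary size applied to an output file after the change.
uint32_t DictBytesFor(const SketchCFOptions& o, bool bottommost) {
  if (bottommost && o.bottommost_compression_opts.enabled) {
    return o.bottommost_compression_opts.max_dict_bytes;
  }
  // Now also covers flushes and non-bottommost compaction outputs.
  return o.compression_opts.max_dict_bytes;
}
```

In this model, putting the dictionary size only in the bottommost group and enabling it reproduces the old bottommost-only behavior.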
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7619
      
      Reviewed By: ltamasi
      
      Differential Revision: D24665610
      
      Pulled By: ajkr
      
      fbshipit-source-id: 656b90bce1033fe21c71e09af931ef5bde3e464c
  8. 31 Oct, 2020 1 commit
  9. 30 Oct, 2020 3 commits
  10. 29 Oct, 2020 8 commits
    • Y
      Remove unused includes (#7604) · 394210f2
      Yanqin Jin committed
      Summary:
      This is a PR generated **semi-automatically** by an internal tool to remove unused includes and `using` statements.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7604
      
      Test Plan: make check
      
      Reviewed By: ajkr
      
      Differential Revision: D24579392
      
      Pulled By: riversand963
      
      fbshipit-source-id: c4bfa6c6b08da1de186690d37eb73d8fff45aecd
    • J
      java: correct method name RocksDB.GetColumnFamilyMetaData() (#7606) · 99a0305b
      Jermy Li committed
      Summary:
      update GetColumnFamilyMetaData() to getColumnFamilyMetaData()
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7606
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D24610298
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: d24f9b65478da1456f50747637dc95688af874de
    • Z
      Updated GenerateOneFileChecksum to use requested_checksum_func_name (#7586) · ea347d80
      Zhichao Cao committed
      Summary:
      CreateFileChecksumGenerator may use requested_checksum_func_name in the generator context to decide which generator to use. GenerateOneFileChecksum had not been updated to pass it, so it always got the generator that is selected when the name is empty. Fix it.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7586
      
      Test Plan: make check
      
      Reviewed By: riversand963
      
      Differential Revision: D24491989
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: d9fdfdd431240f0a9a2e781ddbd48a7d6c609aad
    • J
      slightly improve jemalloc allocator API header (#7592) · 2404f8b9
      jsteemann committed
      Summary:
      Fix a few typos and avoid a potential nullptr dereference.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7592
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D24582111
      
      Pulled By: riversand963
      
      fbshipit-source-id: 51e9260e8cad1fcdedd310c889f0faeec6efd937
    • V
      Fix typo in arena.cc (#7593) · 248d10fb
      vdimir committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7593
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D24576218
      
      Pulled By: riversand963
      
      fbshipit-source-id: a3d77191362ca696ae9df643f97f4ab5b7ecff12
    • S
      Remove duplicate close (#7594) · 793e9b7f
      shadowlux committed
      Summary:
      `Close()` has already been called in `Destroy()`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7594
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D24576407
      
      Pulled By: riversand963
      
      fbshipit-source-id: eba70d73375fd47dd78ca64c6a1fab3628448276
    • R
      In ParseInternalKey(), include corrupt key info in Status (#7515) · 9a690a74
      Ramkumar Vadivelu committed
      Summary:
      Fixes Issue https://github.com/facebook/rocksdb/issues/7497
      
      When the allow_data_in_errors DB option is set, log error key details in `ParseInternalKey()`.
      
      Most of the calls have been fixed. A few TODOs are still pending because passing in the allow_data_in_errors flag requires deeper changes. Those will be done in a separate PR later.
      
      Tests:
      - make check
      - some of the existing tests that exercise the "internal key too small" condition are: dbformat_test, cuckoo_table_builder_test
      - some of the existing tests that exercise the corrupted key path are: corruption_test, merge_helper_test, compaction_iterator_test
      
      Example of new status returns:
      - Key too small - `Corrupted Key: Internal Key too small. Size=5`
      - Corrupt key with allow_data_in_errors option set to false: `Corrupted Key: '<redacted>' seq:3, type:3`
      - Corrupt key with allow_data_in_errors option set to true: `Corrupted Key: '61' seq:3, type:3`
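
The example messages above can be reproduced with a small formatter. This is a hypothetical sketch (`ToHex` and `CorruptKeyMessage` are illustration names, not the functions in the PR); it only shows how the flag switches between the hex-encoded key and a redacted placeholder.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hex-encode raw key bytes, e.g. "a" -> "61".
std::string ToHex(const std::string& s) {
  static const char* kDigits = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : s) {
    out.push_back(kDigits[c >> 4]);
    out.push_back(kDigits[c & 0xF]);
  }
  return out;
}

// Hypothetical formatter mirroring the example messages above.
std::string CorruptKeyMessage(const std::string& user_key, uint64_t seq,
                              int type, bool allow_data_in_errors) {
  std::string shown = allow_data_in_errors ? ToHex(user_key) : "<redacted>";
  return "Corrupted Key: '" + shown + "' seq:" + std::to_string(seq) +
         ", type:" + std::to_string(type);
}
```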
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7515
      
      Reviewed By: ajkr
      
      Differential Revision: D24240264
      
      Pulled By: ramvadiv
      
      fbshipit-source-id: bc48f5d4475ac19d7713e16df37505b31aac42e7
    • A
      Require only one `Logger::Logv()` implementation (#7605) · 6c2c0635
      Andrew Kryczka committed
      Summary:
      A user who extended `Logger` recently pointed out it is unusual to
      require they implement the two-argument `Logv()` overload when they've
      already implemented the three-argument `Logv()` overload. I agree with
      that and think we can fix it by only calling the two-argument overload
      from the default implementation of the three-argument overload. Then
      when the three-argument overload is overridden, RocksDB would not
      rely on the two-argument overload. Only `Logger::LogHeader()` needed
      adjustment to achieve this.
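
The overload arrangement described above can be sketched with a simplified base class. `SketchLogger`, `SketchLogLevel`, and `StringLogger` are hypothetical stand-ins, not the real `Logger` interface. The base class's three-argument `Logv()` forwards to the two-argument one by default, and every entry point goes through the three-argument overload, so a subclass overriding only that overload is sufficient.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

enum SketchLogLevel { kInfo, kWarn };

// Simplified stand-in for a logger base class with two Logv overloads.
class SketchLogger {
 public:
  virtual ~SketchLogger() {}
  // Two-argument overload: no longer has to be overridden.
  virtual void Logv(const char* /*format*/, va_list /*ap*/) {}
  // Three-argument overload: the default forwards to the two-argument one.
  virtual void Logv(SketchLogLevel /*level*/, const char* format, va_list ap) {
    Logv(format, ap);
  }
  // Entry point always goes through the three-argument overload.
  void Log(SketchLogLevel level, const char* format, ...) {
    va_list ap;
    va_start(ap, format);
    Logv(level, format, ap);
    va_end(ap);
  }
};

// A user logger that overrides only the three-argument overload.
class StringLogger : public SketchLogger {
 public:
  using SketchLogger::Logv;  // keep the two-argument overload visible
  void Logv(SketchLogLevel level, const char* format, va_list ap) override {
    char buf[256];
    std::vsnprintf(buf, sizeof(buf), format, ap);
    last = (level == kWarn ? "[WARN] " : "[INFO] ") + std::string(buf);
  }
  std::string last;
};
```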
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7605
      
      Reviewed By: riversand963
      
      Differential Revision: D24584749
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9aabe040ac761c4c0dbebc4be046967403ecaf21
  11. 28 Oct, 2020 2 commits
    • P
      Give instructions instead of broken 2to3 for clang-format-diff.py (#7603) · 0e2e6756
      Peter Dillinger committed
      Summary:
      My previous change to use lib2to3 to migrate clang-format-diff.py
      for Python 2 only works if there's nothing to reformat. Instead, give
      instructions to download to REPO_ROOT.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7603
      
      Test Plan: Try the instructions on a fresh CentOS 8 devserver
      
      Reviewed By: riversand963
      
      Differential Revision: D24569608
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1410ba163e016b226e883dec93fae3df9ed0eab2
    • M
      Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) · f35f7f27
      mrambacher committed
      Summary:
      This PR does a few things:
      
      1.  The MockFileSystem class was split out from the MockEnv.  This change would theoretically allow a MockFileSystem to be used by other Environments as well (if we created a means of constructing one).  The MockFileSystem implements a FileSystem in its entirety and does not rely on any Wrapper implementation.
      
      2.  Make the RocksDB test suite work when MOCK_ENV=1 and ENCRYPTED_ENV=1 are set.  To accomplish this, a few things were needed:
      - The tests that tried to use the "wrong" environment (Env::Default() instead of env_) were updated
      - The MockFileSystem was changed to support the features it was missing or mishandled (such as recursively deleting files in a directory or supporting renaming of a directory).
      
      3.  Updated the test framework to have a ROCKSDB_GTEST_SKIP macro.  This can be used to flag tests that are skipped.  Currently, this defaults to doing nothing (marks the test as SUCCESS) but will mark the tests as SKIPPED when RocksDB is upgraded to a version of gtest that supports this (gtest-1.10).
      
      I have run a full "make check" with MEM_ENV, ENCRYPTED_ENV,  both, and neither under both MacOS and RedHat.  A few tests were disabled/skipped for the MEM/ENCRYPTED cases.  The error_handler_fs_test fails/hangs for MEM_ENV (presumably a timing problem) and I will introduce another PR/issue to track that problem.  (I will also push a change to disable those tests soon).  There is one more test in DBTest2 that also fails which I need to investigate or skip before this PR is merged.
      
      Theoretically, this PR should also allow the test suite to run against an Env loaded from the registry, though I do not have one to try it with currently.
      
      Finally, once this is accepted, it would be nice if there was a CircleCI job to run these tests on a checkin so this effort does not become stale.  I do not know how to do that, so if someone could write that job, it would be appreciated :)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7566
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D24408980
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 911b1554a4d0da06fd51feca0c090a4abdcb4a5f
  12. 27 Oct, 2020 2 commits
    • Y
      Perform post-flush updates of memtable list in a callback (#6069) · 6134ce64
      Yanqin Jin committed
      Summary:
      Currently, the following interleaving of events can lead to SuperVersion containing both immutable memtables as well as the resulting L0. This can cause Get to return incorrect result if there are merge operands. This may also affect other operations such as single deletes.
      
      ```
        time  main_thr  bg_flush_thr  bg_compact_thr  compact_thr  set_opts_thr
      0  |                                                         WriteManifest:0
      1  |                                           issue compact
      2  |                                 wait
      3  |   Merge(counter)
      4  |   issue flush
      5  |                   wait
      6  |                                                         WriteManifest:1
      7  |                                 wake up
      8  |                                 write manifest
      9  |                  wake up
      10 |  Get(counter)
      11 |                  remove imm
         V
      ```
      
      The reason is that one bg flush thread's installation of a new `Version` can be batched and performed by another thread that is the "leader" MANIFEST writer. This bg flush thread removes the memtables from the current super version only after `LogAndApply` returns. After the leader MANIFEST writer signals this bg flush thread (releasing the mutex), it is possible that another thread sees this cf with both the memtables (whose data have been flushed to the newest L0) and the L0 before this bg flush thread removes the memtables.
      
      To address this issue, each bg flush thread can pass a callback function to `LogAndApply`. The callback is responsible for removing the memtables. Therefore, the leader MANIFEST writer can call this callback and remove the memtables before releasing the mutex.
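
The callback approach can be modeled in a few lines. This is a simplified, single-threaded sketch with hypothetical names (`ManifestWrite`, `SketchColumnFamily`, `LeaderApply`), not the `LogAndApply` code. The leader installs each queued write and runs its callback while still holding the mutex, so no reader can observe both the flushed memtable and the resulting L0 file.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <vector>

// Simplified model of one queued MANIFEST write; names are hypothetical.
struct ManifestWrite {
  int new_l0_file;                   // file added by this flush
  std::function<void()> on_applied;  // e.g. remove the flushed memtables
};

struct SketchColumnFamily {
  std::mutex mu;
  std::vector<int> l0_files;
  int immutable_memtables = 0;

  // The leader applies every queued write and runs its callback
  // before the mutex is released.
  void LeaderApply(std::vector<ManifestWrite>& batch) {
    std::lock_guard<std::mutex> lock(mu);
    for (auto& w : batch) {
      l0_files.push_back(w.new_l0_file);
      if (w.on_applied) w.on_applied();
    }
  }
};
```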
      
      Test plan (devserver)
      ```
      $make merge_test
      $./merge_test --gtest_filter=MergeTest.MergeWithCompactionAndFlush
      $make check
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6069
      
      Reviewed By: cheng-chang
      
      Differential Revision: D18790894
      
      Pulled By: riversand963
      
      fbshipit-source-id: e41bd600c0448b4f4b2deb3f7677f95e3076b4ed
    • L
      Integrate BlobFileBuilder into the compaction process (#7573) · a7a04b68
      Levi Tamasi committed
      Summary:
      Similarly to how https://github.com/facebook/rocksdb/issues/7345
      integrated blob file writing into the flush process,
      the patch adds support for writing blob files to the compaction logic.
      Namely, if `enable_blob_files` is set, large values encountered during
      compaction are extracted to blob files and replaced with blob indexes.
      The resulting blob files are then logged to the MANIFEST as part of the
      compaction job's `VersionEdit` and added to the `Version` alongside any
      table files written by the compaction. Any errors during blob file building fail
      the compaction job.
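
The extraction step can be pictured with a toy model. Everything here is an assumption made for illustration: the size threshold parameter, the `SketchBlobFile` container, and the `blob#N` index encoding are hypothetical and do not reflect the actual blob file format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy blob file: just a list of extracted values.
struct SketchBlobFile {
  std::vector<std::string> blobs;
};

// Returns what would be written to the table file for `value`:
// the value itself, or a small index pointing into the blob file.
std::string MaybeExtract(const std::string& value, std::size_t min_blob_size,
                         SketchBlobFile* blob_file) {
  if (value.size() < min_blob_size) return value;  // small: keep inline
  blob_file->blobs.push_back(value);               // large: move to blob file
  return "blob#" + std::to_string(blob_file->blobs.size() - 1);
}
```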
      
      There will be a separate follow-up patch to perform blob garbage collection
      during compactions.
      
      In addition, the patch continues to chip away at the mess around computing
      various compaction related statistics by eliminating some code duplication
      and by making the `num_output_files` and `bytes_written` stats more consistent
      for flushes, compactions, and recovery.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7573
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D24404696
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 21216af3a172ad3ce8f85d11cd30923784ae426c