1. 26 9月, 2020 7 次提交
    • Y
      Re-add extra compiler flags when building unittests (#7437) · 8f826403
      Yanqin Jin 提交于
      Summary:
      Re-add extra_compiler_flags when building unit tests for fbcode.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7437
      
      Test Plan: Integrate with buck and run internal tests.
      
      Reviewed By: pdillinger
      
      Differential Revision: D23943924
      
      Pulled By: riversand963
      
      fbshipit-source-id: b92b7ad003e06e0860c45efc5f7f9684233d0c55
      8f826403
    • J
      Fix TSAN build and re-enable the tests (#7386) · fa92b9dc
      Jay Zhuang 提交于
      Summary:
      Resolve TSAN build warnings and re-enable disabled TSAN tests.
      
      Not sure if it's a compiler issue or TSAN check issue. Switching from
      conditional operator to if-else mitigated the problem.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7386
      
      Test Plan:
      run TSAN check 10 times in circleci.
      
      ```
      WARNING: ThreadSanitizer: data race (pid=27735)
        Atomic write of size 8 at 0x7b54000005e8 by thread T32:
          #0 __tsan_atomic64_store <null> (db_test+0x4cee95)
          https://github.com/facebook/rocksdb/issues/1 std::__atomic_base<unsigned long>::store(unsigned long, std::memory_order) /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/atomic_base.h:374:2 (db_test+0x78460e)
          https://github.com/facebook/rocksdb/issues/2 rocksdb::VersionSet::SetLastSequence(unsigned long) /home/circleci/project/./db/version_set.h:1058:20 (db_test+0x78460e)
      ...
        Previous read of size 8 at 0x7b54000005e8 by thread T31:
          #0 bool rocksdb::DBImpl::MultiCFSnapshot<std::unordered_map<unsigned int, rocksdb::DBImpl::MultiGetColumnFamilyData, std::hash<unsigned int>, std::equal_to<unsigned int>, std::allocator<std::pair<unsigned int const, rocksdb::DBImpl::MultiGetColumnFamilyData> > > >(rocksdb::ReadOptions const&, rocksdb::ReadCallback*, std::function<rocksdb::DBImpl::MultiGetColumnFamilyData* (std::unordered_map<unsigned int, rocksdb::DBImpl::MultiGetColumnFamilyData, std::hash<unsigned int>, std::equal_to<unsigned int>, std::allocator<std::pair<unsigned int const, rocksdb::DBImpl::MultiGetColumnFamilyData> > >::iterator&)>&, std::unordered_map<unsigned int, rocksdb::DBImpl::MultiGetColumnFamilyData, std::hash<unsigned int>, std::equal_to<unsigned int>, std::allocator<std::pair<unsigned int const, rocksdb::DBImpl::MultiGetColumnFamilyData> > >*, unsigned long*) /home/circleci/project/db/db_impl/db_impl.cc (db_test+0x715087)
      ```
      
      Reviewed By: ltamasi
      
      Differential Revision: D23725226
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: a6d662a5ea68111246cd32ec95f3411a25f76bc6
      fa92b9dc
    • P
      EnableFileDeletions only read field while holding mutex (#7435) · c8a12aa9
      Peter Dillinger 提交于
      Summary:
      Possible fix for a TSAN issue reported in EnableFileDeletions.
      disable_delete_obsolete_files_ should only be accessed holding the db
      mutex, but for logging it was being accessed outside holding the mutex,
      now fixed.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7435
      
      Test Plan: existing tests, watch for recurrence
      
      Reviewed By: ltamasi
      
      Differential Revision: D23917578
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 8573025bca3f6fe169b24b87bbfc4ce9667b0482
      c8a12aa9
    • C
      Track WAL in MANIFEST: add method to check WAL consistency (#7236) · 1a24f4d1
      Cheng Chang 提交于
      Summary:
      Add a method `CheckWals` in `WalSet` to check the logs on disk. See `CheckWals`'s comments.
      This method will be used to check consistency of WALs during DB recovery.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7236
      
      Test Plan: a set of tests are added to wal_edit_test.cc.
      
      Reviewed By: riversand963
      
      Differential Revision: D23036505
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 5b1d6857ac173429b00f950c32c4a5b8d063a732
      1a24f4d1
    • J
      MultiGet() with timestamp should respect snapshot (#7404) · 8c9fff91
      Jay Zhuang 提交于
      Summary:
      Similar to PR https://github.com/facebook/rocksdb/issues/7227, add read callback to filter out rows with with
      higher sequence number.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7404
      
      Reviewed By: riversand963
      
      Differential Revision: D23790762
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: bce854307612f1a22f985ffc934da627d0a139c2
      8c9fff91
    • J
      Add ASSERT_STATUS_CHECKED flag support (#7332) · c5b3128f
      Jay Zhuang 提交于
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7332
      
      Test Plan: `cmake .. -DASSERT_STATUS_CHECKED=1`, then run tests
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D23437128
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 66fce57ae5d126e7a8b91c16a73c81438cf3b65e
      c5b3128f
    • L
      Introduce a helper method UncompressData (#7434) · 30fb9dd5
      Levi Tamasi 提交于
      Summary:
      The patch introduces a helper method in `util/compression.h` called `UncompressData`
      that dispatches calls to the correct uncompression method based on type, and changes
      `UncompressBlockContentsForCompressionType` and `Benchmark::Uncompress` in
      `db_bench` so they are implemented in terms of the new method. This eliminates
      some code duplication. (`Benchmark::Compress` is also updated to use the previously
      introduced `CompressData` helper.)
      
      In addition, the patch brings the implementation of `Snappy_Uncompress` into sync with
      the other uncompression methods by making the method compute the buffer size and allocate
      the buffer itself. Finally, the patch eliminates some potentially risky back-and-forth conversions
      between various unsigned and signed integer types by exposing the size of the allocated buffer
      as a `size_t` instead of an `int`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7434
      
      Test Plan:
      `make check`
      `./db_bench -benchmarks=compress,uncompress --compression_type ...`
      
      Reviewed By: riversand963
      
      Differential Revision: D23900011
      
      Pulled By: ltamasi
      
      fbshipit-source-id: b25df63ceec4639889be94acb22eb53e530c54e0
      30fb9dd5
  2. 25 9月, 2020 1 次提交
  3. 24 9月, 2020 9 次提交
    • Z
      Add AppendWithVerify and PositionedAppendWithVerify to Env and FileSystem (#7419) · 0ce9b3a2
      Zhichao Cao 提交于
      Summary:
      Add new AppendWithVerify and PositionedAppendWithVerify APIs to Env and FileSystem to bring the data verification information (data checksum information) from upper layer (e.g., WritableFileWriter) to the storage layer. This PR only include the API definition, no functional codes are added to unblock other developers which depend on these APIs.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7419
      
      Test Plan: make -j32
      
      Reviewed By: pdillinger
      
      Differential Revision: D23883196
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 94676c26bc56144cc32e3661f84f21eccd790411
      0ce9b3a2
    • A
      Add IO Tracer Parser (#7333) · 98ac6b64
      Akanksha Mahajan 提交于
      Summary:
      Implement a parsing tool io_tracer_parser that takes IO trace file (binary file) with command line argument --io_trace_file and output file with --output_file and dumps the IO trace records in outputfile in human readable form.
      
      Also added unit test cases that generates IO trace records and calls io_tracer_parse to parse those records.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7333
      
      Test Plan:
      make check -j64,
       Add unit test cases.
      
      Reviewed By: anand1976
      
      Differential Revision: D23772360
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 9c20519c189362e6663352d08863326f3e496271
      98ac6b64
    • P
      Add and fix clang -Wshift-sign-overflow (#7431) · 31d1cea4
      Peter Dillinger 提交于
      Summary:
      This option is apparently used by some teams within Facebook
      (internal ref T75998621)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7431
      
      Test Plan: USE_CLANG=1 make check before (fails) and after
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D23876584
      
      Pulled By: pdillinger
      
      fbshipit-source-id: abb8b67a1f1aac75327944d266e284b2b6727191
      31d1cea4
    • A
      Fix scope of `ReadOptions` in `SstFileReader` (#7432) · 0698598f
      Andrew Kryczka 提交于
      Summary:
      a4a4a2da changed the contract of `TableReader::NewIterator()` to require
      `ReadOptions` outlive the returned iterator. But I didn't notice that
      `SstFileReader` violates the new contract and needs to be adapted. The unit test
      provided here exposes the problem when run under ASAN.
      
      ```
      $ ./sst_file_reader_test --gtest_filter=SstFileReaderTest.ReadOptionsOutOfScope
      Note: Google Test filter = SstFileReaderTest.ReadOptionsOutOfScope
      [==========] Running 1 test from 1 test case.
      [----------] Global test environment set-up.
      [----------] 1 test from SstFileReaderTest
      [ RUN      ] SstFileReaderTest.ReadOptionsOutOfScope
      =================================================================
      ==3238048==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffd6189e158 at pc 0x000001298350 bp 0x7ffd6189c280 sp 0x7ffd6189c278
      READ of size 8 at 0x7ffd6189e158 thread T0
          #0 0x129834f in rocksdb::BlockBasedTableIterator::InitDataBlock() table/block_based/block_based_table_iterator.cc:236
          https://github.com/facebook/rocksdb/issues/1 0x12b01f7 in rocksdb::BlockBasedTableIterator::SeekImpl(rocksdb::Slice const*) table/block_based/block_based_table_iterator.cc:77
          https://github.com/facebook/rocksdb/issues/2 0x844d28 in rocksdb::IteratorWrapperBase<rocksdb::Slice>::SeekToFirst() table/iterator_wrapper.h:116
          https://github.com/facebook/rocksdb/issues/3 0x844d28 in rocksdb::DBIter::SeekToFirst() db/db_iter.cc:1352
          https://github.com/facebook/rocksdb/issues/4 0x52482b in rocksdb::SstFileReaderTest_ReadOptionsOutOfScope_Test::TestBody() table/sst_file_reader_test.cc:150
          https://github.com/facebook/rocksdb/issues/5 0x5f433c in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3899
          https://github.com/facebook/rocksdb/issues/6 0x5f433c in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3935
          https://github.com/facebook/rocksdb/issues/7 0x5cc2de in testing::Test::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3973
          https://github.com/facebook/rocksdb/issues/8 0x5cc988 in testing::Test::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3965
          https://github.com/facebook/rocksdb/issues/9 0x5cc988 in testing::TestInfo::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4149
          https://github.com/facebook/rocksdb/issues/10 0x5cce9a in testing::TestInfo::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4124
          https://github.com/facebook/rocksdb/issues/11 0x5cce9a in testing::TestCase::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4267
          https://github.com/facebook/rocksdb/issues/12 0x5ce696 in testing::TestCase::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4253
          https://github.com/facebook/rocksdb/issues/13 0x5ce696 in testing::internal::UnitTestImpl::RunAllTests() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6633
          https://github.com/facebook/rocksdb/issues/14 0x5f541c in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3899
          https://github.com/facebook/rocksdb/issues/15 0x5f541c in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3935
          https://github.com/facebook/rocksdb/issues/16 0x5cee74 in testing::UnitTest::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6242
          https://github.com/facebook/rocksdb/issues/17 0x4c0332 in RUN_ALL_TESTS() third-party/gtest-1.8.1/fused-src/gtest/gtest.h:22104
          https://github.com/facebook/rocksdb/issues/18 0x4c0332 in main table/sst_file_reader_test.cc:213
          https://github.com/facebook/rocksdb/issues/19 0x7fb0263281a5 in __libc_start_main (/usr/local/fbcode/platform007/lib/libc.so.6+0x211a5)
          https://github.com/facebook/rocksdb/issues/20 0x523e56  (/data/users/andrewkr/rocksdb/sst_file_reader_test+0x523e56)
      
      Address 0x7ffd6189e158 is located in stack of thread T0 at offset 568 in frame
          #0 0x52428f in rocksdb::SstFileReaderTest_ReadOptionsOutOfScope_Test::TestBody() table/sst_file_reader_test.cc:131
      
        This frame has 9 object(s):
          [32, 40) 'reader'
          [96, 104) '<unknown>'
          [160, 168) '<unknown>'
          [224, 232) 'iter'
          [288, 304) 'gtest_ar'
          [352, 368) '<unknown>'
          [416, 440) 'keys'
          [480, 512) '<unknown>'
          [544, 680) 'ropts' <== Memory access at offset 568 is inside this variable
      HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
            (longjmp and C++ exceptions *are* supported)
       AddressSanitizer: stack-use-after-scope table/block_based/block_based_table_iterator.cc:236 in rocksdb::BlockBasedTableIterator::InitDataBlock()
      ...
      ```
      
      The fix is to use `ArenaWrappedDBIter` which has support for holding a
      `ReadOptions` in an `Arena` whose lifetime is tied to the iterator.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7432
      
      Test Plan: verified the provided unit test no longer fails
      
      Reviewed By: pdillinger
      
      Differential Revision: D23880043
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9464c37408f7bd7c9c4a90ceffb04d9f0ca7a494
      0698598f
    • J
      CircleCI build improvements (#7392) · 6980fff4
      Jay Zhuang 提交于
      Summary:
      * add slack integration
      * parallelize `check_some` tests with gtest-parallel
      * upgrade build image to the latest one: ubuntu-1604:202007-01
      * clean up the config
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7392
      
      Reviewed By: pdillinger
      
      Differential Revision: D23845379
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: f3af82dc2daeba6441f61c858e08d3936c6ccd10
      6980fff4
    • X
      build: make it compile with @mode/win (#7406) · 249f2b59
      Xavier Deguillard 提交于
      Summary:
      While rocksdb can compile on both macOS and Linux with Buck, it couldn't be
      compiled on Windows. The only way to compile it on Windows was with the CMake
      build.
      
      To keep the multi-platform complexity low, I've simply included all the Windows
      bits in the TARGETS file, and added large #if blocks when not on Windows, the
      same was done on the posix specific files.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7406
      
      Test Plan:
      On my devserver:
        buck test //rocksdb/...
      On Windows:
        buck build mode/win //rocksdb/src:rocksdb_lib
      
      Reviewed By: pdillinger
      
      Differential Revision: D23874358
      
      Pulled By: xavierd
      
      fbshipit-source-id: 8768b5d16d7e8f44b5ca1e2483881ca4b24bffbe
      249f2b59
    • P
      Fix/minimize mock_time_env.h dependencies (#7426) · ac1734d0
      Peter Dillinger 提交于
      Summary:
      (a) own copy of kMicrosInSecond
      (b) out-of-line sync point code
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7426
      
      Test Plan: FB internal
      
      Reviewed By: ajkr
      
      Differential Revision: D23861363
      
      Pulled By: pdillinger
      
      fbshipit-source-id: de6b1621dca2f7391c5ff72bad04a7613dc27527
      ac1734d0
    • R
      db_iter.cc: DBIter::Next(): minor improve (#7407) · b005f969
      rockeet 提交于
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7407
      
      Reviewed By: ajkr
      
      Differential Revision: D23817122
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 62bf43e4d780fad8c682edd750b4800b5b8f4a77
      b005f969
    • M
      Fix valgrind issues with configurable_test (#7424) · 5d6ff693
      mrambacher 提交于
      Summary:
      Valgrind was reporting a problem with the configurable_test in some GTEST code.  This problem was caused by using a std::function as a GTEST parameter.  This change changes the test to use a string as a function parameter (backed by a map) and fixes the valgrind issue.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7424
      
      Reviewed By: ajkr
      
      Differential Revision: D23855540
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 2f2be03f7f92d96644aa9fa6481e4f37f2cfa5f5
      5d6ff693
  4. 23 9月, 2020 5 次提交
    • C
      Track WAL in MANIFEST: update WalMetadata for WAL syncing (#7414) · 00ee89b5
      Cheng Chang 提交于
      Summary:
      There are some tricky behaviors related to WAL sync:
      
      - When creating a WAL, the WAL might not be synced, if the WAL directory is not synced, the WAL file's metadata may not even be synced to disk, so during recovery, when listing the WAL directory, the WAL may not even show up.
      - During each DB::Write, the WriteOption can control whether the WAL should be synced, so a WAL previously not synced on creation can be synced during Write.
      
      For each `SyncWAL`, we'll track the synced status and the current WAL size. Previously, we only track the WAL size on closing.
      During recovery, we check that the on-disk WAL size is >= the last synced size.
      
      So this PR introduces `synced_size` and `closed` to `WalMetadata` for the above design update.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7414
      
      Test Plan:
      - updated wal_edit_test
      - updated version_edit_test
      
      Reviewed By: riversand963
      
      Differential Revision: D23796127
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 5498ab80f537c48a10157e71a4745716aef5cf30
      00ee89b5
    • Y
      Allow mutex to be released in GetAggregatedIntProperty (#7412) · cd72f897
      Yanqin Jin 提交于
      Summary:
      Current implementation holds db mutex while calling
      `GetAggregatedIntProperty()`. For property kEstimateTableReadersMem,
      this can be expensive, especially if the number of table readers is
      high.
      We can release and re-acquire db mutex if
      property_info.need_out_of_mutex is true.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7412
      
      Test Plan:
      make check
      COMPILE_WITH_ASAN=1 make check
      COMPILE_WITH_TSAN=1 make check
      Also test internally on a shadow host. Used bpf to verify the
      excessively long db mutex holding no longer exists when applications
      call GetApproximateMemoryUsageByType().
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D23794824
      
      Pulled By: riversand963
      
      fbshipit-source-id: 6bc02a59fd25613d343a62cf817467c7122c9721
      cd72f897
    • Y
      Fix RocksDB SIGILL error on Raspberry PI 4 (#7233) · 29f7bbef
      Yuqi Gu 提交于
      Summary:
      Issue:https://github.com/facebook/rocksdb/issues/7042
      
      No PMULL runtime check will lead to SIGILL on a Raspberry pi 4.
      
      Leverage 'getauxval' to get Hardware-Cap to detect whether target
      platform does support PMULL or not in runtime.
      
      Consider the condition that the target platform does support crc32 but not support PMULL.
      In this condition, the code should leverage the crc32 instruction
      rather than skip all hardware crc32 instruction.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7233
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D23790116
      
      fbshipit-source-id: a3ebd821fbd4a38dd2f59064adbb7c3013ee8140
      29f7bbef
    • S
      Add a release build with RTTI in CircleCI (#7364) · 3591da33
      sdong 提交于
      Summary:
      Release build RTTI is not covered in CI. Add one.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7364
      
      Test Plan: Watch the build results.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D23602157
      
      fbshipit-source-id: f0bb0f918632c5ee009db9d0a47a771f3c98195b
      3591da33
    • P
      Possible fix to flaky db_write_test (#7418) · 6727259e
      Peter Dillinger 提交于
      Summary:
      Make the test robust to spurious wakeups on condition variable,
      and clear sync points to ensure no use-after-free.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7418
      
      Test Plan: repeated runs on updated test, watch CircleCI for recurrence
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D23828823
      
      Pulled By: pdillinger
      
      fbshipit-source-id: af85117d9c02602541a90252840e0e5a6996de5b
      6727259e
  5. 22 9月, 2020 2 次提交
    • P
      Less I/O for incremental backups, slightly better corruption detection (#7413) · 9d8eb77c
      Peter Dillinger 提交于
      Summary:
      Two relatively simple functional changes to incremental backup
      behavior, integrated with a minor refactoring to reduce code redundancy and
      improve error/log message. There are nuances to the impact of these changes,
      but I believe they are fundamentally good and generally safe. Those functional
      changes:
      
      * Incremental backups no longer read DB table files that are already saved to a
      shared part of the backup directory, unless `share_files_with_checksum` is used
      with `kLegacyCrc32cAndFileSize` naming (discouraged) where crc32c full file
      checksums are needed to determine file naming.
        * Justification: incremental backups should not need to read the whole DB,
      especially without rate limiting. (Although other BackupEngine reads are not
      rate limited either, other non-trivial reads are generally limited by a
      corresponding write, as in copying files.) Also, the fact that this is not
      already fixed was arguably a bug/oversight in the implementation of https://github.com/facebook/rocksdb/issues/7110.
      
      * When considering whether a table file is already backed up in a shared part
      of backup directory, BackupEngine would already query the sizes of source (DB)
      and pre-existing destination (backup) files. BackupEngine now uses these file
      sizes to detect corruption, as at least one of (a) old backup, (b) backup in
      progress, or (c) current DB is corrupt if there's a size mismatch.
        * Justification: a random related fix that also helps to cover a small hole
      in corruption checking uncovered by the other functional change:
        * For `share_table_files` without "checksum" (not recommended), the other
      change regresses in detecting fundamentally unsafe use of this option
      combination: when you might generate different versions of same SST file
      number. As demonstrated by `BackupableDBTest.FailOverwritingBackups,` this
      regression is greatly mitigated by the new file size checking. Nevertheless,
      almost no reason to use `share_files_with_checksum=false` should remain, and
      comments are updated appropriately.
      
      Also, this change renames internal function `CalculateChecksum` to
      `ReadFileAndComputeChecksum` to make the performance impact of this function
      clear in code reviews.
      
      It is not clear what 'same_path' is for in backupable_db.cc, and I suspect it
      cannot be true for a DB with unique file names (like DBImpl). Nevertheless,
      I've tried to keep its functionality intact when `true` to minimize risk for
      now, despite having no unit tests for which it is true.
      
      Select impact details (much more in unit tests): For
      `share_files_with_checksum`, I am confident there is no regression (vs.
      pre-6.12) in detecting DB or backup corruption at backup creation time, mostly
      because the old design did not leverage this extra checksum computation for
      detecting inconsistencies at backup creation time. (With computed checksums in
      names, a recently corrupted file just looked like a different file vs. what was
      already backed up.)
      
      Even in the hypothetical case of DB session id collision (~100 bits entropy
      collision), file size in name and/or our file size check add an extra layer of
      protection against false success in creating an accurate new backup. (Unit test
      included.)
      
      `DB::VerifyChecksum` and `BackupEngine::VerifyBackup` with checksum checking
      are still able to catch corruptions that `CreateNewBackup` does not. Note that
      when custom file checksum support is added to BackupEngine, that will
      essentially give the same power as `DB::VerifyChecksum` into `CreateNewBackup`.
      We could add options for `CreateNewBackup` to cover some of what would be
      caught by `VerifyBackup` with checksum checking.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7413
      
      Test Plan:
      Two new unit tests included, both of which fail without these
      changes. Although we don't test the I/O improvement directly, we test it
      indirectly in DB corruption detection power that was inadvertently unlocked
      with new backup file naming PLUS computing current content checksums (now
      removed). (I don't think that case of DB corruption detection justifies reading
      the whole DB on incremental backup.)
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D23818480
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 148aff16f001af5b9fd4b22f155311c2461f1bac
      9d8eb77c
    • P
      Update HISTORY.md for #7346 (#7417) · 52691703
      Peter Dillinger 提交于
      Summary:
      Copied from Andrew's entry for 6.12.3. Inserted here
      retroactive to 6.12
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7417
      
      Test Plan: no code change
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D23815980
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 3c8a052cdb61be1215d311556c9487f9ea5c8cb0
      52691703
  6. 20 9月, 2020 2 次提交
  7. 19 9月, 2020 2 次提交
    • P
      Postponing custom checksum support in BackupEngine (#7411) · b475a83f
      Peter Dillinger 提交于
      Summary:
      This change reverts BackupEngine to 6.12 state to accommodate a
      higher-priority fix that does not easily merge with this custom checksum
      support. We intend to reinstate this support soon, by merging a revert
      of this change.
      
      For backupable_db_test, I've removed the tests depending on this
      feature.
      
      I've also removed relevant HISTORY.md entry.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7411
      
      Test Plan: unit tests
      
      Reviewed By: ajkr
      
      Differential Revision: D23793835
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 7e861436539584799b13d1a8ae559b81b6d08052
      b475a83f
    • T
      Add missing Java API for boolean and numerical fields in DBOptions (#7387) · 6efae4b0
      Tomasz Posluszny 提交于
      Summary:
      Exposes the following previously missing DBOptions fields in the RocksJava API:
      - persist_stats_to_disk
      - max_write_batch_group_size_bytes
      - skip_checking_sst_file_sizes_on_db_open
      - avoid_unnecessary_blocking_io
      - write_dbid_to_manifest
      - log_readahead_size
      - best_efforts_recovery
      - max_bgerror_resume_count
      - bgerror_resume_retry_interval
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7387
      
      Reviewed By: siying
      
      Differential Revision: D23707785
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: e5688c7d40d83128734605ef7b0720a55fdfa699
      6efae4b0
  8. 18 9月, 2020 5 次提交
    • Z
      Map retryable IO error during Flush without WAL to soft error and no switch... · c268628c
      Zhichao Cao 提交于
      Map retryable IO error during Flush without WAL to soft error and no switch memtable during resume (#7310)
      
      Summary:
      In the current implementation, any retryable IO error happens during Flush is mapped to a hard error. In this case, DB is stopped and write is stalled unless the background error is cleaned. In this PR, if WAL is DISABLED, the retryable IO error during FLush is mapped to a soft error. Such that, the memtable can continue receive the writes. At the same time, if auto resume is triggered, SwtichMemtable will not be called during Flush when resuming the DB to avoid to many small memtables. Testing cases are added.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7310
      
      Test Plan: adding new unit test, pass make check.
      
      Reviewed By: anand1976
      
      Differential Revision: D23710892
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: bc4ca50d11c6b23b60d2c0cb171d86d542b038e9
      c268628c
    • A
      RocksJava - Add errorIfLogFileExists parameter to RocksDB.openReadOnly (#7046) · 3ac07a12
      Adam Retter 提交于
      Summary:
      Expose from C++ API to Java API.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7046
      
      Reviewed By: riversand963
      
      Differential Revision: D23726297
      
      Pulled By: pdillinger
      
      fbshipit-source-id: fc66bf626ce6fe9797e7d021ac849eacab91bf6d
      3ac07a12
    • A
      Update HISTORY.md with IO fencing error code (#7402) · b9750c7c
      anand76 提交于
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7402
      
      Reviewed By: pdillinger
      
      Differential Revision: D23761689
      
      Pulled By: anand1976
      
      fbshipit-source-id: 59e10f0aaa80f6c0f5a46dc99467138c4cee0511
      b9750c7c
    • P
      Restore file size in backup table file names (and other cleanup) (#7400) · 93719fc9
      Peter Dillinger 提交于
      Summary:
      Prior to 6.12, backup files using share_files_with_checksum had
      the file size encoded in the file name, after the last '\_' and before
      the last '.'. We considered this an implementation detail subject to
      change, and indeed removed this information from the file name (with an
      option to use old behavior) because it was considered
      ineffective/inefficient for file name uniqueness. However, some
      downstream RocksDB users were relying on this information since the file
      size is not explicitly in the backup manifest file.
      
      This primary purpose of this change is "retrofitting" the 6.12 release
      (not yet a public release) to simultaneously support the benefits of the
      new naming scheme (I/O performance and data correctness at scale) and
      preserve the file size information, both as default behaviors. With this
      change, we are essentially making the file size information encoded in
      the file name an official, though obscure, extension of the backup meta
      file format.
      
      We preserve an option (kLegacyCrc32cAndFileSize) to use the original
      "legacy" naming scheme, with its caveats, and make it easy to omit the
      file size information (no kFlagIncludeFileSize), for more compact file
      names. But note that changing the naming scheme used on an existing db
      and backup directory can lead to transient space amplification, as some
      files will be stored under two names in the shared_checksum directory.
      Because some backups were saved using the original 6.12 naming scheme,
      we offer two ways of dealing with those files: SST files generated by
      older 6.12 versions can either use the default naming scheme in effect
      when the SST files were generated (kFlagMatchInterimNaming, default, no
      transient space amplification) or can use a new naming scheme (no
      kFlagMatchInterimNaming, potential space amplification because some
      already stored files getting a new name).
      
      We don't have a natural way to detect which files were generated by
      previous 6.12 versions, but this change hacks one in by changing DB
      session ids to now use a more concise encoding, reducing file name
      length, saving ~dozen bytes from SST files, and making them visually
      distinct from DB ids so that they are less likely to be mixed up.
      
      Two final auxiliary notes:
      Recognizing that the backup file names have become a de facto part of
      the backup meta schema, this change makes them easier to parse and
      extend by putting a distinct marker, 's', before DB session ids embedded
      in the name. When we extend this to allow custom checksums in the name,
      they can get their own marker to ensure safe parsing. For backward
      compatibility, file size does not get a marker but is assumed for
      `_[0-9]+[.]`
      
      Another change from initial 6.12 default behavior is never including
      file custom checksum in the file name. Looking ahead to 6.13, we do not
      want the default behavior to cause backup space amplification for
      someone turning on file custom checksum checking in BackupEngine; we
      want that to be an easy decision. When implemented, including file
      custom checksums in backup file names will be a non-default option.
      
      Actual file name patterns and priorities, as regexes:
      
          kLegacyCrc32cAndFileSize OR pre-6.12 SST file ->
            [0-9]+_[0-9]+_[0-9]+[.]sst
          kFlagMatchInterimNaming set (default) AND early 6.12 SST file ->
            [0-9]+_[0-9a-fA-F-]+[.]sst
          kUseDbSessionId AND NOT kFlagIncludeFileSize ->
            [0-9]+_s[0-9A-Z]{20}[.]sst
          kUseDbSessionId AND kFlagIncludeFileSize (default) ->
            [0-9]+_s[0-9A-Z]{20}_[0-9]+[.]sst
      
      We might add opt-in options for more '\_' separated data in the name,
      but embedded file size, if present, will always be after last '\_' and
      before '.sst'.
      
      This change was originally applied to version 6.12. (See https://github.com/facebook/rocksdb/issues/7390)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7400
      
      Test Plan:
      unit tests included. Sync point callbacks are used to mimic
      previous version SST files.
      
      Reviewed By: ajkr
      
      Differential Revision: D23759587
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f62d8af4e0978de0a34f26288cfbe66049b70025
      93719fc9
    • P
      Fix HISTORY.md and check_format_compatible.sh for 6.13 branch (#7401) · 7780a360
      Peter Dillinger 提交于
      Summary:
      Make "unreleased" section for HISTORY.md with things misplaced
      into 6.12 and 6.13
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7401
      
      Test Plan: see how it goes, and `git diff origin/6.13.fb HISTORY.md`
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D23759740
      
      Pulled By: pdillinger
      
      fbshipit-source-id: fc441916c7ff2bbb8d5384137653b340d4c47674
      7780a360
  9. 17 9月, 2020 1 次提交
  10. 16 9月, 2020 5 次提交
    • A
      More robust sync points for intra-L0 compaction tests (#7382) · ec024a86
      Andrew Kryczka 提交于
      Summary:
      `IntraL0CompactionAfterFlushCheckConsistencyFail` was flaky by sometimes failing due to no intra-L0 compactions happening. I was able to repro it by putting a `sleep(1)` in the compaction thread before it grabs the lock and picks a compaction. This also showed other intra-L0 tests are affected too, although some of them exhibit hanging forever rather than failing.
      
      The problem was that all the flushes/ingestions could finish before any compaction got picked, so it would end up simply picking all the files that the test generates for L0->L1. But, these tests intend only the first few files to be picked for L0->L1, and the subsequent files to be picked for intra-L0. This PR adjusts the sync points of all the intra-L0 tests to enforce this.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7382
      
      Test Plan: run all the `db_compaction_test`s with and without the artificial `sleep()`
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D23684985
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6508399030dddec7738e9853a7b3dc53ef77a584
      ec024a86
    • Y
      Add basic support for user-defined timestamp to db_bench (#7389) · a28df7a7
      Yanqin Jin 提交于
      Summary:
      Update db_bench so that we can run it with user-defined timestamp.
      Currently, only 64-bit timestamp is supported, while others are disabled by assertion.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7389
      
      Test Plan: ./db_bench -benchmarks=fillseq,fillrandom,readrandom,readsequential,....., -user_timestamp_size=8
      
      Reviewed By: ltamasi
      
      Differential Revision: D23720830
      
      Pulled By: riversand963
      
      fbshipit-source-id: 486eacbb82de9a5441e79a61bfa9beef6581608a
      a28df7a7
    • A
      Disable fsync in DB tests with timeouts (#7380) · 9d3b2db9
      Andrew Kryczka 提交于
      Summary:
      Some tests were encountering 600 second timeout in CI, such as `./db_universal_compaction_test --gtest_filter=NumLevels/DBTestUniversalCompaction.UniversalCompactionTrivialMoveTest2/5`, `./db_properties_test --gtest_filter=DBPropertiesTest.AggregatedTablePropertiesAtLevel`, and `./db_basic_test --gtest_filter=DBBasicTest.MultiGetBatchedSortedMultiFile`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7380
      
      Test Plan:
      - `./db_universal_compaction_test --gtest_filter=NumLevels/DBTestUniversalCompaction.UniversalCompactionTrivialMoveTest2/5`: 40 -> 3 seconds
      - `./db_properties_test --gtest_filter=DBPropertiesTest.AggregatedTablePropertiesAtLevel`: 106 -> 1 second
      - `./db_basic_test --gtest_filter=DBBasicTest.MultiGetBatchedSortedMultiFile`: 27 -> 1 second
      
      Reviewed By: anand1976
      
      Differential Revision: D23674570
      
      Pulled By: ajkr
      
      fbshipit-source-id: 4d4ca6a4e2d2e76fcf8b6f6cce91e0f98ba5050c
      9d3b2db9
    • L
      Integrate blob file writing with recovery (#7388) · bf1aeebb
      Levi Tamasi 提交于
      Summary:
      The patch adds support for extracting large values into blob files when
      performing a flush during recovery (when `avoid_flush_during_recovery` is
      `false`). Blob files are built and added to the `Version` similarly to flush.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7388
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D23709912
      
      Pulled By: ltamasi
      
      fbshipit-source-id: ce48b4227849cf25429ae98574e72b0e1cb9c67d
      bf1aeebb
    • M
      Changes to EncryptedEnv public API (#7279) · 67bd5401
      mrambacher 提交于
      Summary:
      Cleaned up the public API to use the EncryptedEnv.  This change will allow providers to be developed and added to the system easier in the future.  It will also allow better integration in the future with the OPTIONS file.
      
      - The internal classes were moved out of the public API into an internal "env_encryption_ctr.h" header.  Short-cut constructors were added to provide the original API functionality.
      - The APIs to the constructors were changed to take shared_ptr, rather than raw pointers or references to allow better memory management and alternative implementations.
      - CreateFromString methods were added to allow future expansion to other provider and cipher implementations through a standard API.
      
      Additionally, there was a code duplication in the NewXXXFile methods.  This common code was moved under a templatized function.
      
      A first-pass at structuring the code was made to potentially allow multiple EncryptionProviders in a single EncryptedEnv.  The idea was that different providers may use different cipher keys or different versions/algorithms.  The EncryptedEnv should have some means of picking different providers based on information.  The groundwork was started for this (the use of the provider_ member variable was localized) but the work has not been completed.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7279
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D23709440
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 0e845fff0e03a52603eb9672b4ade32d063ff2f2
      67bd5401
  11. 15 9月, 2020 1 次提交
    • L
      Integrate blob file writing with the flush logic (#7345) · b0e78341
      Levi Tamasi 提交于
      Summary:
      The patch adds support for writing blob files during flush by integrating
      `BlobFileBuilder` with the flush logic, most importantly, `BuildTable` and
      `CompactionIterator`. If `enable_blob_files` is set, large values are extracted
      to blob files and replaced with references. The resulting blob files are then
      logged to the MANIFEST as part of the flush job's `VersionEdit` and
      added to the `Version`, similarly to table files. Errors related to writing
      blob files fail the flush, and any blob files written by such jobs are immediately
      deleted (again, similarly to how SST files are handled). In addition, the patch
      extends the logging and statistics around flushes to account for the presence
      of blob files (e.g. `InternalStats::CompactionStats::bytes_written`, which is
      used for calculating write amplification, now considers the blob files as well).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7345
      
      Test Plan: Tested using `make check` and `db_bench`.
      
      Reviewed By: riversand963
      
      Differential Revision: D23506369
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 646885f22dfbe063f650d38a1fedc132f499a159
      b0e78341