1. 05 3月, 2020 2 次提交
    • C
      Skip high levels with no key falling in the range in CompactRange (#6482) · afb97094
      Cheng Chang 提交于
      Summary:
      In CompactRange, if there is no key in memtable falling in the specified range, then flush is skipped.
      This PR extends this skipping logic to SST file levels: it starts compaction from the highest level (starting from L0) that has files with key falling in the specified range, instead of always starts from L0.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6482
      
      Test Plan:
      A new test ManualCompactionTest::SkipLevel is added.
      
      Also updated a test related to statistics of index block cache hit in db_test2, the index cache hit is increased by 1 in this PR because when checking overlap for the key range in L0, OverlapWithLevelIterator will do a seek in the table cache iterator, which will read from the cached index.
      
      Also updated db_compaction_test and db_test to use correct range for full compaction.
      
      Differential Revision: D20251149
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: f822157cf4796972bd5035d9d7178d8dfb7af08b
      afb97094
    • Z
      Introduce FaultInjectionTestFS to test fault File system instead of Env (#6414) · e62fe506
      Zhichao Cao 提交于
      Summary:
      In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR https://github.com/facebook/rocksdb/issues/5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller.
      
      A set of ErrorHandlerFSTests are introduced for testing
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6414
      
      Test Plan: pass make asan_check, pass error_handler_fs_test.
      
      Differential Revision: D20252421
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21
      e62fe506
  2. 04 3月, 2020 1 次提交
    • K
      s/const auto/const auto&/ when doing loop (#6477) · 03dbd11e
      Kefu Chai 提交于
      Summary:
      this silences following warning from clang-11
      ```
      rocksdb/db/db_impl/db_impl_compaction_flush.cc:1040:21: warning: loop variable 'newf' of type 'const std::pair<int, rocksdb::FileMetaData>' creates a copy from type 'const
      std::pair<int\
      , rocksdb::FileMetaData>' [-Wrange-loop-analysis]
          for (const auto newf : c->edit()->GetNewFiles()) {
                          ^
      rocksdb/db/db_impl/db_impl_compaction_flush.cc:1040:10: note: use reference type 'const std::pair<int, rocksdb::FileMetaData> &' to prevent copying
          for (const auto newf : c->edit()->GetNewFiles()) {
               ^~~~~~~~~~~~~~~~~
                          &
      ```
      Signed-off-by: NKefu Chai <tchaikov@gmail.com>
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6477
      
      Differential Revision: D20211850
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 3e89e13a12bba79f1b934d46b7c4c0576cdafb01
      03dbd11e
  3. 03 3月, 2020 3 次提交
    • S
      Fix data race of GetCreationTimeOfOldestFile() (#6473) · 17bef7d3
      sdong 提交于
      Summary:
      When DBImpl::GetCreationTimeOfOldestFile() calls Version::GetCreationTimeOfOldestFile(), the version is not directly or indirectly referenced, so an event like compaction can race with the operation and cause DBImpl::GetCreationTimeOfOldestFile() to access delocated data. This was caught by an ASAN run:
      
      ==268==ERROR: AddressSanitizer: heap-use-after-free on address 0x612000b7d198 at pc 0x000018332913 bp 0x7f391510d310 sp 0x7f391510d308
      READ of size 8 at 0x612000b7d198 thread T845 (store_load-33)
      SCARINESS: 51 (8-byte-read-heap-use-after-free)
          #0 0x18332912 in rocksdb::Version::GetCreationTimeOfOldestFile(unsigned long*) rocksdb/src/db/version_set.cc:1488
          https://github.com/facebook/rocksdb/issues/1 0x1803ddaa in rocksdb::DBImpl::GetCreationTimeOfOldestFile(unsigned long*) rocksdb/src/db/db_impl/db_impl.cc:4499
          https://github.com/facebook/rocksdb/issues/2 0xe24ca09 in rocksdb::StackableDB::GetCreationTimeOfOldestFile(unsigned long*) rocksdb/utilities/stackable_db.h:392
          ......
      0x612000b7d198 is located 216 bytes inside of 296-byte region [0x612000b7d0c0,0x612000b7d1e8)
      freed by thread T28 here:
          ......
          https://github.com/facebook/rocksdb/issues/5 0x1832c73f in std::vector<rocksdb::FileMetaData*, std::allocator<rocksdb::FileMetaData*> >::~vector() third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/stl_vector.h:435
          https://github.com/facebook/rocksdb/issues/6 0x1832c73f in rocksdb::VersionStorageInfo::~VersionStorageInfo() rocksdb/src/db/version_set.cc:734
          https://github.com/facebook/rocksdb/issues/7 0x1832cf42 in rocksdb::Version::~Version() rocksdb/src/db/version_set.cc:758
          https://github.com/facebook/rocksdb/issues/8 0x9d1bb5 in rocksdb::Version::Unref() rocksdb/src/db/version_set.cc:2869
          https://github.com/facebook/rocksdb/issues/9 0x183e7631 in rocksdb::Compaction::~Compaction() rocksdb/src/db/compaction/compaction.cc:275
          https://github.com/facebook/rocksdb/issues/10 0x9e6de6 in std::default_delete<rocksdb::Compaction>::operator()(rocksdb::Compaction*) const third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/unique_ptr.h:78
          https://github.com/facebook/rocksdb/issues/11 0x9e6de6 in std::unique_ptr<rocksdb::Compaction, std::default_delete<rocksdb::Compaction> >::reset(rocksdb::Compaction*) third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/unique_ptr.h:376
          https://github.com/facebook/rocksdb/issues/12 0x9e6de6 in rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) rocksdb/src/db/db_impl/db_impl_compaction_flush.cc:2826
          https://github.com/facebook/rocksdb/issues/13 0x9ac3b8 in rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) rocksdb/src/db/db_impl/db_impl_compaction_flush.cc:2320
          https://github.com/facebook/rocksdb/issues/14 0x9abff7 in rocksdb::DBImpl::BGWorkCompaction(void*) rocksdb/src/db/db_impl/db_impl_compaction_flush.cc:2096
          ......
      
      Fix the issue by reference the super version and use the referenced version from it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6473
      
      Test Plan: Run ASAN for all existing tests.
      
      Differential Revision: D20196416
      
      fbshipit-source-id: 5f4a7918110fc7b8dd7841932d376bc9d1e59d6f
      17bef7d3
    • Z
      Replace Directory with FSDirectory in DB (#6468) · 8d73137a
      Zhichao Cao 提交于
      Summary:
      In the current code base, we can use Directory from Env to manage directory (e.g, Fsync()). The PR https://github.com/facebook/rocksdb/issues/5761  introduce the File System as a new Env API. So we further replace the Directory class in DB with FSDirectory such that we can have more IO information from IOStatus returned by FSDirectory.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6468
      
      Test Plan: pass make asan_check
      
      Differential Revision: D20195261
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 93962cb9436852bfcfb76e086d9e7babd461cbe1
      8d73137a
    • H
      return timestamp from get (#6409) · 904a60ff
      Huisheng Liu 提交于
      Summary:
      Added new Get() methods that return timestamp. Dummy implementation is given so that classes derived from DB don't need to be touched to provide their implementation. MultiGet is not included.
      
      ReadRandom perf test (10 minutes) on the same development machine ram drive with the same DB data shows no regression (within marge of error). The test is adapted from https://github.com/facebook/rocksdb/wiki/RocksDB-In-Memory-Workload-Performance-Benchmarks.
          base line (commit 72ee067b):
              101.712 micros/op 314602 ops/sec;   36.0 MB/s (5658999 of 5658999 found)
          This PR:
              100.288 micros/op 319071 ops/sec;   36.5 MB/s (5674999 of 5674999 found)
      
      ./db_bench --db=r:\rocksdb.github --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --cache_size=2147483648 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=r:\rocksdb.github\WAL_LOG --sync=0 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --duration=600 --benchmarks=readrandom --use_existing_db=1 --num=25000000 --threads=32
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6409
      
      Differential Revision: D20200086
      
      Pulled By: riversand963
      
      fbshipit-source-id: 490edd74d924f62bd8ae9c29c2a6bbbb8410ca50
      904a60ff
  4. 29 2月, 2020 1 次提交
  5. 26 2月, 2020 1 次提交
    • A
      Fix range deletion tombstone ingestion with global seqno (#6429) · 69679e73
      Andrew Kryczka 提交于
      Summary:
      Original author: jeffrey-xiao
      
      If we are writing a global seqno for an ingested file, the range
      tombstone metablock gets accessed and put into the cache during
      ingestion preparation. At the time, the global seqno of the ingested
      file has not yet been determined, so the cached block will not have a
      global seqno. When the file is ingested and we read its range tombstone
      metablock, it will be returned from the cache with no global seqno. In
      that case, we use the actual seqnos stored in the range tombstones,
      which are all zero, so the tombstones cover nothing.
      
      This commit removes global_seqno_ variable from Block. When iterating
      over a block, the global seqno for the block is determined by the
      iterator instead of storing this mutable attribute in Block.
      Additionally, this commit adds a regression test to check that keys are
      deleted when ingesting a file with a global seqno and range deletion
      tombstones.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6429
      
      Differential Revision: D19961563
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5cf777397fa3e452401f0bf0364b0750492487b7
      69679e73
  6. 25 2月, 2020 1 次提交
    • L
      Add blob file state to VersionEdit (#6416) · d87c10c6
      Levi Tamasi 提交于
      Summary:
      BlobDB currently does not keep track of blob files: no records are written to
      the manifest when a blob file is added or removed, and upon opening a database,
      the list of blob files is populated simply based on the contents of the blob directory.
      This means that lost blob files cannot be detected at the moment. We plan to solve
      this issue by making blob files a part of `Version`; as a first step, this patch makes
      it possible to store information about blob files in `VersionEdit`. Currently, this information
      includes blob file number, total number and size of all blobs, and total number and size
      of garbage blobs. However, the format is extensible: new fields can be added in
      both a forward compatible and a forward incompatible manner if needed (similarly
      to `kNewFile4`).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6416
      
      Test Plan: `make check`
      
      Differential Revision: D19894234
      
      Pulled By: ltamasi
      
      fbshipit-source-id: f9753e1f2aedf6dadb70c09b345207cb9c58c329
      d87c10c6
  7. 22 2月, 2020 2 次提交
  8. 21 2月, 2020 2 次提交
    • Y
      Fix MANIFEST name assignment (#6426) · 362b8d43
      Yanqin Jin 提交于
      Summary:
      Currently, a new MANIFEST file is assigned a new file number when 1) no
      MANIFEST is open, or 2) current MANIFEST file size exceeds a threshold. This is
      not sufficient. There are cases when the caller explicitly specifies that a new
      MANIFEST be created. For example, if user sets options.write_dbid_to_manifest = true,
      and there are WAL files, then RocksDB will run into an issue during recovery.
      `DBImpl::Recover()` will call `LogAndApply()` to write dbid. At this point, the db being
      recovered creates a new MANIFEST, say, MANIFEST-000003. Since there are WALs,
      `DBImpl::RecoverLogFiles` will be called. Towards the end of this function, we call
      `LogAndApply(new_descriptor_log=true)`, which explicitly creates a new MANIFEST.
      However, the manifest_file_number is wrong before this fix. Consequently, RocksDB
      opens an existing, non-empty file for append, effectively truncating the file to zero.
      If a crash occurs, then there will be data loss.
      
      Test Plan (devserver):
      make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6426
      
      Test Plan: make check
      
      Differential Revision: D19951866
      
      Pulled By: riversand963
      
      fbshipit-source-id: 4b1b9fc28d4fe2ac12764b388ef9e61f05e766da
      362b8d43
    • S
      Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) · fdf882de
      sdong 提交于
      Summary:
      When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
      
      Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
      
      Differential Revision: D19977691
      
      fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
      fdf882de
  9. 19 2月, 2020 2 次提交
  10. 15 2月, 2020 1 次提交
  11. 14 2月, 2020 4 次提交
    • M
      WriteUnPrepared: Fix assertion during recovery (#6419) · 908b1ee6
      Manuel Ung 提交于
      Summary:
      During recovery, multiple (un)prepared batches could exist in the same WAL record due to group commit. This breaks an assertion in `MemTableInserter::MarkBeginPrepare`.
      
      To fix, reset unprepared_batch_ to false after `MarkEndPrepare`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6419
      
      Differential Revision: D19896148
      
      Pulled By: lth
      
      fbshipit-source-id: b1a32ef88f775a0881264a18bd1a4a5b8c85eee3
      908b1ee6
    • S
      Should flush and sync WAL when writing it in DB::Open() (#6417) · ac8e89a4
      sdong 提交于
      Summary:
      A recent fix related to 2pc https://github.com/facebook/rocksdb/pull/6313/ writes something to WAL, but does not flush or sync. This causes assertion failure "impl->TEST_WALBufferIsEmpty()" if manual_wal_flush = true. We should fsync the entry to make sure a second power reset can recover.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6417
      
      Test Plan: Add manual_wal_flush=true case in TransactionTest.DoubleCrashInRecovery and fix a bug in the test so that the bug can be reproduced. It passes with the fix.
      
      Differential Revision: D19894537
      
      fbshipit-source-id: f1e84e49e2269f583c6019743118292cd8b6598e
      ac8e89a4
    • H
      Fix destroydb (#6308) · 5138764e
      Huisheng Liu 提交于
      Summary:
      It's observed on Windows DestroyDB failed to remove the log file because the logger is still alive in sst file manager and holding a handle to the log file. This fix makes sure the logger is released before attempt to clear the database directory.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6308
      
      Differential Revision: D19818829
      
      Pulled By: riversand963
      
      fbshipit-source-id: 54c3e6859aadaaba4a49b3e851b73dc35ec7dc6a
      5138764e
    • C
      Revert usage of Defer. (#6410) · a676001f
      Cheng Chang 提交于
      Summary:
      Seems like this caused the following test failure on AppVeyor:
      DBTest2.CrashInRecoveryMultipleCF
      c:\projects\rocksdb\db\db_test_util.cc(107): error: DestroyDB(dbname_, options)
      IO error: Failed to delete: C:\projects\rocksdb\db_tests\\testrocksdb-3112//db_test2_10791409581227174103/000013.sst: Access is denied.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6410
      
      Test Plan: Wait to see whether the AppVeyor test passes.
      
      Differential Revision: D19879872
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 59a9c55ca88566e9210c0b715ecc45a4fd9afe26
      a676001f
  12. 12 2月, 2020 1 次提交
  13. 11 2月, 2020 5 次提交
    • A
      Revert "Check KeyContext status in MultiGet (#6387)" (#6401) · 35ed530d
      anand76 提交于
      Summary:
      This reverts commit d70011bc. The commit is causing some stress test failure due to unexpected Status::MergeInProgress() return for some keys.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6401
      
      Differential Revision: D19826623
      
      Pulled By: anand1976
      
      fbshipit-source-id: edd634cede9cb7bdd2cb8f46e662ea709b16d2f1
      35ed530d
    • C
      Add utility class Defer (#6382) · dafb5680
      Cheng Chang 提交于
      Summary:
      Add a utility class `Defer` to defer the execution of a function until the Defer object goes out of scope.
      Used in VersionSet:: ProcessManifestWrites as an example.
      The inline comments for class `Defer` have more details.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6382
      
      Test Plan: `make defer_test version_set_test && ./defer_test && ./version_set_test`
      
      Differential Revision: D19797538
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: b1a9b7306e4fd4f48ec2ab55783caa561a315f0f
      dafb5680
    • L
      Do not move VersionEdit into AtomicGroupReadBuffer (#6400) · cbf5f3be
      Levi Tamasi 提交于
      Summary:
      https://github.com/facebook/rocksdb/pull/6383 surfaced an issue with
      `VersionSet`/`ReactiveVersionSet` and `AtomicGroupReadBuffer::AddEdit`
      (which was added in https://github.com/facebook/rocksdb/pull/5411):
      `AddEdit` moves the `VersionEdit` passed to it into `replay_buffer_`,
      however, the client `VersionSet` classes keep using it afterwards. This
      *seemed to* work before the refactoring but it really did not: since
      `VersionEdit` used to have a user-declared destructor, no move
      constructor/move assignment operator was generated, and the `move` in
      `AddEdit` was really a copy. The patch makes the copy explicit. Note: it
      should be possible to rework this logic so that we can get away
      with the move but for now, this should fix the issue.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6400
      
      Test Plan:
      `make check`
      `make analyze`
      
      Differential Revision: D19824466
      
      Pulled By: ltamasi
      
      fbshipit-source-id: f38033967daf2a39c78dcd6e12978bafe37632b4
      cbf5f3be
    • Z
      Checksum for each SST file and stores in MANIFEST (#6216) · 4369f2c7
      Zhichao Cao 提交于
      Summary:
      In the current code base, RocksDB generate the checksum for each block and verify the checksum at usage. Current PR enable SST file checksum. After a SST file is generated by Flush or Compaction, RocksDB generate the SST file checksum and store the checksum value and checksum method name in the vs_info and MANIFEST as part for the FileMetadata.
      
      Added the enable_sst_file_checksum to Options to enable or disable file checksum. Added sst_file_checksum to Options such that user can plugin their own SST file checksum calculate method via overriding the SstFileChecksum class. The checksum information inlcuding uint32_t checksum value and a checksum name (string).  A new tool is added to LDB such that user can dump out a list of file checksum information from MANIFEST. If user enables the file checksum but does not provide the sst_file_checksum instance, RocksDB will use the default crc32checksum implemented in table/sst_file_checksum_crc32c.h
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6216
      
      Test Plan: Added the testing case in table_test and ldb_cmd_test to verify checksum is correct in different level. Pass make asan_check.
      
      Differential Revision: D19171461
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: b2e53479eefc5bb0437189eaa1941670e5ba8b87
      4369f2c7
    • Y
      Add error status for no_slowdown & low priority write (#6396) · 2e0159ec
      Yutian Li 提交于
      Summary:
      When `no_slowdown` is enabled, it returns `Status::Incomplete("Write stall")` if a stall would occur. This patch adds descriptive text for when `no_slowdown` and `low_pri` are enabled.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6396
      
      Differential Revision: D19808978
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: a53b0d25ed414c821a086531e0222027f925e627
      2e0159ec
  14. 08 2月, 2020 6 次提交
    • A
      Check KeyContext status in MultiGet (#6387) · d70011bc
      anand76 提交于
      Summary:
      Currently, any IO errors and checksum mismatches while reading data
      blocks, are being ignored by the batched MultiGet. Its only looking at
      the GetContext state. Fix that.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6387
      
      Test Plan: Add unit tests
      
      Differential Revision: D19799819
      
      Pulled By: anand1976
      
      fbshipit-source-id: 46133dccbb04e64067b9fe6cda73e282203db969
      d70011bc
    • R
      db/write_thread.cc: Initialize state (#6275) · 4e457278
      Robert Yang 提交于
      Summary:
      Fixed an error when compiled with -Og:
      db/write_thread.cc:183:14: error: 'state' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      Signed-off-by: NRobert Yang <liezhi.yang@windriver.com>
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6275
      
      Differential Revision: D19381755
      
      fbshipit-source-id: a90bf3cd4a7248d9d71219e918fc6253deb97e3c
      4e457278
    • L
      Clean up VersionEdit a bit (#6383) · 752c87af
      Levi Tamasi 提交于
      Summary:
      This is a bunch of small improvements to `VersionEdit`. Namely, the patch
      
      * Makes the names and order of variables, methods, and code chunks related
        to the various information elements more consistent, and adds missing
        getters for the sake of completeness.
      * Initializes previously uninitialized stack variables.
      * Marks all getters const to improve const correctness.
      * Adds in-class initializers and removes the default ctor that would
        create an object with uninitialized built-in fields and call `Clear`
        afterwards.
      * Adds a new type alias for new files and changes the existing `typedef`
        for deleted files into a type alias as well.
      * Makes the helper method `DecodeNewFile4From` private.
      * Switches from long-winded iterator syntax to range based loops in a
        couple of places.
      * Fixes a couple of assignments where an integer 0 was assigned to
        boolean members.
      * Fixes a getter which used to return a `const std::string` instead of
      the intended `const std::string&`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6383
      
      Test Plan: make check
      
      Differential Revision: D19780537
      
      Pulled By: ltamasi
      
      fbshipit-source-id: b0b4f09fee0ec0e7c7b7a6d76bfe5346e91824d0
      752c87af
    • C
      Remove outdated comment (#6379) · 5f478b9f
      Cheng Chang 提交于
      Summary:
      Since the logic for handling IDENTITY file is now inside `NewDB`, the comment above `NewDB` is no longer relevant.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6379
      
      Test Plan: not needed
      
      Differential Revision: D19795440
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 0b1cca87ac6d92474701c46aa4c8d4d708bfa19b
      5f478b9f
    • C
      Add status checks during DB::Open (#6380) · 0a74e1b9
      Cheng Chang 提交于
      Summary:
      Several statuses were not checked during DB::Open.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6380
      
      Test Plan: make check
      
      Differential Revision: D19780237
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: c8d189d20344bd1607890dd1449345bda2ef96b9
      0a74e1b9
    • Y
      Atomic flush rollback once on failure (#6385) · f361cedf
      Yanqin Jin 提交于
      Summary:
      Before this fix, atomic flush codepath may hit an assertion failure on a specific failure case.
      If all flush jobs within an atomic flush succeed (they do not write to MANIFEST), but batch writing version edits to MANIFEST fails, then `cfd->imm()->RollbackMemTableFlush()` will be called twice, and the second invocation hits assertion failure `assert(m->flush_in_progress_)` since the first invocation resets the variable `flush_in_progress_` to false already.
      
      Test plan (dev server):
      ```
      ./db_flush_test --gtest_filter=DBAtomicFlushTest/DBAtomicFlushTest.RollbackAfterFailToInstallResults
      make check
      ```
      Both must succeed.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6385
      
      Differential Revision: D19782943
      
      Pulled By: riversand963
      
      fbshipit-source-id: 84e1592625e729d1b70fdc8479959387a74cb121
      f361cedf
  15. 07 2月, 2020 1 次提交
    • C
      Be able to read compatible leveldb sst files (#6370) · f5f79f01
      Cheng Chang 提交于
      Summary:
      In `DBSSTTest.SSTsWithLdbSuffixHandling`, some sst files are renamed to ldb files, the original intention of the test is to test that the ldb files can be loaded along with the sst files.
      
      The original test checks this by `ASSERT_NE("NOT_FOUND", Get(Key(k)))`, but the problem is `Get(Key(k))` returns IO error due to path not found instead of NOT_FOUND, so the success of ASSERT_NE does not mean the key can be retrieved.
      
      This PR updates the test to make sure Get(Key(k)) returns the original value.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6370
      
      Test Plan: make db_sst_test && ./db_sst_test
      
      Differential Revision: D19726278
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 993127f56457b315e669af4eeb92d6f956b7a4b7
      f5f79f01
  16. 05 2月, 2020 2 次提交
    • M
      Avoid lots of calls to Env::GetFileSize() in SstFileManagerImpl when opening DB (#6363) · 1ed7d9b1
      Mike Kolupaev 提交于
      Summary:
      Before this PR it calls GetFileSize() once for each sst file in the DB. This can take a long time if there are be tens of thousands of sst files (e.g. in thousands of column families), and even longer if Env is talking to some remote service rather than local filesystem. This PR makes DB::Open() use sst file sizes that are already known from manifest (typically almost all files in the DB) and only call GetFileSize() for non-sst or obsolete files. Note that GetFileSize() is also called and checked against manifest in CheckConsistency(), so the calls in SstFileManagerImpl were completely redundant.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6363
      
      Test Plan: deployed to a test cluster, looked at a dump of Env calls (from a custom instrumented Env) - no more thousands of GetFileSize()s.
      
      Differential Revision: D19702509
      
      Pulled By: al13n321
      
      fbshipit-source-id: 99f8110620cb2e9d0c092dfcdbb11f3af4ff8b73
      1ed7d9b1
    • S
      Avoid to get manifest file size when recovering from it. (#6369) · 69c86148
      sdong 提交于
      Summary:
      Right now RocksDB gets manifest file size before recovering from it. The information is available in LogReader. Use it instead to prevent one file system call.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6369
      
      Test Plan: Run all existing tests
      
      Differential Revision: D19714872
      
      fbshipit-source-id: 0144be324d403c99e3da875ea2feccc8f64e883d
      69c86148
  17. 04 2月, 2020 5 次提交
    • M
      Add an option to prevent DB::Open() from querying sizes of all sst files (#6353) · 637e64b9
      Mike Kolupaev 提交于
      Summary:
      When paranoid_checks is on, DBImpl::CheckConsistency() iterates over all sst files and calls Env::GetFileSize() for each of them. As far as I could understand, this is pretty arbitrary and doesn't affect correctness - if filesystem doesn't corrupt fsynced files, the file sizes will always match; if it does, it may as well corrupt contents as well as sizes, and rocksdb doesn't check contents on open.
      
      If there are thousands of sst files, getting all their sizes takes a while. If, on top of that, Env is overridden to use some remote storage instead of local filesystem, it can be *really* slow and overload the remote storage service. This PR adds an option to not do GetFileSize(); instead it does GetChildren() for parent directory to check that all the expected sst files are at least present, but doesn't check their sizes.
      
      We can't just disable paranoid_checks instead because paranoid_checks do a few other important things: make the DB read-only on write errors, print error messages on read errors, etc.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6353
      
      Test Plan: ran the added sanity check unit test. Will try it out in a LogDevice test cluster where the GetFileSize() calls are causing a lot of trouble.
      
      Differential Revision: D19656425
      
      Pulled By: al13n321
      
      fbshipit-source-id: c2c421b367633033760d1f56747bad206d1fbf82
      637e64b9
    • A
      Fix a test failure in error_handler_test (#6367) · 7330ec0f
      anand76 提交于
      Summary:
      Fix an intermittent failure in
      DBErrorHandlingTest.CompactionManifestWriteError due to a race between
      background error recovery and the main test thread calling
      TEST_WaitForCompact().
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6367
      
      Test Plan: Run the test using gtest_parallel
      
      Differential Revision: D19713802
      
      Pulled By: anand1976
      
      fbshipit-source-id: 29e35dc26e0984fe8334c083e059f4fa1f335d68
      7330ec0f
    • S
      Use ReadFileToString() to get content from IDENTITY file (#6365) · f195d8d5
      sdong 提交于
      Summary:
      Right now when reading IDENTITY file, we use a very similar logic as ReadFileToString() while it does an extra file size check, which may be expensive in some file systems. There is no reason to duplicate the logic. Use ReadFileToString() instead.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6365
      
      Test Plan: RUn all existing tests.
      
      Differential Revision: D19709399
      
      fbshipit-source-id: 3bac31f3b2471f98a0d2694278b41e9cd34040fe
      f195d8d5
    • S
      Avoid create directory for every column families (#6358) · 36c504be
      sdong 提交于
      Summary:
      A relatively recent regression causes for every CF, create and open directory is called for the DB directory, unless CF has a private directory. This doesn't scale well with large number of column families.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6358
      
      Test Plan: Run all existing tests and see it pass. strace with db_bench --num_column_families and observe it doesn't open directory for number of column families.
      
      Differential Revision: D19675141
      
      fbshipit-source-id: da01d9216f1dae3f03d4064fbd88ce71245bd9be
      36c504be
    • H
      Error handler test fix (#6266) · eb4d6af5
      Huisheng Liu 提交于
      Summary:
      MultiDBCompactionError fails when it verifies the number of files on level 0 and level 1 without waiting for compaction to finish.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6266
      
      Differential Revision: D19701639
      
      Pulled By: riversand963
      
      fbshipit-source-id: e96d511bcde705075f073e0b550cebcd2ecfccdc
      eb4d6af5