1. 23 Oct, 2020 1 commit
  2. 03 Oct, 2020 1 commit
  3. 02 Oct, 2020 2 commits
  4. 30 Sep, 2020 1 commit
  5. 18 Sep, 2020 1 commit
  6. 16 Sep, 2020 1 commit
    • L
      Integrate blob file writing with recovery (#7388) · bf1aeebb
      Levi Tamasi committed
      Summary:
      The patch adds support for extracting large values into blob files when
      performing a flush during recovery (when `avoid_flush_during_recovery` is
      `false`). Blob files are built and added to the `Version` similarly to flush.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7388
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D23709912
      
      Pulled By: ltamasi
      
      fbshipit-source-id: ce48b4227849cf25429ae98574e72b0e1cb9c67d
  7. 15 Sep, 2020 2 commits
    • L
      Integrate blob file writing with the flush logic (#7345) · b0e78341
      Levi Tamasi committed
      Summary:
      The patch adds support for writing blob files during flush by integrating
      `BlobFileBuilder` with the flush logic, most importantly, `BuildTable` and
      `CompactionIterator`. If `enable_blob_files` is set, large values are extracted
      to blob files and replaced with references. The resulting blob files are then
      logged to the MANIFEST as part of the flush job's `VersionEdit` and
      added to the `Version`, similarly to table files. Errors related to writing
      blob files fail the flush, and any blob files written by such jobs are immediately
      deleted (again, similarly to how SST files are handled). In addition, the patch
      extends the logging and statistics around flushes to account for the presence
      of blob files (e.g. `InternalStats::CompactionStats::bytes_written`, which is
      used for calculating write amplification, now considers the blob files as well).
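
A minimal sketch of the value-separation idea described above, written as a standalone model rather than RocksDB's actual `BlobFileBuilder` code. The types, the `FlushWithBlobs` function, the `"blob:<index>"` reference format, and the `min_blob_size` threshold parameter are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, std::string>;  // key, value

// Hypothetical model of what a flush produces; not RocksDB's real types.
struct FlushOutput {
  std::vector<Entry> table_entries;  // what goes into the table (SST) file
  std::vector<std::string> blobs;    // large values moved to the blob file
};

// Values of at least `min_blob_size` bytes are moved to `blobs` and replaced
// in the table entry by a reference of the form "blob:<index>".
// The threshold parameter is an assumption of this sketch.
FlushOutput FlushWithBlobs(const std::vector<Entry>& memtable,
                           bool enable_blob_files,
                           std::size_t min_blob_size) {
  FlushOutput out;
  for (const Entry& entry : memtable) {
    if (enable_blob_files && entry.second.size() >= min_blob_size) {
      out.blobs.push_back(entry.second);
      out.table_entries.emplace_back(
          entry.first, "blob:" + std::to_string(out.blobs.size() - 1));
    } else {
      out.table_entries.push_back(entry);  // small values stay inline
    }
  }
  return out;
}
```

With `enable_blob_files` set to `false`, every value stays inline and `blobs` remains empty, which mirrors the opt-in nature of the feature.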
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7345
      
      Test Plan: Tested using `make check` and `db_bench`.
      
      Reviewed By: riversand963
      
      Differential Revision: D23506369
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 646885f22dfbe063f650d38a1fedc132f499a159
    • M
      Bring the Configurable options together (#5753) · 7d472acc
      mrambacher committed
      Summary:
      This PR merges the work of making ColumnFamilyOptions, TableFactory, and DBOptions Configurable into a single PR, resolving any merge conflicts.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5753
      
      Reviewed By: ajkr
      
      Differential Revision: D23385030
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 8b977a7731556230b9b8c5a081b98e49ee4f160a
  8. 09 Sep, 2020 1 commit
    • A
      Store FSWritableFilePtr object in WritableFileWriter (#7193) · b175eceb
      Akanksha Mahajan committed
      Summary:
      Replace the FSWritableFile pointer with an FSWritableFilePtr object in
      WritableFileWriter. This new object wraps the FSWritableFile pointer.

      Objective: If tracing is enabled, FSWritableFilePtr returns an
      FSWritableFileTracingWrapper pointer that includes all necessary
      information in IORecord, calls the underlying FileSystem, and invokes
      IOTracer to dump that record in a binary file. If tracing is disabled,
      the underlying FileSystem pointer is returned directly. The
      FSWritableFilePtr wrapper class is added to bypass the
      FSWritableFileWrapper when tracing is disabled.
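
The wrapper idea in the objective above can be sketched as follows. This is a simplified standalone model, not the real `FSWritableFilePtr`: the class names `WritableFile`, `PlainFile`, `TracingFile`, and `WritableFilePtr` are hypothetical, and a vector of strings stands in for the binary trace file.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are not RocksDB's real classes.
struct WritableFile {
  virtual ~WritableFile() = default;
  virtual void Append(const std::string& data) = 0;
};

struct PlainFile : WritableFile {
  std::string contents;
  void Append(const std::string& data) override { contents += data; }
};

// Records every call in `trace_`, then forwards it to the wrapped file.
struct TracingFile : WritableFile {
  TracingFile(WritableFile* target, std::vector<std::string>* trace)
      : target_(target), trace_(trace) {}
  void Append(const std::string& data) override {
    trace_->push_back("Append:" + std::to_string(data.size()));
    target_->Append(data);
  }
  WritableFile* target_;
  std::vector<std::string>* trace_;
};

// Owns the file and hands out either the tracing wrapper or the raw file.
class WritableFilePtr {
 public:
  // Passing a null `trace` means tracing is disabled.
  WritableFilePtr(std::unique_ptr<WritableFile> file,
                  std::vector<std::string>* trace)
      : file_(std::move(file)),
        tracer_(file_.get(), trace),
        tracing_enabled_(trace != nullptr) {}

  WritableFile* operator->() {
    // When tracing is off, callers reach the file with no extra hop.
    return tracing_enabled_ ? static_cast<WritableFile*>(&tracer_)
                            : file_.get();
  }

 private:
  std::unique_ptr<WritableFile> file_;
  TracingFile tracer_;
  bool tracing_enabled_;
};
```

The point of the design is that the untraced path pays no forwarding cost: the wrapper is only consulted when tracing is on.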
      
          Test Plan: make check -j64
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7193
      
      Reviewed By: anand1976
      
      Differential Revision: D23355915
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: e62a27a13c1fd77e36a6dbafc7006d969bed25cf
  9. 25 Aug, 2020 1 commit
  10. 21 Aug, 2020 1 commit
  11. 19 Aug, 2020 1 commit
    • A
      Store FSSequentialFilePtr object in SequenceFileReader (#7190) · cc24ac14
      Akanksha Mahajan committed
      Summary:
      This diff contains the following change:
      1. Replace the `FSSequentialFile` pointer with an `FSSequentialFilePtr` object that wraps the `FSSequentialFile` pointer in `SequenceFileReader`.
      
      Objective: If tracing is enabled, `FSSequentialFilePtr` returns an `FSSequentialFileTracingWrapper` pointer that includes all necessary information in `IORecord`, calls the underlying FileSystem, and invokes `IOTracer` to dump that record in a binary file. If tracing is disabled, the underlying `FileSystem` pointer is returned directly. The `FSSequentialFilePtr` wrapper class is added to bypass the `FSSequentialFileTracingWrapper` when tracing is disabled.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7190
      
      Test Plan:
      make check -j64
      COMPILE_WITH_TSAN=1 make check -j64
      
      Reviewed By: anand1976
      
      Differential Revision: D23059616
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 1564b94dd1297cd0fbfe2ed5c9cc3e20f7395301
  12. 18 Aug, 2020 1 commit
    • A
      Disable `recycle_log_file_num` with `kTolerateCorruptedTailRecords` (#7271) · 5d5ff824
      Andrew Kryczka committed
      Summary:
      The two features are naturally incompatible. WAL recycling expects the recovery to succeed upon encountering a corrupt record at the point where new data ends and recycled data remains at the tail. However, `WALRecoveryMode::kTolerateCorruptedTailRecords` must fail upon encountering any such corrupt record, as it cannot differentiate between this and a real corruption, which would cause committed updates to be truncated.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7271
      
      Reviewed By: riversand963
      
      Differential Revision: D23169923
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2cf8a3bcd2c9a0ecb0055a84725047a10fd4db50
  13. 15 Aug, 2020 1 commit
    • J
      Introduce a global StatsDumpScheduler for stats dumping (#7223) · 69760b4d
      Jay Zhuang committed
      Summary:
      Have a global StatsDumpScheduler for stats dumping across all DB instances, including `DumpStats()` and `PersistStats()`. Before this, there were two dedicated threads for every DB instance, one for `DumpStats()` and one for `PersistStats()`, which could create a lot of threads if there are hundreds of DB instances.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7223
      
      Reviewed By: riversand963
      
      Differential Revision: D23056737
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 0faa2311142a73433ebb3317361db7cbf43faeba
  14. 07 Aug, 2020 1 commit
  15. 16 Jul, 2020 1 commit
    • Z
      Auto resume the DB from Retryable IO Error (#6765) · a10f12ed
      Zhichao Cao committed
      Summary:
      In the current codebase, if a retryable IO error happens in the write path, SetBGError is called. The retryable IO error is converted to a hard error and the DB enters read-only mode. The user or application needs to resume it. With this PR, if a retryable IO error happens in a DB, SetBGError creates a new thread to call Resume (auto resume). options.max_bgerror_resume_count controls whether auto resume is enabled (if max_bgerror_resume_count<=0, auto resume is not enabled). options.bgerror_resume_retry_interval controls the time interval before calling Resume again if the previous resume fails due to a retryable IO error. If a non-retryable error happens during resume, auto resume terminates.
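
The retry policy described above can be sketched as a small standalone loop. This is an illustration, not the actual error-handler code: `ResumeResult`, `AutoResumeOutcome`, and `AutoResume` are hypothetical names, the `wait` callback stands in for sleeping `bgerror_resume_retry_interval`, and treating `max_bgerror_resume_count` as an upper bound on the number of attempts is an assumption of this sketch.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result of one Resume attempt.
enum class ResumeResult { kOk, kRetryableIOError, kNonRetryableError };

struct AutoResumeOutcome {
  bool recovered;
  int attempts;
};

// Calls `resume` until it succeeds, hits a non-retryable error, or the
// attempt budget is used up. `wait` models the retry interval.
AutoResumeOutcome AutoResume(int max_bgerror_resume_count,
                             const std::function<ResumeResult()>& resume,
                             const std::function<void()>& wait) {
  AutoResumeOutcome outcome{false, 0};
  if (max_bgerror_resume_count <= 0) {
    return outcome;  // auto resume is disabled
  }
  while (outcome.attempts < max_bgerror_resume_count) {
    ++outcome.attempts;
    const ResumeResult result = resume();
    if (result == ResumeResult::kOk) {
      outcome.recovered = true;
      break;
    }
    if (result == ResumeResult::kNonRetryableError) {
      break;  // auto resume terminates
    }
    if (outcome.attempts < max_bgerror_resume_count) {
      wait();  // retryable error: pause before the next attempt
    }
  }
  return outcome;
}
```

Passing the wait step in as a callback keeps the sketch deterministic; a real implementation would sleep on a background thread instead.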
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6765
      
      Test Plan: Added the unit test cases in error_handler_fs_test and pass make asan_check
      
      Reviewed By: anand1976
      
      Differential Revision: D21916789
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: acb8b5e5dc3167adfa9425a5b7fc104f6b95cb0b
  16. 11 Jul, 2020 1 commit
    • W
      Reduce `env_->GetChildren()` calls in DBImpl::Recover() (#7044) · 4924a506
      wenh committed
      Summary:
      There currently exist multiple `GetChildren()` calls in `DBImpl::Recover()`, which can be expensive in cases of distributed file systems.
      This pull request tries to call `GetChildren()` on each necessary directory only _once_ and reuse the results in place of the repeated calls in the current code.
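
A minimal sketch of the "list once, reuse the result" idea, assuming a hypothetical `DirListingCache` helper. The actual patch may simply thread the listing through function arguments; the callback type `ListFn` stands in for the expensive directory-listing call.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for an expensive directory listing (e.g. on a remote file system).
using ListFn = std::function<std::vector<std::string>(const std::string&)>;

// Hypothetical helper: lists each directory at most once and reuses the result.
class DirListingCache {
 public:
  explicit DirListingCache(ListFn list) : list_(std::move(list)) {}

  const std::vector<std::string>& Children(const std::string& dir) {
    auto it = cache_.find(dir);
    if (it == cache_.end()) {
      // First request for this directory: pay for the listing once.
      it = cache_.emplace(dir, list_(dir)).first;
    }
    return it->second;
  }

 private:
  ListFn list_;
  std::map<std::string, std::vector<std::string>> cache_;
};
```

Such a cache is only safe for the duration of a single recovery pass, since the directory contents can change afterwards.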
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7044
      
      Test Plan:
      Run `make check` and use the default test suite. The modified code should be semantically identical to the current code. As a proof of this solution, we may optionally deploy the system onto a (real or simulated) distributed system and expect reduced latency caused by manifest fetching.
      
      (WIP)
      
      Reviewed By: riversand963
      
      Differential Revision: D22419925
      
      Pulled By: roghnin
      
      fbshipit-source-id: d3774fbfbc246c5527101bc16747eb5c90919886
  17. 18 Jun, 2020 1 commit
    • Z
      Store DB identity and DB session ID in SST files (#6983) · 94d04529
      Zitan Chen committed
      Summary:
      `db_id` and `db_session_id` are now part of the table properties for all formats and stored in SST files. This adds about 99 bytes to each new SST file.
      
      The `TablePropertiesNames` for these two identifiers are `rocksdb.creating.db.identity` and `rocksdb.creating.session.identity`.
      
      In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as `DB::GetDbSessionId`.
      
      A table property test is added.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6983
      
      Test Plan: make check and some manual tests.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22048826
      
      Pulled By: gg814
      
      fbshipit-source-id: afdf8c11424a6f509b5c0b06dafad584a80103c9
  18. 16 Jun, 2020 1 commit
    • Y
      Let best-efforts recovery ignore CURRENT file (#6970) · 9bfd46d0
      Yanqin Jin committed
      Summary:
      Best-efforts recovery does not check the content of CURRENT file to determine which MANIFEST to recover from. However, it still checks the presence of CURRENT file to determine whether to create a new DB during `open()`. Therefore, we can tweak the logic in `open()` a little bit so that best-efforts recovery does not rely on CURRENT file at all.
      
      Test plan (dev server):
      make check
      ./db_basic_test --gtest_filter=DBBasicTest.RecoverWithNoCurrentFile
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6970
      
      Reviewed By: anand1976
      
      Differential Revision: D22013990
      
      Pulled By: riversand963
      
      fbshipit-source-id: db552a1868c60ed70e1f7cd252a3a076eb8ea58f
  19. 12 Jun, 2020 1 commit
    • Y
      Fail point-in-time WAL recovery upon IOError reading WAL (#6963) · 717749f4
      Yanqin Jin committed
      Summary:
      If `options.wal_recovery_mode == WALRecoveryMode::kPointInTimeRecovery`, RocksDB stops replaying the WAL once it hits an error and discards the rest of the WAL. This can lead to data loss if the error occurs at an offset smaller than the last synced offset.
      Ideally, RocksDB point-in-time recovery should permit recovery if the error occurs after the last synced offset and fail recovery if the error occurs before it. However, RocksDB does not track the synced offset of WALs. Consequently, RocksDB does not know whether an error occurs before or after the last synced offset. An error can be one of the following.
      - WAL record checksum mismatch. This can result from either corruption of synced data or dropping of unsynced data during shutdown. We cannot be sure which one. In order not to defeat the original motivation to permit the latter case, we keep the original behavior of point-in-time WAL recovery.
      - IOError. This means the WAL can be bad; it is an indicator that the whole file may be unavailable, including the synced part of the WAL. Therefore, we choose to modify the behavior of point-in-time recovery and fail the database recovery.
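
The two cases above amount to a small decision rule, sketched here as a standalone model. `ReadResult`, `ReplayOutcome`, and `ReplayPointInTime` are hypothetical names; the real WAL reader is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical outcome of reading one WAL record.
enum class ReadResult { kRecord, kChecksumMismatch, kIOError };

struct ReplayOutcome {
  bool recovery_ok;             // false means DB recovery fails
  std::size_t records_applied;  // records replayed before stopping
};

// Models point-in-time recovery as described in the summary above.
ReplayOutcome ReplayPointInTime(const std::vector<ReadResult>& wal) {
  ReplayOutcome outcome{true, 0};
  for (ReadResult result : wal) {
    if (result == ReadResult::kIOError) {
      // The file itself may be unavailable: fail the recovery.
      outcome.recovery_ok = false;
      return outcome;
    }
    if (result == ReadResult::kChecksumMismatch) {
      // Could be unsynced data dropped at shutdown: stop replaying,
      // discard the rest, and keep what was applied so far.
      break;
    }
    ++outcome.records_applied;
  }
  return outcome;
}
```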
      
      Test plan (devserver):
      make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6963
      
      Reviewed By: ajkr
      
      Differential Revision: D22011083
      
      Pulled By: riversand963
      
      fbshipit-source-id: f9cbf29a37dc5cc40d3fa62f89eed1ad67ca1536
  20. 06 Jun, 2020 1 commit
  21. 09 May, 2020 1 commit
    • Y
      Fix a few bugs in best-efforts recovery (#6824) · e72e2167
      Yanqin Jin committed
      Summary:
      1. Update column_family_memtables_ to point to latest column_family_set in
         version_set after recovery.
      2. Normalize file paths passed by application so that directories end with '/'
         or '\\'.
      3. In addition to missing files, corrupted files are also ignored in
         best-efforts recovery.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6824
      
      Test Plan: COMPILE_WITH_ASAN=1 make check
      
      Reviewed By: anand1976
      
      Differential Revision: D21463905
      
      Pulled By: riversand963
      
      fbshipit-source-id: c48db8843cc93c8c1c7139c474b64e6f775307d2
  22. 24 Apr, 2020 1 commit
  23. 01 Apr, 2020 1 commit
    • S
      Make options.bottommost_compression, compression_opts and... · 80979f81
      sdong committed
      Make options.bottommost_compression, compression_opts and bottommost_compression_opts dynamically changeable. (#6615)
      
      Summary:
      These three options should be dynamically changeable. Simply add them to MutableCFOptions and make the change.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6615
      
      Test Plan: Add a unit test to make sure that SetOptions() can change the options.
      
      Reviewed By: riversand963
      
      Differential Revision: D20755951
      
      fbshipit-source-id: 8165f4fd7a7a665cc7fb049698935022a5d2e7ff
  24. 28 Mar, 2020 1 commit
    • Z
      Pass IOStatus to write path and set retryable IO Error as hard error in BG jobs (#6487) · 42468881
      Zhichao Cao committed
      Summary:
      In the current code base, we use Status to get and store the returned status from a call. For IO-related functions specifically, Status cannot reflect IO error details such as the error scope, whether the error is retryable, and others. With the implementation of https://github.com/facebook/rocksdb/issues/5761, we have a new wrapper for IO, which returns IOStatus instead of Status. However, the IOStatus is dropped at the lower level of the write path and converted to Status.
      
      The first job of this PR is to pass the IOStatus to the write path (flush, WAL write, and compaction). The second job is to identify a retryable IO error as a HardError and set bg_error_ to HardError. In this case, the DB instance becomes read-only. The user is informed of the Status and needs to take action to deal with it (e.g., call db->Resume()).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6487
      
      Test Plan: Added the testing case to error_handler_fs_test. Pass make asan_check
      
      Reviewed By: anand1976
      
      Differential Revision: D20685017
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: ff85f042896243abcd6ef37877834e26f36b6eb0
  25. 24 Mar, 2020 1 commit
    • A
      Simplify migration to FileSystem API (#6552) · a9d168cf
      anand76 committed
      Summary:
      The current Env/FileSystem API separation has a couple of issues -
      1. It requires the user to specify 2 options - `Options::env` and `Options::file_system` - which means they have to make code changes to benefit from the new APIs. Furthermore, there is a risk of accessing the same APIs in two different ways, through Env in the old way and through FileSystem in the new way. The two may not always match, for example, if env is `PosixEnv` and FileSystem is a custom implementation. Any stray RocksDB calls to env will use the `PosixEnv` implementation rather than the file_system implementation.
      2. There needs to be a simple way for the FileSystem developer to instantiate an Env for backward compatibility purposes.
      
      This PR solves the above issues and simplifies the migration in the following ways -
      1. Embed a shared_ptr to the `FileSystem` in the `Env`, and remove `Options::file_system` as a configurable option. This way, no code changes will be required in application code to benefit from the new API. The default Env constructor uses a `LegacyFileSystemWrapper` as the embedded `FileSystem`.
      1a. This also makes it more robust by ensuring that even if RocksDB has some stray calls to Env APIs rather than FileSystem, they will go through the same object and thus there is no risk of getting out of sync.
      2. Provide a `NewCompositeEnv()` API that can be used to construct a PosixEnv with a custom FileSystem implementation. This eliminates an indirection to call Env APIs, and relieves the FileSystem developer of the burden of having to implement wrappers for the Env APIs.
      3. Add a couple of missing FileSystem APIs - `SanitizeEnvOptions()` and `NewLogger()`
      
      Tests:
      1. New unit tests
      2. make check and make asan_check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6552
      
      Reviewed By: riversand963
      
      Differential Revision: D20592038
      
      Pulled By: anand1976
      
      fbshipit-source-id: c3801ad4153f96d21d5a3ae26c92ba454d1bf1f7
  26. 21 Mar, 2020 1 commit
    • Y
      Attempt to recover from db with missing table files (#6334) · fb09ef05
      Yanqin Jin committed
      Summary:
      There are situations when RocksDB tries to recover, but the db is in an inconsistent state because SST files referenced in the MANIFEST are missing. In this case, RocksDB previously just failed the recovery and returned a non-ok status.
      This PR enables another possibility. During recovery, RocksDB checks possible MANIFEST files and tries to recover to the most recent state without missing table files. `VersionSet::Recover()` applies version edits incrementally and "materializes" a version only when this version does not reference any missing table file. After processing the entire MANIFEST, the version created last will be the latest version.
      `DBImpl::Recover()` calls `VersionSet::Recover()`. Afterwards, WAL replay will *not* be performed.
      To use this capability, set `options.best_efforts_recovery = true` when opening the db. Best-efforts recovery is currently incompatible with atomic flush.
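
The incremental "materialize only when complete" rule can be sketched with a toy model in which table files are plain integers. `VersionEdit`, `RecoveredVersion`, and `RecoverBestEfforts` here are simplified, hypothetical stand-ins and do not reflect the real `VersionSet` data structures.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy version edit: file numbers added and deleted by one MANIFEST record.
struct VersionEdit {
  std::vector<int> added;
  std::vector<int> deleted;
};

struct RecoveredVersion {
  bool found;           // false if no complete version exists
  std::set<int> files;  // table files of the last complete version
};

// Applies edits in order and remembers the most recent state in which
// every referenced table file is present on disk.
RecoveredVersion RecoverBestEfforts(const std::vector<VersionEdit>& manifest,
                                    const std::set<int>& files_on_disk) {
  std::set<int> current;
  RecoveredVersion result{false, {}};
  for (const VersionEdit& edit : manifest) {
    for (int f : edit.deleted) current.erase(f);
    for (int f : edit.added) current.insert(f);
    bool complete = true;
    for (int f : current) {
      if (files_on_disk.count(f) == 0) {
        complete = false;
        break;
      }
    }
    if (complete) {
      // "Materialize" this version; later complete versions replace it.
      result.found = true;
      result.files = current;
    }
  }
  return result;
}
```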
      
      Test plan (on devserver):
      ```
      $make check
      $COMPILE_WITH_ASAN=1 make all && make check
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6334
      
      Reviewed By: anand1976
      
      Differential Revision: D19778960
      
      Pulled By: riversand963
      
      fbshipit-source-id: c27ea80f29bc952e7d3311ecf5ee9c54393b40a8
  27. 12 Mar, 2020 1 commit
    • C
      Cache result of GetLogicalBufferSize in Linux (#6457) · 2d9efc9a
      Cheng Chang committed
      Summary:
      On Linux, when reopening a DB with many SST files, profiling shows 100% system CPU time spent in `GetLogicalBufferSize` for a couple of seconds. This slows down MyRocks' recovery time when a site is down.
      
      This PR introduces two new APIs:
      1. `Env::RegisterDbPaths` and `Env::UnregisterDbPaths` let `DB` tell the env when it starts or stops using its database directories. The `PosixFileSystem` takes this opportunity to set up a cache from database directories to the corresponding logical block sizes.
      2. `LogicalBlockSizeCache` is defined only for OS_LINUX to cache the logical block sizes.
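
A minimal sketch of a directory-to-block-size cache along the lines described above. It is a standalone model, not the real `LogicalBlockSizeCache`: the class name `BlockSizeCache`, the reference counting, and the fallback to a direct query for unregistered directories are assumptions of this sketch, and `QueryFn` stands in for the expensive system query.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of a cache from DB directories to logical block sizes.
class BlockSizeCache {
 public:
  using QueryFn = std::function<std::size_t(const std::string& dir)>;

  explicit BlockSizeCache(QueryFn query) : query_(std::move(query)) {}

  // Called when a DB starts using `dir`; queries the block size once.
  void RegisterDbPath(const std::string& dir) {
    auto it = entries_.find(dir);
    if (it == entries_.end()) {
      entries_[dir] = Entry{query_(dir), 1};
    } else {
      ++it->second.refs;
    }
  }

  // Called when a DB stops using `dir`; drops the entry with the last user.
  void UnregisterDbPath(const std::string& dir) {
    auto it = entries_.find(dir);
    if (it != entries_.end() && --it->second.refs == 0) {
      entries_.erase(it);
    }
  }

  // Looks up the size by the file's parent directory, falling back to a
  // direct (expensive) query when the directory is not registered.
  std::size_t GetLogicalBlockSize(const std::string& file_path) {
    const std::size_t slash = file_path.rfind('/');
    const std::string dir = slash == std::string::npos
                                ? std::string()
                                : file_path.substr(0, slash);
    auto it = entries_.find(dir);
    return it != entries_.end() ? it->second.size : query_(dir);
  }

 private:
  struct Entry {
    std::size_t size;
    int refs;
  };
  QueryFn query_;
  std::map<std::string, Entry> entries_;
};
```

With thousands of SST files in one directory, the query runs once at registration instead of once per file open.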
      
      Other modifications:
      1. rename `logical buffer size` to `logical block size` to be consistent with Linux terms.
      2. declare `GetLogicalBlockSize` in `PosixHelper` to expose it to `PosixFileSystem`.
      3. change the functions `IOError` and `IOStatus` in `env/io_posix.h` to have external linkage since they are used in other translation units too.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6457
      
      Test Plan:
      1. A new unit test is added for `LogicalBlockSizeCache` in `env/io_posix_test.cc`.
      2. A new integration test is added for `DB` operations related to the cache in `db/db_logical_block_size_cache_test.cc`.
      
      `make check`
      
      Differential Revision: D20131243
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 3077c50f8065c0bffb544d8f49fb10bba9408d04
  28. 03 Mar, 2020 1 commit
  29. 21 Feb, 2020 1 commit
    • S
      Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) · fdf882de
      sdong committed
      Summary:
      When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To give users a way to solve the problem, the RocksDB namespace is changed to a flag that can be overridden at build time.
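
The general technique, a namespace name supplied by a preprocessor macro with a default, can be sketched like this. The `LibraryName` function is hypothetical and exists only to give the namespace some content; the override value in the comment is an arbitrary example.

```cpp
#include <cassert>
#include <string>

// Default used when the build does not override the macro. A build could
// pass e.g. -DROCKSDB_NAMESPACE=my_rocksdb to give this copy of the
// library distinct symbol names.
#ifndef ROCKSDB_NAMESPACE
#define ROCKSDB_NAMESPACE rocksdb
#endif

namespace ROCKSDB_NAMESPACE {

// Hypothetical function, only here for illustration.
inline std::string LibraryName() { return "rocksdb"; }

}  // namespace ROCKSDB_NAMESPACE
```

Callers that also spell the namespace through the macro, as in `ROCKSDB_NAMESPACE::LibraryName()`, keep compiling regardless of the name chosen at build time.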
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
      
      Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
      
      Differential Revision: D19977691
      
      fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
  30. 19 Feb, 2020 1 commit
    • A
      Fix concurrent full purge and WAL recycling (#5900) · c6abe30e
      Andrew Kryczka committed
      Summary:
      We were removing the file from `log_recycle_files_` before renaming it
      with `ReuseWritableFile()`. Since `ReuseWritableFile()` occurs outside
      the DB mutex, it was possible for a concurrent full purge to sneak in
      and delete the file before it could be renamed. Consequently, `SwitchMemtable()`
      would fail and the DB would enter read-only mode.
      
      The fix is to hold the old file number in `log_recycle_files_` until
      after the file has been renamed. Full purge uses that list to decide
      which files to keep, so it can no longer delete a file pending recycling.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5900
      
      Test Plan: new unit test
      
      Differential Revision: D19771719
      
      Pulled By: ajkr
      
      fbshipit-source-id: 094346349ca3fb499712e62de03905acc30b5ce8
  31. 14 Feb, 2020 1 commit
  32. 11 Feb, 2020 1 commit
    • Z
      Checksum for each SST file and stores in MANIFEST (#6216) · 4369f2c7
      Zhichao Cao committed
      Summary:
      In the current code base, RocksDB generates a checksum for each block and verifies the checksum on use. This PR enables SST file checksums. After an SST file is generated by flush or compaction, RocksDB generates the SST file checksum and stores the checksum value and checksum method name in the vs_info and MANIFEST as part of the FileMetadata.
      
      Added enable_sst_file_checksum to Options to enable or disable file checksums. Added sst_file_checksum to Options so that users can plug in their own SST file checksum calculation method by overriding the SstFileChecksum class. The checksum information includes a uint32_t checksum value and a checksum name (string). A new tool is added to LDB so that users can dump a list of file checksum information from the MANIFEST. If the user enables the file checksum but does not provide an sst_file_checksum instance, RocksDB will use the default crc32checksum implemented in table/sst_file_checksum_crc32c.h
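
To illustrate a whole-file checksum carrying a `uint32_t` value plus a method name, here is a standalone sketch using a plain bitwise CRC-32C (Castagnoli polynomial). It is not the code in `table/sst_file_checksum_crc32c.h`: the `FileChecksum` struct, the function names, and the `"crc32c"` name string are assumptions, and the real default may process or mask the value differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical pairing of checksum value and method name.
struct FileChecksum {
  std::uint32_t value;
  std::string method_name;
};

// Plain bitwise CRC-32C (reflected polynomial 0x82F63B78). Slow but simple;
// production code would use a table-driven or hardware implementation.
std::uint32_t Crc32c(const std::string& data) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; apply the polynomial when the dropped bit was set.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// Computes the checksum over the full contents of one (in-memory) file.
FileChecksum ChecksumWholeFile(const std::string& file_contents) {
  return FileChecksum{Crc32c(file_contents), "crc32c"};
}
```

Storing the method name next to the value lets a reader of the metadata tell which function to rerun when verifying the file later.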
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6216
      
      Test Plan: Added test cases in table_test and ldb_cmd_test to verify that the checksum is correct at different levels. Passes make asan_check.
      
      Differential Revision: D19171461
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: b2e53479eefc5bb0437189eaa1941670e5ba8b87
  33. 08 Feb, 2020 2 commits
  34. 05 Feb, 2020 1 commit
    • M
      Avoid lots of calls to Env::GetFileSize() in SstFileManagerImpl when opening DB (#6363) · 1ed7d9b1
      Mike Kolupaev committed
      Summary:
      Before this PR, opening the DB calls GetFileSize() once for each sst file in the DB. This can take a long time if there are tens of thousands of sst files (e.g. in thousands of column families), and even longer if Env is talking to some remote service rather than a local filesystem. This PR makes DB::Open() use sst file sizes that are already known from the manifest (typically almost all files in the DB) and only call GetFileSize() for non-sst or obsolete files. Note that GetFileSize() is also called and checked against the manifest in CheckConsistency(), so the calls in SstFileManagerImpl were completely redundant.
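
A minimal sketch of the "use the known size, query only when unknown" pattern. `TotalSize` and its parameters are hypothetical; the callback stands in for the expensive per-file size query.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Sums the sizes of `files`, using sizes recorded in the manifest when
// known and falling back to `get_file_size` (the expensive call) otherwise.
std::uint64_t TotalSize(
    const std::vector<std::string>& files,
    const std::map<std::string, std::uint64_t>& manifest_sizes,
    const std::function<std::uint64_t(const std::string&)>& get_file_size) {
  std::uint64_t total = 0;
  for (const std::string& file : files) {
    auto it = manifest_sizes.find(file);
    total += it != manifest_sizes.end() ? it->second : get_file_size(file);
  }
  return total;
}
```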
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6363
      
      Test Plan: deployed to a test cluster, looked at a dump of Env calls (from a custom instrumented Env) - no more thousands of GetFileSize()s.
      
      Differential Revision: D19702509
      
      Pulled By: al13n321
      
      fbshipit-source-id: 99f8110620cb2e9d0c092dfcdbb11f3af4ff8b73
  35. 04 Feb, 2020 2 commits
    • M
      Add an option to prevent DB::Open() from querying sizes of all sst files (#6353) · 637e64b9
      Mike Kolupaev committed
      Summary:
      When paranoid_checks is on, DBImpl::CheckConsistency() iterates over all sst files and calls Env::GetFileSize() for each of them. As far as I could understand, this is pretty arbitrary and doesn't affect correctness - if filesystem doesn't corrupt fsynced files, the file sizes will always match; if it does, it may as well corrupt contents as well as sizes, and rocksdb doesn't check contents on open.
      
      If there are thousands of sst files, getting all their sizes takes a while. If, on top of that, Env is overridden to use some remote storage instead of local filesystem, it can be *really* slow and overload the remote storage service. This PR adds an option to not do GetFileSize(); instead it does GetChildren() for parent directory to check that all the expected sst files are at least present, but doesn't check their sizes.
      
      We can't just disable paranoid_checks instead because paranoid_checks do a few other important things: make the DB read-only on write errors, print error messages on read errors, etc.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6353
      
      Test Plan: ran the added sanity check unit test. Will try it out in a LogDevice test cluster where the GetFileSize() calls are causing a lot of trouble.
      
      Differential Revision: D19656425
      
      Pulled By: al13n321
      
      fbshipit-source-id: c2c421b367633033760d1f56747bad206d1fbf82
    • S
      Avoid create directory for every column families (#6358) · 36c504be
      sdong committed
      Summary:
      A relatively recent regression causes the DB directory to be created and opened for every CF, unless the CF has a private directory. This doesn't scale well with a large number of column families.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6358
      
      Test Plan: Run all existing tests and see them pass. Run strace with db_bench --num_column_families and observe that it doesn't open the directory once per column family.
      
      Differential Revision: D19675141
      
      fbshipit-source-id: da01d9216f1dae3f03d4064fbd88ce71245bd9be
  36. 31 Jan, 2020 1 commit
    • M
      Disable recycle_log_file_num when it is incompatible with recovery mode (#6351) · 3316d292
      Maysam Yabandeh committed
      Summary:
      Non-zero recycle_log_file_num is incompatible with kPointInTimeRecovery and kAbsoluteConsistency recovery modes. Currently SanitizeOptions changes the recovery mode to kTolerateCorruptedTailRecords, while to resolve this option conflict it makes more sense to compromise recycle_log_file_num, which is a performance feature, instead of wal_recovery_mode, which is a safety feature.
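
The conflict resolution described in this commit can be sketched as a small sanitizing function. The enum and struct below are local, simplified mirrors written for this example (only the three modes named in the summary are listed), and `SanitizeOptions` here is a hypothetical standalone function, not the real one.

```cpp
#include <cassert>
#include <cstddef>

// Local mirror of the recovery modes named in the summary above.
enum class WALRecoveryMode {
  kTolerateCorruptedTailRecords,
  kAbsoluteConsistency,
  kPointInTimeRecovery,
};

// Simplified options struct for this sketch.
struct DbOptions {
  std::size_t recycle_log_file_num;
  WALRecoveryMode wal_recovery_mode;
};

// Keeps the safety setting (recovery mode) and gives up the performance
// setting (WAL recycling) when the two conflict.
DbOptions SanitizeOptions(DbOptions options) {
  const bool incompatible =
      options.wal_recovery_mode == WALRecoveryMode::kPointInTimeRecovery ||
      options.wal_recovery_mode == WALRecoveryMode::kAbsoluteConsistency;
  if (options.recycle_log_file_num > 0 && incompatible) {
    options.recycle_log_file_num = 0;
  }
  return options;
}
```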
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6351
      
      Differential Revision: D19648931
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: dd0bf78349edc007518a00c4d63931fd69294ad7