1. 18 Mar, 2021 1 commit
    • A
      Use SST file manager to track blob files as well (#8037) · 27d57a03
      Akanksha Mahajan committed
      Summary:
      Extend SstFileManager support to track blob files as well.
      This PR notifies SstFileManager via OnAddFile whenever a new blob file
      is created and via OnDeleteFile whenever an obsolete blob file is
      deleted, and deletes the file via ScheduleFileDeletion.
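
The notification flow described in this summary can be sketched roughly as follows. `ToyFileTracker` is a hypothetical stand-in, not RocksDB's `SstFileManager`; only the names `OnAddFile` and `OnDeleteFile` come from the summary, and the signatures used here are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a file manager that tracks total space usage.
class ToyFileTracker {
 public:
  // Called when a new table or blob file is created.
  void OnAddFile(const std::string& path, uint64_t size) {
    auto it = files_.find(path);
    if (it != files_.end()) total_size_ -= it->second;  // file re-added
    files_[path] = size;
    total_size_ += size;
  }

  // Called when an obsolete table or blob file is deleted.
  void OnDeleteFile(const std::string& path) {
    auto it = files_.find(path);
    if (it == files_.end()) return;  // unknown file, nothing to do
    total_size_ -= it->second;
    files_.erase(it);
  }

  uint64_t GetTotalSize() const { return total_size_; }

 private:
  std::unordered_map<std::string, uint64_t> files_;
  uint64_t total_size_ = 0;
};
```

With blob files reported through the same two hooks as table files, the tracked total covers both file types.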
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8037
      
      Test Plan: Add new unit tests
      
      Reviewed By: ltamasi
      
      Differential Revision: D26891237
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 04c69ccfda2a73782fd5c51982dae58dd11979b6
      27d57a03
  2. 15 Mar, 2021 1 commit
    • M
      Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) · 3dff28cf
      mrambacher committed
      Summary:
      For performance purposes, the lower level routines were changed to use a SystemClock* instead of a std::shared_ptr<SystemClock>.  The shared ptr has some performance degradation on certain hardware classes.
      
      For most of the system, there is no risk of the pointer being deleted/invalid because the shared_ptr will be stored elsewhere.  For example, the ImmutableDBOptions stores the Env which has a std::shared_ptr<SystemClock> in it.  The SystemClock* within the ImmutableDBOptions is essentially a "short cut" to gain access to this constant resource.
      
      There were a few classes (PeriodicWorkScheduler?) where the "short cut" property did not hold.  In those cases, the shared pointer was preserved.
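
A minimal sketch of the "short cut" pattern described above, under stated assumptions: `ToyClock`, `FixedClock`, `ToyOptions`, and `ElapsedMicros` are hypothetical names for illustration and do not mirror RocksDB's actual classes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical clock interface; RocksDB's SystemClock has a richer API.
class ToyClock {
 public:
  virtual ~ToyClock() = default;
  virtual uint64_t NowMicros() = 0;
};

// Deterministic clock that advances by one microsecond per call.
class FixedClock : public ToyClock {
 public:
  explicit FixedClock(uint64_t now) : now_(now) {}
  uint64_t NowMicros() override { return now_++; }

 private:
  uint64_t now_;
};

// The owner keeps the shared_ptr alive and caches a raw-pointer short cut.
struct ToyOptions {
  explicit ToyOptions(std::shared_ptr<ToyClock> c)
      : clock_owner(std::move(c)), clock(clock_owner.get()) {}
  std::shared_ptr<ToyClock> clock_owner;
  ToyClock* clock;  // valid for as long as this ToyOptions is alive
};

// Low-level routine: takes a raw pointer, so no reference-count traffic.
uint64_t ElapsedMicros(ToyClock* clock, uint64_t start) {
  return clock->NowMicros() - start;
}
```

The raw pointer is only safe because the owning object outlives every caller; where that does not hold, keeping the `std::shared_ptr` is the safer choice, as the summary notes.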
      
      Using db_bench readrandom perf_level=3 on my EC2 box, this change performed as well as or better than 6.17:
      
      6.17: readrandom   :      28.046 micros/op 854902 ops/sec;   61.3 MB/s (355999 of 355999 found)
      6.18: readrandom   :      32.615 micros/op 735306 ops/sec;   52.7 MB/s (290999 of 290999 found)
      PR: readrandom   :      27.500 micros/op 871909 ops/sec;   62.5 MB/s (367999 of 367999 found)
      
      (Note that the times for 6.18 are prior to revert of the SystemClock).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8033
      
      Reviewed By: pdillinger
      
      Differential Revision: D27014563
      
      Pulled By: mrambacher
      
      fbshipit-source-id: ad0459eba03182e454391b5926bf5cdd45657b67
      3dff28cf
  3. 03 Mar, 2021 1 commit
    • L
      Break down the amount of data written during flushes/compactions per file type (#8013) · a46f080c
      Levi Tamasi committed
      Summary:
      The patch breaks down the "bytes written" (as well as the "number of output files")
      compaction statistics into two, so the values are logged separately for table files
      and blob files in the info log, and are shown in separate columns (`Write(GB)` for table
      files, `Wblob(GB)` for blob files) when the compaction statistics are dumped.
      This will also come in handy for fixing the write amplification statistics, which currently
      do not consider the amount of data read from blob files during compaction. (This will
      be fixed by an upcoming patch.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8013
      
      Test Plan: Ran `make check` and `db_bench`.
      
      Reviewed By: riversand963
      
      Differential Revision: D26742156
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 31d18ee8f90438b438ca7ed1ea8cbd92114442d5
      a46f080c
  4. 26 Feb, 2021 1 commit
    • Y
      Compaction filter support for (new) BlobDB (#7974) · cef4a6c4
      Yanqin Jin committed
      Summary:
      Allow applications to implement a custom compaction filter and pass it to BlobDB.
      
      The compaction filter's custom logic can operate on blobs.
      To do so, the application needs to subclass the `CompactionFilter` abstract class and implement the `FilterV2()` method.
      Optionally, a method called `ShouldFilterBlobByKey()` can be implemented if the application's custom logic relies solely
      on the key to make a decision without reading the blob, thus saving extra IO. Examples can be found in
      db/blob/db_blob_compaction_test.cc.
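
The two-stage decision (key first, blob only if needed) can be illustrated with a self-contained toy. Everything below is hypothetical: `ToyBlobFilter`, `FilterByKey`, `FilterByValue`, and `Apply` are simplified stand-ins and do not reproduce the signatures of RocksDB's `FilterV2()` or `ShouldFilterBlobByKey()`.

```cpp
#include <cassert>
#include <functional>
#include <string>

enum class ToyDecision { kKeep, kRemove, kUndetermined };

// Hypothetical filter interface modeled loosely on the two hooks above.
class ToyBlobFilter {
 public:
  virtual ~ToyBlobFilter() = default;
  // Key-only decision; kUndetermined means the blob has to be read.
  virtual ToyDecision FilterByKey(const std::string& /*key*/) const {
    return ToyDecision::kUndetermined;
  }
  // Full decision that also sees the blob value.
  virtual ToyDecision FilterByValue(const std::string& key,
                                    const std::string& value) const = 0;
};

// Removes keys with a "tmp_" prefix without reading the blob; otherwise
// removes entries whose value is empty.
class TmpPrefixFilter : public ToyBlobFilter {
 public:
  ToyDecision FilterByKey(const std::string& key) const override {
    return key.rfind("tmp_", 0) == 0 ? ToyDecision::kRemove
                                     : ToyDecision::kUndetermined;
  }
  ToyDecision FilterByValue(const std::string& /*key*/,
                            const std::string& value) const override {
    return value.empty() ? ToyDecision::kRemove : ToyDecision::kKeep;
  }
};

// Driver: reads the blob (counted in *blob_reads) only when the key alone
// is not enough to decide.
ToyDecision Apply(const ToyBlobFilter& filter, const std::string& key,
                  const std::function<std::string()>& read_blob,
                  int* blob_reads) {
  ToyDecision d = filter.FilterByKey(key);
  if (d != ToyDecision::kUndetermined) return d;
  ++*blob_reads;
  return filter.FilterByValue(key, read_blob());
}
```

The saving comes from the early return in `Apply`: when the key alone decides the outcome, the blob read is skipped entirely.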
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7974
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D26509280
      
      Pulled By: riversand963
      
      fbshipit-source-id: 59f9ae5614c4359de32f4f2b16684193cc537b39
      cef4a6c4
  5. 11 Feb, 2021 1 commit
    • Z
      Handoff checksum Implementation (#7523) · d1c510ba
      Zhichao Cao committed
      Summary:
      In PR https://github.com/facebook/rocksdb/issues/7419, we introduced the new Append and PositionedAppend APIs to WritableFile at the File System level, which enable RocksDB to pass data verification information (e.g., a checksum of the data) to the lower layer. In this PR, we use the new APIs in WritableFileWriter, so that a file created via WritableFileWriter can pass the checksum to the storage layer. To control which file types should apply the checksum handoff, we add checksum_handoff_file_types to DBOptions. Users can use this option to control which file types (currently supported file types: kLogFile, kTableFile, kDescriptorFile) should use the new Append and PositionedAppend APIs to hand off the verification information.
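
To make the handoff idea concrete, here is a small self-contained sketch. It is not RocksDB code: `ToyStorage`, `ToyWriter`, `ToyFileType`, and `ToyChecksum` are hypothetical, and the checksum is a simple FNV-1a hash chosen only to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Toy checksum (FNV-1a); stands in for the real verification information.
uint32_t ToyChecksum(const std::string& data) {
  uint32_t h = 2166136261u;  // FNV-1a offset basis
  for (unsigned char c : data) {
    h ^= c;
    h *= 16777619u;  // FNV-1a prime
  }
  return h;
}

// Hypothetical storage layer with a plain and a verifying append.
class ToyStorage {
 public:
  bool Append(const std::string& data) {
    contents_ += data;
    return true;
  }
  // Returns false (and stores nothing) if the checksum does not match.
  bool Append(const std::string& data, uint32_t checksum) {
    if (ToyChecksum(data) != checksum) return false;
    contents_ += data;
    return true;
  }
  const std::string& contents() const { return contents_; }

 private:
  std::string contents_;
};

enum class ToyFileType { kLogFile, kTableFile, kDescriptorFile };

// Hypothetical writer: hands off a checksum only for the selected types.
class ToyWriter {
 public:
  ToyWriter(ToyStorage* storage, ToyFileType type,
            const std::set<ToyFileType>& handoff_types)
      : storage_(storage), handoff_(handoff_types.count(type) > 0) {}

  bool Append(const std::string& data) {
    return handoff_ ? storage_->Append(data, ToyChecksum(data))
                    : storage_->Append(data);
  }
  bool handoff_enabled() const { return handoff_; }

 private:
  ToyStorage* storage_;
  bool handoff_;
};
```

The set passed to `ToyWriter` plays the role that the summary assigns to `checksum_handoff_file_types`: it decides per file type whether the verifying append is used.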
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7523
      
      Test Plan: add new unit test, pass make check/ make asan_check
      
      Reviewed By: pdillinger
      
      Differential Revision: D24313271
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: aafd69091ae85c3318e3e17cbb96fe7338da11d0
      d1c510ba
  6. 28 Jan, 2021 1 commit
    • L
      Accumulate blob file additions in VersionEdit during recovery (#7903) · c696f274
      Levi Tamasi committed
      Summary:
      During recovery, RocksDB performs a kind of dummy flush; namely, entries
      from the WAL are added to memtables, which then get written to SSTs and
      blob files (if enabled) just like during a regular flush. Note that
      multiple memtables might be flushed during recovery for the same column
      family, for example, if the DB is reopened with a lower write buffer size,
      and therefore, we need to make sure to collect all SST and blob file
      additions. The patch fixes a bug in the earlier logic which resulted in
      later blob file additions overwriting earlier ones.
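
The difference between overwriting and accumulating can be shown with a toy model. `ToyEdit`, `CollectOverwrite`, and `CollectAccumulate` are hypothetical names; the real `VersionEdit` is considerably more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical edit that records blob file numbers added during recovery.
struct ToyEdit {
  std::vector<uint64_t> blob_file_additions;
};

// Buggy variant: each flush replaces whatever was collected before.
void CollectOverwrite(ToyEdit* edit, const std::vector<uint64_t>& flushed) {
  edit->blob_file_additions = flushed;
}

// Fixed variant: additions from each flush are appended.
void CollectAccumulate(ToyEdit* edit, const std::vector<uint64_t>& flushed) {
  edit->blob_file_additions.insert(edit->blob_file_additions.end(),
                                   flushed.begin(), flushed.end());
}
```

When two memtables of the same column family are flushed during recovery, the first variant ends up with only the second flush's blob files, while the second variant keeps both.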
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7903
      
      Test Plan: Added a unit test and ran `db_stress`.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D26110847
      
      Pulled By: ltamasi
      
      fbshipit-source-id: eddb50a608a88f54f3cec3a423de8235aba951fd
      c696f274
  7. 26 Jan, 2021 1 commit
    • M
      Add a SystemClock class to capture the time functions of an Env (#7858) · 12f11373
      mrambacher committed
      Summary:
      Introduces a SystemClock class to RocksDB and starts using it.  This class contains the time-related functions of an Env, and these functions can be redirected from the Env to the SystemClock.
      
      Many of the places that used an Env (Timer, PerfStepTimer, RepeatableThread, RateLimiter, WriteController) for time-related functions have been changed to use SystemClock instead.  There are likely more places that can be changed, but this is a start to show what can/should be done.  Over time it would be nice to migrate most (if not all) of the uses of the time functions from the Env to the SystemClock.
      
      There are several Env classes that implement these functions.  Most of these have not been converted yet to SystemClock implementations; that will come in a subsequent PR.  It would be good to unify many of the Mock Timer implementations, so that they behave similarly and can be tested similarly (some override Sleep, some use a MockSleep, etc).
      
      Additionally, this change will allow new methods to be introduced to the SystemClock (like https://github.com/facebook/rocksdb/issues/7101 WaitFor) in a consistent manner across a smaller number of classes.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7858
      
      Reviewed By: pdillinger
      
      Differential Revision: D26006406
      
      Pulled By: mrambacher
      
      fbshipit-source-id: ed10a8abbdab7ff2e23d69d85bd25b3e7e899e90
      12f11373
  8. 24 Dec, 2020 1 commit
    • M
      No elide constructors (#7798) · 55e99688
      mrambacher committed
      Summary:
      Added "no-elide-constructors" to the ASSERT_STATUS_CHECKED builds.  This flag gives more errors/warnings for some of the Status checks where an inner class checks a Status and later returns it.  In this case, without the elide check on, the returned status may not have been checked in the caller, thereby bypassing the checked code.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7798
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D25680451
      
      Pulled By: pdillinger
      
      fbshipit-source-id: c3f14ed9e2a13f0a8c54d839d5fb4d1fc1e93917
      55e99688
  9. 23 Dec, 2020 1 commit
  10. 10 Dec, 2020 1 commit
    • A
      Add further tests to ASSERT_STATUS_CHECKED (2) (#7698) · 8ff6557e
      Adam Retter committed
      Summary:
      Second batch of adding more tests to ASSERT_STATUS_CHECKED.
      
      * external_sst_file_basic_test
      * checkpoint_test
      * db_wal_test
      * db_block_cache_test
      * db_logical_block_size_cache_test
      * db_blob_index_test
      * optimistic_transaction_test
      * transaction_test
      * point_lock_manager_test
      * write_prepared_transaction_test
      * write_unprepared_transaction_test
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7698
      
      Reviewed By: cheng-chang
      
      Differential Revision: D25441664
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 9e78867f32321db5d4833e95eb96c5734526ef00
      8ff6557e
  11. 09 Dec, 2020 1 commit
    • C
      Do not track obsolete WALs in MANIFEST even if they are synced (#7725) · 07030c6f
      Cheng Chang committed
      Summary:
      Consider the case:
      1. All column families are flushed, so all WALs become obsolete, but no WAL is removed from disk yet because the removal is asynchronous. A VersionEdit is written to MANIFEST indicating that WALs before a certain WAL number are obsolete; let's say this number is 3;
      2. `SyncWAL` is called, so all the on-disk WALs are synced, and if track_and_verify_wals_in_manifest=true, the WALs will be tracked in MANIFEST; let's say the WAL numbers are 1 and 2;
      3. DB crashes;
      4. During DB recovery, when replaying MANIFEST, we first see that WALs with number < 3 are obsolete, then we see that WALs 1 and 2 are synced, so according to the current implementation of `WalSet`, the `WalSet` will be recovered to include WALs 1 and 2;
      5. WALs 1 and 2 are asynchronously deleted from disk, then the WAL verification algorithm fails with `Corruption: missing WAL`.
      
      The above case is reproduced in a new unit test `DBBasicTestTrackWal::DoNotTrackObsoleteWal`.
      
      The fix is to maintain the upper bound of the obsolete WAL numbers: any WAL with a number less than the maintained number is considered obsolete, so it shouldn't be tracked even if it is later synced. The number is maintained in `WalSet`.
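
The bookkeeping behind the fix can be sketched as follows. `ToyWalSet` and its method names are hypothetical simplifications for illustration and should not be read as the actual `WalSet` interface.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical simplification of the bookkeeping described above.
class ToyWalSet {
 public:
  // Records a synced WAL unless it is already known to be obsolete.
  void AddSyncedWal(uint64_t number) {
    if (number < min_wal_number_to_keep_) return;  // obsolete, do not track
    wals_.insert(number);
  }

  // Marks every WAL with a number below `number` as obsolete.
  void DeleteWalsBefore(uint64_t number) {
    if (number > min_wal_number_to_keep_) min_wal_number_to_keep_ = number;
    wals_.erase(wals_.begin(), wals_.lower_bound(number));
  }

  bool Contains(uint64_t number) const { return wals_.count(number) > 0; }

 private:
  std::set<uint64_t> wals_;
  uint64_t min_wal_number_to_keep_ = 0;
};
```

Replaying the scenario from the summary (obsolete below 3, then WALs 1 and 2 synced) leaves the set without WALs 1 and 2, so their later deletion from disk no longer triggers a missing-WAL error.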
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7725
      
      Test Plan:
      1. a new unit test `DBBasicTestTrackWal::DoNotTrackObsoleteWal` is added.
      2. run `make crash_test` on devserver.
      
      Reviewed By: riversand963
      
      Differential Revision: D25238914
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: f5dccd57c3d89f19565ec5731f2d42f06d272b72
      07030c6f
  12. 13 Nov, 2020 1 commit
    • Y
      Add full_history_ts_low_ to FlushJob (#7655) · 76ef894f
      Yanqin Jin committed
      Summary:
      https://github.com/facebook/rocksdb/issues/7556 enables `CompactionIterator` to perform garbage collection during compaction according
      to a lower bound (user-defined) timestamp `full_history_ts_low_`.
      This PR adds a data member `full_history_ts_low_` of type `std::string` to `FlushJob`, and
      `full_history_ts_low_` does not change during flush. `FlushJob` will pass a pointer to this data member
      to the `CompactionIterator` used during flush.
      
      Also refactored flush_job_test.cc to re-use some existing code, which is actually the majority of this PR.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7655
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D24933340
      
      Pulled By: riversand963
      
      fbshipit-source-id: 2e584bfd0cf6e5c295ab1af264e68e9d6a12fca3
      76ef894f
  13. 10 Nov, 2020 1 commit
    • C
      Track WAL in MANIFEST: Track deleted WALs in MANIFEST after recovering from the WALs (#7649) · c3911f1a
      Cheng Chang committed
      Summary:
      After replaying the WALs, the memtables are flushed synchronously to L0 instead of being flushed in background. Currently, we only track WAL obsoletion events in the code path of background flush jobs. This PR tracks these events in RecoverLogFiles.
      
      After this change, we can enable `track_and_verify_wals_in_manifest` in `db_stress`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7649
      
      Test Plan: `python tools/db_crashtest.py whitebox`
      
      Reviewed By: riversand963
      
      Differential Revision: D24824501
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 207129f7b845c50b333680ce6818a68a2fad54b9
      c3911f1a
  14. 08 Nov, 2020 1 commit
    • C
      Fix a recovery corner case (#7621) · 5e794b08
      Cheng Chang committed
      Summary:
      Consider the following sequence of events:
      
      1. Db flushed an SST with file number N, appended to MANIFEST, and tried to sync the MANIFEST.
      2. Syncing MANIFEST failed and db crashed.
      3. Db tried to recover with this MANIFEST. In the meantime, no entry about the newly-flushed SST was found in the MANIFEST. Therefore, RocksDB replayed WAL and tried to flush to an SST file reusing the same file number N. This failed because file system does not support overwrite. Then Db deleted this file.
      4. Db crashed again.
      5. Db tried to recover. When db read the MANIFEST, there was an entry referencing N.sst. This could happen probably because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed.
      
      It is possible that N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in step 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of MANIFEST was appended and visible after step 5.
      
      After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621
      
      Test Plan:
      1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery.
      2. a new unit test is added in db_basic_test to simulate step 3.
      
      Reviewed By: riversand963
      
      Differential Revision: D24668144
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b
      5e794b08
  15. 07 Nov, 2020 1 commit
    • C
      Track WAL in MANIFEST: LogAndApply WAL events to MANIFEST (#7601) · 1e40696d
      Cheng Chang committed
      Summary:
      When a WAL is synced, an edit is written to MANIFEST.
      After flushing memtables, the obsoleted WALs are piggybacked to MANIFEST while writing the new L0 files to MANIFEST.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7601
      
      Test Plan:
      `track_and_verify_wals_in_manifest` is enabled by default for all tests extending `DBBasicTest`, and in db_stress_test.
      The unit tests `wal_edit_test`, `version_edit_test`, and `version_set_test` are also updated.
      Watch all tests pass.
      
      Reviewed By: ltamasi
      
      Differential Revision: D24553957
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 66a569ff1bdced38e22900bd240b73113906e040
      1e40696d
  16. 27 Oct, 2020 1 commit
    • L
      Integrate BlobFileBuilder into the compaction process (#7573) · a7a04b68
      Levi Tamasi committed
      Summary:
      Similarly to how https://github.com/facebook/rocksdb/issues/7345
      integrated blob file writing into the flush process,
      the patch adds support for writing blob files to the compaction logic.
      Namely, if `enable_blob_files` is set, large values encountered during
      compaction are extracted to blob files and replaced with blob indexes.
      The resulting blob files are then logged to the MANIFEST as part of the
      compaction job's `VersionEdit` and added to the `Version` alongside any
      table files written by the compaction. Any errors during blob file building fail
      the compaction job.
      
      There will be a separate follow-up patch to perform blob garbage collection
      during compactions.
      
      In addition, the patch continues to chip away at the mess around computing
      various compaction related statistics by eliminating some code duplication
      and by making the `num_output_files` and `bytes_written` stats more consistent
      for flushes, compactions, and recovery.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7573
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D24404696
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 21216af3a172ad3ce8f85d11cd30923784ae426c
      a7a04b68
  17. 24 Oct, 2020 1 commit
    • C
      Track WAL in MANIFEST: persist WALs to and recover WALs from MANIFEST (#7256) · 1b224324
      Cheng Chang committed
      Summary:
      This PR makes it possible to `LogAndApply` `VersionEdit`s related to WALs, and to `Recover` from a MANIFEST with WAL related `VersionEdit`s.
      
      The `VersionEdit`s related to WALs are treated similarly to those related to column family operations: they are not applied to versions, but can be in a commit group. Mixing WAL related `VersionEdit`s with other types of edits would make the logic in `ProcessManifestWrite` more complicated, so `VersionEdit`s related to WALs can either be WAL additions or deletions, like column family add and drop.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7256
      
      Test Plan: a set of unit tests are added in `version_set_test.cc`
      
      Reviewed By: riversand963
      
      Differential Revision: D23123238
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 246be2ed4744fd03fa2738aba408aaa611d0379c
      1b224324
  18. 23 Oct, 2020 1 commit
  19. 03 Oct, 2020 1 commit
  20. 02 Oct, 2020 2 commits
  21. 30 Sep, 2020 1 commit
  22. 18 Sep, 2020 1 commit
  23. 16 Sep, 2020 1 commit
    • L
      Integrate blob file writing with recovery (#7388) · bf1aeebb
      Levi Tamasi committed
      Summary:
      The patch adds support for extracting large values into blob files when
      performing a flush during recovery (when `avoid_flush_during_recovery` is
      `false`). Blob files are built and added to the `Version` similarly to flush.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7388
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D23709912
      
      Pulled By: ltamasi
      
      fbshipit-source-id: ce48b4227849cf25429ae98574e72b0e1cb9c67d
      bf1aeebb
  24. 15 Sep, 2020 2 commits
    • L
      Integrate blob file writing with the flush logic (#7345) · b0e78341
      Levi Tamasi committed
      Summary:
      The patch adds support for writing blob files during flush by integrating
      `BlobFileBuilder` with the flush logic, most importantly, `BuildTable` and
      `CompactionIterator`. If `enable_blob_files` is set, large values are extracted
      to blob files and replaced with references. The resulting blob files are then
      logged to the MANIFEST as part of the flush job's `VersionEdit` and
      added to the `Version`, similarly to table files. Errors related to writing
      blob files fail the flush, and any blob files written by such jobs are immediately
      deleted (again, similarly to how SST files are handled). In addition, the patch
      extends the logging and statistics around flushes to account for the presence
      of blob files (e.g. `InternalStats::CompactionStats::bytes_written`, which is
      used for calculating write amplification, now considers the blob files as well).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7345
      
      Test Plan: Tested using `make check` and `db_bench`.
      
      Reviewed By: riversand963
      
      Differential Revision: D23506369
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 646885f22dfbe063f650d38a1fedc132f499a159
      b0e78341
    • M
      Bring the Configurable options together (#5753) · 7d472acc
      mrambacher committed
      Summary:
      This PR merges the functionality of making the ColumnFamilyOptions, TableFactory, and DBOptions into Configurable into a single PR, resolving any merge conflicts
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5753
      
      Reviewed By: ajkr
      
      Differential Revision: D23385030
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 8b977a7731556230b9b8c5a081b98e49ee4f160a
      7d472acc
  25. 09 Sep, 2020 1 commit
    • A
      Store FSWritableFilePtr object in WritableFileWriter (#7193) · b175eceb
      Akanksha Mahajan committed
      Summary:
      Replace the FSWritableFile pointer with an FSWritableFilePtr object in
      WritableFileWriter. This new object wraps the FSWritableFile pointer.
      
      Objective: If tracing is enabled, FSWritableFilePtr returns an
      FSWritableFileTracingWrapper pointer that includes all necessary
      information in IORecord, calls the underlying FileSystem, and invokes
      IOTracer to dump that record in a binary file. If tracing is disabled,
      the underlying FileSystem pointer is returned directly.
      The FSWritableFilePtr wrapper class is added to bypass the
      FSWritableFileWrapper when tracing is disabled.
      
      Test Plan: make check -j64
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7193
      
      Reviewed By: anand1976
      
      Differential Revision: D23355915
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: e62a27a13c1fd77e36a6dbafc7006d969bed25cf
      b175eceb
  26. 25 Aug, 2020 1 commit
  27. 21 Aug, 2020 1 commit
  28. 19 Aug, 2020 1 commit
    • A
      Store FSSequentialFilePtr object in SequenceFileReader (#7190) · cc24ac14
      Akanksha Mahajan committed
      Summary:
      This diff contains following changes:
          1. Replace `FSSequentialFile` pointer with `FSSequentialFilePtr` object that wraps `FSSequentialFile` pointer in `SequenceFileReader`.
      
      Objective: If tracing is enabled, `FSSequentialFilePtr` returns `FSSequentialFileTracingWrapper` pointer that includes all necessary information in `IORecord` and calls underlying FileSystem and invokes `IOTracer` to dump that record in a binary file. If tracing is disabled then, underlying `FileSystem` pointer is returned directly. `FSSequentialFilePtr` wrapper class is added to bypass the `FSSequentialFileTracingWrapper` when tracing is disabled.
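
The bypass described in the objective can be sketched with a toy holder. All names below (`ToyFile`, `PlainFile`, `TracingFile`, `ToyFilePtr`) are hypothetical; the sketch only assumes that the pointer-like object chooses between a tracing wrapper and the underlying file each time it is dereferenced.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class ToyFile {
 public:
  virtual ~ToyFile() = default;
  virtual std::string Read() = 0;
};

class PlainFile : public ToyFile {
 public:
  std::string Read() override { return "data"; }
};

// Records each call in a trace before forwarding to the wrapped file.
class TracingFile : public ToyFile {
 public:
  TracingFile(ToyFile* target, std::vector<std::string>* trace)
      : target_(target), trace_(trace) {}
  std::string Read() override {
    trace_->push_back("Read");
    return target_->Read();
  }

 private:
  ToyFile* target_;
  std::vector<std::string>* trace_;
};

// Hypothetical pointer-like holder: operator-> yields the tracing wrapper
// only when tracing is enabled, otherwise the underlying file directly.
class ToyFilePtr {
 public:
  ToyFilePtr(std::unique_ptr<ToyFile> file, const bool* tracing_enabled,
             std::vector<std::string>* trace)
      : file_(std::move(file)),
        tracer_(file_.get(), trace),
        tracing_enabled_(tracing_enabled) {}

  ToyFile* operator->() {
    return *tracing_enabled_ ? static_cast<ToyFile*>(&tracer_) : file_.get();
  }

 private:
  std::unique_ptr<ToyFile> file_;  // declared first: tracer_ points into it
  TracingFile tracer_;
  const bool* tracing_enabled_;
};
```

With tracing off, a call goes straight to the underlying file and pays no forwarding cost through the wrapper.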
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7190
      
      Test Plan:
      make check -j64
                COMPILE_WITH_TSAN=1 make check -j64
      
      Reviewed By: anand1976
      
      Differential Revision: D23059616
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 1564b94dd1297cd0fbfe2ed5c9cc3e20f7395301
      cc24ac14
  29. 18 Aug, 2020 1 commit
    • A
      Disable `recycle_log_file_num` with `kTolerateCorruptedTailRecords` (#7271) · 5d5ff824
      Andrew Kryczka committed
      Summary:
      The two features are naturally incompatible. WAL recycling expects the recovery to succeed upon encountering a corrupt record at the point where new data ends and recycled data remains at the tail. However, `WALRecoveryMode::kTolerateCorruptedTailRecords` must fail upon encountering any such corrupt record, as it cannot differentiate between this and a real corruption, which would cause committed updates to be truncated.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7271
      
      Reviewed By: riversand963
      
      Differential Revision: D23169923
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2cf8a3bcd2c9a0ecb0055a84725047a10fd4db50
      5d5ff824
  30. 15 Aug, 2020 1 commit
    • J
      Introduce a global StatsDumpScheduler for stats dumping (#7223) · 69760b4d
      Jay Zhuang committed
      Summary:
      Have a global StatsDumpScheduler handle stats dumping for all DB instances, including `DumpStats()` and `PersistStats()`. Before this, there were 2 dedicated threads for every DB instance, one for `DumpStats()` and one for `PersistStats()`, which could create lots of threads if there are hundreds of DB instances.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7223
      
      Reviewed By: riversand963
      
      Differential Revision: D23056737
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 0faa2311142a73433ebb3317361db7cbf43faeba
      69760b4d
  31. 07 Aug, 2020 1 commit
  32. 16 Jul, 2020 1 commit
    • Z
      Auto resume the DB from Retryable IO Error (#6765) · a10f12ed
      Zhichao Cao committed
      Summary:
      In the current codebase, if a Retryable IO Error happens in the write path, SetBGError is called. The Retryable IO Error is converted to a hard error and the DB enters read-only mode. The user or application needs to resume it. In this PR, if a Retryable IO Error happens in a DB, SetBGError will create a new thread to call Resume (auto resume). options.max_bgerror_resume_count controls whether auto resume is enabled (if max_bgerror_resume_count<=0, auto resume will not be enabled). options.bgerror_resume_retry_interval controls the time interval before calling Resume again if the previous resume fails due to a Retryable IO Error. If a non-retryable error happens during resume, auto resume will terminate.
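
The retry policy described above can be sketched as a small loop. This is a simplified model, not the error handler's implementation: `AutoResume`, `ToyStatus`, and `ToyResumeResult` are hypothetical, the resume and wait steps are injected as callbacks, and the sketch runs synchronously rather than on a separate thread.

```cpp
#include <cassert>
#include <functional>

enum class ToyStatus { kOk, kRetryableIOError, kFatal };

struct ToyResumeResult {
  ToyStatus status;
  int attempts;
};

// Hypothetical retry loop: calls `resume` up to `max_count` times and calls
// `wait` (standing in for sleeping the retry interval) between attempts.
ToyResumeResult AutoResume(int max_count,
                           const std::function<ToyStatus()>& resume,
                           const std::function<void()>& wait) {
  if (max_count <= 0) {
    return {ToyStatus::kRetryableIOError, 0};  // auto resume disabled
  }
  ToyStatus s = ToyStatus::kRetryableIOError;
  int attempts = 0;
  while (attempts < max_count) {
    s = resume();
    ++attempts;
    if (s != ToyStatus::kRetryableIOError) break;  // success or fatal error
    if (attempts < max_count) wait();
  }
  return {s, attempts};
}
```

Here `max_count` plays the role of `max_bgerror_resume_count`, and the `wait` callback stands in for sleeping `bgerror_resume_retry_interval`.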
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6765
      
      Test Plan: Added the unit test cases in error_handler_fs_test and pass make asan_check
      
      Reviewed By: anand1976
      
      Differential Revision: D21916789
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: acb8b5e5dc3167adfa9425a5b7fc104f6b95cb0b
      a10f12ed
  33. 11 Jul, 2020 1 commit
    • W
      Reduce `env_->GetChildren()` calls in DBImpl::Recover() (#7044) · 4924a506
      wenh committed
      Summary:
      There currently exist multiple `GetChildren()` calls in `DBImpl::Recover()`, which can be expensive in cases of distributed file systems.
      This pull request tries to call `GetChildren()` on each necessary directory only _once_ and to reuse the results in place of the repeated calls in the current code.
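
The list-once-and-reuse idea can be illustrated with a small cache. `ToyChildrenCache` and `ListFn` are hypothetical; the listing function is injected so the sketch stays independent of any real file system API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical listing callback: returns the children of a directory.
using ListFn = std::function<std::vector<std::string>(const std::string&)>;

// Hypothetical cache: lists each directory at most once.
class ToyChildrenCache {
 public:
  explicit ToyChildrenCache(ListFn list) : list_(std::move(list)) {}

  const std::vector<std::string>& Get(const std::string& dir) {
    auto it = cache_.find(dir);
    if (it == cache_.end()) {
      it = cache_.emplace(dir, list_(dir)).first;  // first and only listing
    }
    return it->second;
  }

 private:
  ListFn list_;
  std::map<std::string, std::vector<std::string>> cache_;
};
```

On a distributed file system, where each listing can be a remote call, reusing the cached result avoids repeated round trips during recovery.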
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7044
      
      Test Plan:
      Run `make check` and use the default test suite. The modified code should be semantically identical to the current code. As a proof of this solution, we may optionally deploy the system onto a (real or simulated) distributed system and expect reduced latency caused by manifest fetching.
      
      (WIP)
      
      Reviewed By: riversand963
      
      Differential Revision: D22419925
      
      Pulled By: roghnin
      
      fbshipit-source-id: d3774fbfbc246c5527101bc16747eb5c90919886
      4924a506
  34. 18 Jun, 2020 1 commit
    • Z
      Store DB identity and DB session ID in SST files (#6983) · 94d04529
      Zitan Chen committed
      Summary:
      `db_id` and `db_session_id` are now part of the table properties for all formats and stored in SST files. This adds about 99 bytes to each new SST file.
      
      The `TablePropertiesNames` for these two identifiers are `rocksdb.creating.db.identity` and `rocksdb.creating.session.identity`.
      
      In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as `DB::GetDbSessionId`.
      
      A table property test is added.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6983
      
      Test Plan: make check and some manual tests.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22048826
      
      Pulled By: gg814
      
      fbshipit-source-id: afdf8c11424a6f509b5c0b06dafad584a80103c9
      94d04529
  35. 16 Jun, 2020 1 commit
    • Y
      Let best-efforts recovery ignore CURRENT file (#6970) · 9bfd46d0
      Yanqin Jin committed
      Summary:
      Best-efforts recovery does not check the content of CURRENT file to determine which MANIFEST to recover from. However, it still checks the presence of CURRENT file to determine whether to create a new DB during `open()`. Therefore, we can tweak the logic in `open()` a little bit so that best-efforts recovery does not rely on CURRENT file at all.
      
      Test plan (dev server):
      make check
      ./db_basic_test --gtest_filter=DBBasicTest.RecoverWithNoCurrentFile
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6970
      
      Reviewed By: anand1976
      
      Differential Revision: D22013990
      
      Pulled By: riversand963
      
      fbshipit-source-id: db552a1868c60ed70e1f7cd252a3a076eb8ea58f
      9bfd46d0
  36. 12 Jun, 2020 1 commit
    • Y
      Fail point-in-time WAL recovery upon IOError reading WAL (#6963) · 717749f4
      Yanqin Jin committed
      Summary:
      If `options.wal_recovery_mode == WALRecoveryMode::kPointInTimeRecovery`, RocksDB stops replaying WAL once hitting an error and discards the rest of the WAL. This can lead to data loss if the error occurs at an offset smaller than the last sync'ed offset.
      Ideally, RocksDB point-in-time recovery should permit recovery if the error occurs after the last synced offset and fail recovery if the error occurs before it. However, RocksDB does not track the synced offset of WALs. Consequently, RocksDB does not know whether an error occurs before or after the last synced offset. An error can be one of the following.
      - WAL record checksum mismatch. This can result from both corruption of synced data and dropping of unsynced data during shutdown. We cannot be sure which one. In order not to defeat the original motivation to permit the latter case, we keep the original behavior of point-in-time WAL recovery.
      - IOError. This means the WAL can be bad and may indicate that the whole file, including the synced part of the WAL, has become unavailable. Therefore, we choose to modify the behavior of point-in-time recovery and fail the database recovery.
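
The two error cases above lead to different outcomes, which a toy replay loop makes explicit. `ReplayPointInTime`, `ReadResult`, and `ReplayOutcome` are hypothetical names, and each WAL record is reduced to the result of reading it.

```cpp
#include <cassert>
#include <vector>

enum class ReadResult { kOk, kChecksumMismatch, kIOError };

struct ReplayOutcome {
  bool recovery_ok;
  int records_applied;
};

// Hypothetical replay loop for point-in-time recovery.
ReplayOutcome ReplayPointInTime(const std::vector<ReadResult>& records) {
  int applied = 0;
  for (ReadResult r : records) {
    if (r == ReadResult::kIOError) {
      return {false, applied};  // fail the whole recovery
    }
    if (r == ReadResult::kChecksumMismatch) {
      break;  // stop replaying, keep the prefix applied so far
    }
    ++applied;
  }
  return {true, applied};
}
```

A checksum mismatch truncates the replay but still lets the database open, while an IOError aborts recovery because the readable prefix cannot be trusted to include all synced data.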
      
      Test plan (devserver):
      make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6963
      
      Reviewed By: ajkr
      
      Differential Revision: D22011083
      
      Pulled By: riversand963
      
      fbshipit-source-id: f9cbf29a37dc5cc40d3fa62f89eed1ad67ca1536
      717749f4
  37. 06 Jun, 2020 1 commit
  38. 09 May, 2020 1 commit
    • Y
      Fix a few bugs in best-efforts recovery (#6824) · e72e2167
      Yanqin Jin committed
      Summary:
      1. Update column_family_memtables_ to point to latest column_family_set in
         version_set after recovery.
      2. Normalize file paths passed by application so that directories end with '/'
         or '\\'.
      3. In addition to missing files, corrupted files are also ignored in
         best-efforts recovery.
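
Item 2 of the list above (path normalization) can be sketched in a few lines. `NormalizeDir` is a hypothetical helper for illustration; it appends `'/'` when no separator is present and does not attempt to pick a platform-specific separator.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: make sure a directory path ends with a separator.
std::string NormalizeDir(const std::string& path) {
  if (path.empty()) return path;  // nothing to normalize
  char last = path.back();
  if (last == '/' || last == '\\') return path;  // already terminated
  return path + '/';
}
```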
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6824
      
      Test Plan: COMPILE_WITH_ASAN=1 make check
      
      Reviewed By: anand1976
      
      Differential Revision: D21463905
      
      Pulled By: riversand963
      
      fbshipit-source-id: c48db8843cc93c8c1c7139c474b64e6f775307d2
      e72e2167