1. 03 Mar 2021 (1 commit)
    • Break down the amount of data written during flushes/compactions per file type (#8013) · a46f080c
      Levi Tamasi committed
      Summary:
      The patch breaks down the "bytes written" (as well as the "number of output files")
      compaction statistics into two, so the values are logged separately for table files
      and blob files in the info log, and are shown in separate columns (`Write(GB)` for table
      files, `Wblob(GB)` for blob files) when the compaction statistics are dumped.
      This will also come in handy for fixing the write amplification statistics, which currently
      do not consider the amount of data read from blob files during compaction. (This will
      be fixed by an upcoming patch.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8013
      
      Test Plan: Ran `make check` and `db_bench`.
      
      Reviewed By: riversand963
      
      Differential Revision: D26742156
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 31d18ee8f90438b438ca7ed1ea8cbd92114442d5
  2. 02 Mar 2021 (1 commit)
  3. 26 Feb 2021 (2 commits)
    • Remove unused/incorrect fwd declaration (#8002) · c370d8aa
      Yanqin Jin committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8002
      
      Reviewed By: anand1976
      
      Differential Revision: D26659354
      
      Pulled By: riversand963
      
      fbshipit-source-id: 6b464dbea9fd8240ead8cc5af393f0b78e8f9dd1
    • Compaction filter support for (new) BlobDB (#7974) · cef4a6c4
      Yanqin Jin committed
      Summary:
      Allow applications to implement a custom compaction filter and pass it to BlobDB.
      
      The compaction filter's custom logic can operate on blobs.
      To do so, the application needs to subclass the `CompactionFilter` abstract class and implement the `FilterV2()` method.
      Optionally, a method called `ShouldFilterBlobByKey()` can be implemented if the application's custom logic relies solely
      on the key to make a decision without reading the blob, thus saving extra IO. Examples can be found in
      db/blob/db_blob_compaction_test.cc.
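      
      A minimal sketch of such a filter (the class name and filtering rule are illustrative and not part of the PR; `FilterV2()` is the standard `CompactionFilter` hook, and with the integrated BlobDB the filter is installed through the usual `compaction_filter` option):
      
      ```cpp
      #include <rocksdb/compaction_filter.h>
      #include <rocksdb/options.h>
      
      // Hypothetical filter: drop any key starting with "expired_", whether its
      // value lives inline in the SST or in a blob file.
      class ExpiringKeyFilter : public rocksdb::CompactionFilter {
       public:
        Decision FilterV2(int /*level*/, const rocksdb::Slice& key,
                          ValueType /*value_type*/,
                          const rocksdb::Slice& /*existing_value*/,
                          std::string* /*new_value*/,
                          std::string* /*skip_until*/) const override {
          return key.starts_with("expired_") ? Decision::kRemove : Decision::kKeep;
        }
        const char* Name() const override { return "ExpiringKeyFilter"; }
      };
      
      // Usage (illustrative):
      //   rocksdb::Options options;
      //   options.enable_blob_files = true;
      //   static ExpiringKeyFilter filter;
      //   options.compaction_filter = &filter;
      ```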
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7974
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D26509280
      
      Pulled By: riversand963
      
      fbshipit-source-id: 59f9ae5614c4359de32f4f2b16684193cc537b39
  4. 24 Feb 2021 (1 commit)
    • Fix testcase failures on windows (#7992) · e017af15
      sherriiiliu committed
      Summary:
      Fixed 5 test case failures found on Windows 10/Windows Server 2016
      1. In `flush_job_test`, the DestroyDir function fails in the destructor because some file handles are still being held by VersionSet. This happens on Windows Server 2016, so we need to manually reset the versions_ pointer to release all file handles.
      2. In the `StatsHistoryTest.InMemoryStatsHistoryPurging` test, the memory cap derived from stats_history_size on Windows becomes 14000 bytes with the latest changes, not just 13000 bytes.
      3. In the `SSTDumpToolTest.RawOutput` test, the output file handle is not closed at the end.
      4. In the `FullBloomTest.OptimizeForMemory` test, ROCKSDB_MALLOC_USABLE_SIZE is undefined on Windows, so `total_mem` is always equal to `total_size`. The internal memory fragmentation assertion does not apply in this case.
      5. In the `BlockFetcherTest.FetchAndUncompressCompressedDataBlock` test, XPRESS cannot reach an 87.5% compression ratio with the original CreateTable method, so I append extra zeros to the string value to improve the compression ratio. Besides, since XPRESS allocates memory internally and thus does not support custom allocator verification, we skip the allocator verification for XPRESS.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7992
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D26615283
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3632612f84b99e2b9c77c403b112b6bedf3b125d
  5. 23 Feb 2021 (1 commit)
  6. 20 Feb 2021 (2 commits)
    • Limit buffering for collecting samples for compression dictionary (#7970) · d904233d
      Andrew Kryczka committed
      Summary:
      For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file.
      
      However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how many data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). The limit is not strict since we currently buffer more than just data blocks -- keys are buffered as well. But it does make a step towards giving users predictable memory usage.
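      
      A hedged configuration sketch (the values are illustrative; `max_dict_bytes` and `zstd_max_train_bytes` are the pre-existing dictionary options, and `max_dict_buffer_bytes` is the new limit described above):
      
      ```cpp
      #include <rocksdb/options.h>
      
      rocksdb::Options MakeDictCompressionOptions() {
        rocksdb::Options options;
        options.compression = rocksdb::kZSTD;
        options.compression_opts.max_dict_bytes = 16 * 1024;          // target dictionary size
        options.compression_opts.zstd_max_train_bytes = 1024 * 1024;  // training sample budget
        // New in this PR: cap how much data is buffered for sampling before
        // switching to unbuffered mode.
        options.compression_opts.max_dict_buffer_bytes = 64ULL << 20;  // 64 MB
        return options;
      }
      ```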
      
      Related changes include:
      
      - Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks
      - Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary
      - Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970
      
      Test Plan:
      - updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level
      - looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set.
      
      Reviewed By: pdillinger
      
      Differential Revision: D26467994
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465
    • Fix handling of Mutable options; Allow DB::SetOptions to update mutable TableFactory Options (#7936) · 4bc9df94
      mrambacher committed
      
      Summary:
      Added a "only_mutable_options" flag to the ConfigOptions.  When set, the Configurable methods will only look at/update options that are marked as kMutable.
      
      Fixed DB::SetOptions to allow updating any mutable TableFactory options. Fixes https://github.com/facebook/rocksdb/issues/7385.
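      
      A hedged usage sketch of what this enables (the option-string key and syntax below are assumptions for illustration, not taken verbatim from the PR):
      
      ```cpp
      #include <rocksdb/db.h>
      
      // Tune a mutable BlockBasedTable option on a live DB via DB::SetOptions().
      rocksdb::Status TuneBlockSize(rocksdb::DB* db) {
        return db->SetOptions(
            {{"block_based_table_factory", "{block_size=16384;}"}});
      }
      ```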
      
      Added tests for the new flag.  Updated HISTORY.md
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7936
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D26389646
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 6dc247f6e999fa2814059ebbd0af8face109fea0
  7. 19 Feb 2021 (2 commits)
    • Introduce a new trace file format (v 0.2) for better extension (#7977) · b0fd1cc4
      Zhichao Cao committed
      Summary:
      The trace file record and payload encoding are fixed (hard-coded), which requires complex backward-compatibility handling whenever they are extended. This PR introduces a new trace file format, which makes it easier to add new entries to the payload and does not have backward-compatibility issues. V0.1 is still supported in this PR. Also added tracing of lower_bound and upper_bound for iterators.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7977
      
      Test Plan: make check. Also tested replay and analysis with an old trace file.
      
      Reviewed By: anand1976
      
      Differential Revision: D26529948
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: ebb75a127ce3c07c25a1ccc194c551f917896a76
    • Fix txn `MultiGet()` return un-committed data with snapshot (#7963) · 59ba104e
      Jay Zhuang committed
      Summary:
      TransactionDB uses a read callback to filter out uncommitted data before
      a snapshot. But the `MultiGet()` API doesn't use it at all, which causes
      unwanted data to be returned.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7963
      
      Test Plan: Added a unit test to reproduce the issue
      
      Reviewed By: anand1976
      
      Differential Revision: D26455851
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 265276698cf9d8c4cd79e3250ef10d14375bac55
  8. 18 Feb 2021 (1 commit)
  9. 17 Feb 2021 (1 commit)
  10. 16 Feb 2021 (1 commit)
  11. 11 Feb 2021 (1 commit)
    • Handoff checksum Implementation (#7523) · d1c510ba
      Zhichao Cao committed
      Summary:
      In PR https://github.com/facebook/rocksdb/issues/7419, we introduced the new Append and PositionedAppend APIs for WritableFile at the FileSystem layer, which enable RocksDB to pass data verification information (e.g., a checksum of the data) to the lower layer. In this PR, we use the new APIs in WritableFileWriter, such that files created via WritableFileWriter can pass the checksum to the storage layer. To control which file types should apply the checksum handoff, we add checksum_handoff_file_types to DBOptions. Users can use this option to control which file types (currently supported file types: kLogFile, kTableFile, kDescriptorFile) should use the new Append and PositionedAppend APIs to hand off the verification information.
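      
      A hedged configuration sketch (the `FileTypeSet::Add()` call and the `FileType` enum spellings are assumptions based on the option described above):
      
      ```cpp
      #include <rocksdb/options.h>
      #include <rocksdb/types.h>
      
      rocksdb::Options MakeHandoffOptions() {
        rocksdb::Options options;
        // Hand checksums off to the storage layer for table and MANIFEST files.
        options.checksum_handoff_file_types.Add(rocksdb::FileType::kTableFile);
        options.checksum_handoff_file_types.Add(rocksdb::FileType::kDescriptorFile);
        return options;
      }
      ```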
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7523
      
      Test Plan: added new unit tests; passes make check / make asan_check
      
      Reviewed By: pdillinger
      
      Differential Revision: D24313271
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: aafd69091ae85c3318e3e17cbb96fe7338da11d0
  12. 09 Feb 2021 (1 commit)
  13. 07 Feb 2021 (1 commit)
  14. 06 Feb 2021 (1 commit)
  15. 05 Feb 2021 (1 commit)
  16. 30 Jan 2021 (2 commits)
    • Fix a SingleDelete related optimization for blob indexes (#7904) · e5311a8e
      Levi Tamasi committed
      Summary:
      There is a small `SingleDelete` related optimization in the
      `CompactionIterator` code: when a `SingleDelete`-`Put` pair is preserved
      solely for the purposes of transaction conflict checking, the value
      itself gets cleared. (This is referred to as "optimization 3" in the
      `CompactionIterator` code.) Though the rest of the code got updated to
      support `SingleDelete`'ing blob indexes, this chunk was apparently
      missed, resulting in an assertion failure (or `ROCKS_LOG_FATAL` in release
      builds) when triggered. Note: in addition to clearing the value, we also
      need to update the type of the KV to regular value when dealing with
      blob indexes here.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7904
      
      Test Plan: `make check`
      
      Reviewed By: ajkr
      
      Differential Revision: D26118009
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 6bf78043d20265e2b15c2e1ab8865025040c42ae
    • Integrity protection for live updates to WriteBatch (#7748) · 78ee8564
      Andrew Kryczka committed
      Summary:
      This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.).
      
      The foundation classes (`ProtectionInfo.*`) embed the coverage info in their type, and provide `Protect.*()` and `Strip.*()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer.
      
      When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer.
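      
      A hedged usage sketch of opting into protection (the constructor argument order shown below -- reserved_bytes, max_bytes, protection_bytes_per_key -- is an assumption based on the summary above):
      
      ```cpp
      #include <rocksdb/write_batch.h>
      
      rocksdb::WriteBatch MakeProtectedBatch() {
        // 8 bytes of protection info are maintained per key for the batch's lifetime.
        rocksdb::WriteBatch batch(/*reserved_bytes=*/0, /*max_bytes=*/0,
                                  /*protection_bytes_per_key=*/8);
        // The protection info is computed from these original buffers and later
        // verified once the entry is encoded in the MemTable buffer.
        batch.Put("key1", "value1");
        return batch;
      }
      ```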
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748
      
      Test Plan:
      - an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught
      - add to stress/crash test to verify it works in variety of configs/operations without intentional corruption
      - [deferred] unit tests for `ProtectionInfo.*` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc.
      
      Reviewed By: pdillinger
      
      Differential Revision: D25754492
      
      Pulled By: ajkr
      
      fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866
  17. 29 Jan 2021 (2 commits)
    • Remove Legacy and Custom FileWrapper classes from header files (#7851) · 4a09d632
      mrambacher committed
      Summary:
      Removed the uses of the Legacy FileWrapper classes from the source code.  The wrappers were creating an additional layer of indirection/wrapping, as the Env already has a FileSystem.
      
      Moved the Custom FileWrapper classes into the CustomEnv, as these classes are really for the private use of the CustomEnv class.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7851
      
      Reviewed By: anand1976
      
      Differential Revision: D26114816
      
      Pulled By: mrambacher
      
      fbshipit-source-id: db32840e58d969d3a0fa6c25aaf13d6dcdc74150
    • Make builds reproducible (#7866) · 0a9a05ae
      mrambacher committed
      Summary:
      Closes https://github.com/facebook/rocksdb/issues/7035
      
      Changed how build_version.cc was generated:
      - Included the GIT tag/branch in the build_version file
      - Changed the "Build Date" to be:
            - If the GIT branch is "clean" (no changes), the date of the last git commit
            - If the branch is not clean, the current date
      - Added APIs to access the "build information", rather than accessing the strings directly.
      
      The build_version.cc file is now regenerated whenever the library objects are rebuilt.
      
      Verified that the built files remain the same size across builds on a "clean build" and the same information is reported by sst_dump --version
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7866
      
      Reviewed By: pdillinger
      
      Differential Revision: D26086565
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 6fcbe47f6033989d5cf26a0ccb6dfdd9dd239d7f
  18. 28 Jan 2021 (2 commits)
    • Accumulate blob file additions in VersionEdit during recovery (#7903) · c696f274
      Levi Tamasi committed
      Summary:
      During recovery, RocksDB performs a kind of dummy flush; namely, entries
      from the WAL are added to memtables, which then get written to SSTs and
      blob files (if enabled) just like during a regular flush. Note that
      multiple memtables might be flushed during recovery for the same column
      family, for example, if the DB is reopened with a lower write buffer size,
      and therefore, we need to make sure to collect all SST and blob file
      additions. The patch fixes a bug in the earlier logic which resulted in
      later blob file additions overwriting earlier ones.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7903
      
      Test Plan: Added a unit test and ran `db_stress`.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D26110847
      
      Pulled By: ltamasi
      
      fbshipit-source-id: eddb50a608a88f54f3cec3a423de8235aba951fd
    • Do not set bg error for compaction in retryable IO Error case (#7899) · 95013df2
      Zhichao Cao committed
      Summary:
      When a retryable IO error occurs during compaction, it is mapped to a soft error and sets the BG error. However, auto resume is not called to clear the soft error, since compaction will reschedule by itself. With this change, when a retryable IO error occurs during compaction, the BG error is not set. The user is informed of the error via EventHelper.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7899
      
      Test Plan: tested with error_handler_fs_test
      
      Reviewed By: anand1976
      
      Differential Revision: D26094097
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: c53424f11d237405592cd762f43cbbdf8da8234f
  19. 27 Jan 2021 (2 commits)
    • Fix deadlock in `fs_test.WALWriteRetryableErrorAutoRecover1` (#7897) · c6ff4c0b
      Jay Zhuang committed
      Summary:
      The recovery thread could hold db.mutex, which is needed by the sync
      write in the main thread.
      Make sure the write is done before the recovery thread starts.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7897
      
      Test Plan: `gtest-parallel ./error_handler_fs_test --gtest_filter=DBErrorHandlingFSTest.WALWriteRetryableErrorAutoRecover1 -r 10000 --workers=200`
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D26082933
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 226fc49228c0e5903f86ff45cc3fed3080abdb1f
    • Fix flaky `error_handler_fs_test.MultiDBCompactionError` (#7896) · 9425acac
      Jay Zhuang committed
      Summary:
      The error recovery thread may outlive the DBImpl object, causing it to
      access the already-released DBImpl.mutex. Close SstFileManager before closing the DB.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7896
      
      Test Plan:
      The issue can be reproduced by adding a sleep in the recovery code.
      The tests pass with the sleep added.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D26076655
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 0d9cc5639c12fcfc001427015e75a9736f33cd96
  20. 26 Jan 2021 (3 commits)
    • Add a SystemClock class to capture the time functions of an Env (#7858) · 12f11373
      mrambacher committed
      Summary:
      Introduces a SystemClock class to RocksDB and starts using it. This class contains the time-related functions of an Env, and these functions can be redirected from the Env to the SystemClock.
      
      Many of the places that used an Env (Timer, PerfStepTimer, RepeatableThread, RateLimiter, WriteController) for time-related functions have been changed to use SystemClock instead.  There are likely more places that can be changed, but this is a start to show what can/should be done.  Over time it would be nice to migrate most (if not all) of the uses of the time functions from the Env to the SystemClock.
      
      There are several Env classes that implement these functions.  Most of these have not been converted yet to SystemClock implementations; that will come in a subsequent PR.  It would be good to unify many of the Mock Timer implementations, so that they behave similarly and be tested similarly (some override Sleep, some use a MockSleep, etc).
      
      Additionally, this change will allow new methods to be introduced to the SystemClock (like https://github.com/facebook/rocksdb/issues/7101 WaitFor) in a consistent manner across a smaller number of classes.
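      
      A hedged usage sketch of the new abstraction (the helper below is illustrative; `SystemClock::Default()` and `NowMicros()` are the kind of entry points the class exposes):
      
      ```cpp
      #include <cstdint>
      #include <memory>
      #include <rocksdb/system_clock.h>
      
      // A component that previously took an Env just for timing can take a
      // SystemClock instead.
      uint64_t ElapsedMicros(const std::shared_ptr<rocksdb::SystemClock>& clock,
                             uint64_t start_micros) {
        return clock->NowMicros() - start_micros;
      }
      
      // Usage (illustrative):
      //   uint64_t start = rocksdb::SystemClock::Default()->NowMicros();
      //   ... do work ...
      //   uint64_t micros = ElapsedMicros(rocksdb::SystemClock::Default(), start);
      ```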
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7858
      
      Reviewed By: pdillinger
      
      Differential Revision: D26006406
      
      Pulled By: mrambacher
      
      fbshipit-source-id: ed10a8abbdab7ff2e23d69d85bd25b3e7e899e90
    • In IOTracing, add filename with each operation in trace file. (#7885) · 1d226018
      Akanksha Mahajan committed
      Summary:
      1. In IOTracing, add the filename to each IOTrace record. The filename is stored in the file object (tracing wrappers).
      2. Change the logic for figuring out which additional information (file_size, length, offset, etc.) needs to be stored with each operation, since this differs between operations. When new information is added in the future (depending on the operation), this change keeps such additions simple.
      
      Logic: io_op_data is added to IOTraceRecord, and its bits indicate which additional information from the IOTraceOp enum is included in the record. The values in IOTraceOp represent bit positions. So if length and offset need to be stored (IOTraceOp::kIOLen is 1 and IOTraceOp::kIOOffset is 2), bits 1 and 2 (counting from the rightmost bit) are set, and io_op_data contains 0b110.
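      
      A minimal sketch of this bit-position encoding (the enum values follow the summary above; the helper function is illustrative and not the PR's code):
      
      ```cpp
      #include <cstdint>
      
      enum IOTraceOp : char { kIOFileSize = 0, kIOLen = 1, kIOOffset = 2 };
      
      // Build the io_op_data mask for an operation that records a length and an offset.
      uint64_t EncodeIoOpData() {
        uint64_t io_op_data = 0;
        io_op_data |= (1ULL << kIOLen);     // bit 1: a length follows in the record
        io_op_data |= (1ULL << kIOOffset);  // bit 2: an offset follows in the record
        return io_op_data;                  // 0b110
      }
      ```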
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7885
      
      Test Plan: Updated io_tracer_test and verified the trace file manually.
      
      Reviewed By: anand1976
      
      Differential Revision: D25982353
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: ebfc5539cc0e231d7794a6b42b73f5403e360b22
    • Do not explicitly flush blob files when using the integrated BlobDB (#7892) · 431e8afb
      Levi Tamasi committed
      Summary:
      In the original stacked BlobDB implementation, which writes blobs to blob files
      immediately and treats blob files as logs, it makes sense to flush the file after
      writing each blob to protect against process crashes; however, in the integrated
      implementation, which builds blob files in the background jobs, this unnecessarily
      reduces performance. This patch fixes this by simply adding a `do_flush` flag to
      `BlobLogWriter`, which is set to `true` by the stacked implementation and to `false`
      by the new code. Note: the change itself is trivial but the tests needed some work;
      since in the new implementation, blobs are now buffered, adding a blob to
      `BlobFileBuilder` is no longer guaranteed to result in an actual I/O. Therefore, we can
      no longer rely on `FaultInjectionTestEnv` when testing failure cases; instead, we
      manipulate the return values of I/O methods directly using `SyncPoint`s.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7892
      
      Test Plan: `make check`
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D26022814
      
      Pulled By: ltamasi
      
      fbshipit-source-id: b3dce419f312137fa70d84cdd9b908fd5d60d8cd
  21. 22 Jan 2021 (2 commits)
    • MergeHelper::FilterMerge() calling ElapsedNanosSafe() upon exit even … (#7867) · 12a8be1d
      Matthew Von-Maszewski committed
      Summary:
      …when unused.  Causes many calls to clock_gettime, impacting performance.
      
      Was looking for something else via Linux "perf" command when I spotted heavy usage of clock_gettime during a compaction.  Our product heavily uses the rocksdb::Options::merge_operator.  MergeHelper::FilterMerge() properly tests if timing is enabled/disabled upon entry, but not on exit.  This patch fixes the exit.
      
      Note: the entry test also verifies that "nullptr != stats_". This check is redundant with code inside ShouldReportDetailedTime(), so I omitted it in my change.
      
      merge_test.cc is updated with a test that fails before the merge_helper.cc change and passes after it.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7867
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D25960175
      
      Pulled By: ajkr
      
      fbshipit-source-id: 56e66d7eb6ae5eae89c8e0d5a262bd2905a226b6
    • workaround race conditions during `PeriodicWorkScheduler` registration (#7888) · e18a4df6
      Andrew Kryczka committed
      Summary:
      This provides a workaround for two race conditions that will be fixed in
      a more sophisticated way later. This PR:
      
      (1) Makes the client serialize calls to `Timer::Start()` and `Timer::Shutdown()` (see https://github.com/facebook/rocksdb/issues/7711). The long-term fix will be to make those functions thread-safe.
      (2) Makes `PeriodicWorkScheduler` atomically add/cancel work together with starting/shutting down its `Timer`. The long-term fix will be for `Timer` API to offer more specialized APIs so the client will not need to synchronize.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7888
      
      Test Plan: ran the repro provided in https://github.com/facebook/rocksdb/issues/7881
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D25990891
      
      Pulled By: ajkr
      
      fbshipit-source-id: a97fdaebbda6d7db7ddb1b146738b68c16c5be38
  22. 21 Jan 2021 (1 commit)
    • Make blob related VersionEdit tags unignorable (#7886) · 2d37830e
      Levi Tamasi committed
      Summary:
      BlobFileAddition and BlobFileGarbage should not be in the ignorable tag
      range, since if they are present in the MANIFEST, users cannot downgrade
      to a RocksDB version that does not understand them without losing access
      to the data in the blob files. The patch moves these two tags to the
      unignorable range; this should still be safe at this point, since the
      integrated BlobDB project is still work in progress and thus there
      shouldn't be any ignorable BlobFileAddition/BlobFileGarbage tags out
      there.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7886
      
      Test Plan: `make check`
      
      Reviewed By: cheng-chang
      
      Differential Revision: D25980956
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 13cf5bd61d77f049b513ecd5ad0be8c637e40a9d
  23. 20 Jan 2021 (2 commits)
    • Make it able to ignore WAL related VersionEdits in older versions (#7873) · e4494829
      Cheng Chang committed
      Summary:
      Although the tags for `WalAddition`, `WalDeletion` are after `kTagSafeIgnoreMask`, to actually be able to skip these entries in older versions of RocksDB, we require that they are encoded with their encoded size as the prefix. This requirement is not met in the current codebase, so a downgraded DB may fail to open if these entries exist in the MANIFEST.
      
      If a DB wants to downgrade, and its MANIFEST contains `WalAddition` or `WalDeletion`, it can set `track_and_verify_wals_in_manifest` to `false`, then restart twice, then downgrade. On the first restart, a new MANIFEST will be created with a `WalDeletion` indicating that all previously tracked WALs are removed from the MANIFEST. On the second restart, since there are no tracked WALs in the MANIFEST now, a new MANIFEST will be created with neither `WalAddition` nor `WalDeletion`. Then the DB can downgrade.
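      
      A hedged sketch of the downgrade preparation described above (the DB path is illustrative):
      
      ```cpp
      #include <rocksdb/db.h>
      
      rocksdb::Status PrepareForDowngrade(const std::string& db_path) {
        rocksdb::Options options;
        // Stop writing WalAddition/WalDeletion records to the MANIFEST.
        options.track_and_verify_wals_in_manifest = false;
        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, db_path, &db);
        delete db;  // First restart done; restart once more, then downgrade RocksDB.
        return s;
      }
      ```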
      
      Tags for `BlobFileAddition`, `BlobFileGarbage` also have the same problem, but this PR focuses on solving the problem for WAL edits.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7873
      
      Test Plan: Added a `VersionEditTest::IgnorableTags` unit test to verify all entries with tags larger than `kTagSafeIgnoreMask` can actually be skipped and won't affect parsing of other entries.
      
      Reviewed By: ajkr
      
      Differential Revision: D25935930
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: 7a02fdba4311d6084328c14aed110a26d08c3efb
    • Fix write-ahead log file size overflow (#7870) · 4db58bcf
      Vladimir Maksimovski committed
      Summary:
      The WAL's file size is stored as an unsigned 64 bit integer.
      
      In db_info_dumper.cc, this integer gets converted to a string. Since 2^64 is approximately 10^19, we need 20 digits to represent the integer correctly. To store the decimal representation, we need 21 bytes (+1 due to the '\0' terminator at the end). The code previously used 16 bytes, which would overflow if the log is really big (>1 petabyte).
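      
      A minimal illustration of the sizing arithmetic (the function and buffer names are illustrative, not the PR's code):
      
      ```cpp
      #include <cinttypes>
      #include <cstdint>
      #include <cstdio>
      
      // UINT64_MAX = 18446744073709551615 has 20 decimal digits,
      // so 20 + 1 (for the '\0' terminator) = 21 bytes are needed.
      void FormatWalSize(uint64_t wal_size_bytes, char (&out)[21]) {
        std::snprintf(out, sizeof(out), "%" PRIu64, wal_size_bytes);
      }
      ```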
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7870
      
      Reviewed By: ajkr
      
      Differential Revision: D25938776
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 6ee9e21ebd65d297ea90fa1e7e74f3e1c533299d
  24. 16 Jan 2021 (1 commit)
  25. 12 Jan 2021 (2 commits)
  26. 10 Jan 2021 (1 commit)
    • Improvements to Env::GetChildren (#7819) · 4926b337
      Adam Retter committed
      Summary:
      The main improvement here is to not include `.` or `..` in the results of `Env::GetChildren`. Whether `.` and `..` appear at all is non-portable and depends on the operating system and the file system. See: https://www.gnu.org/software/libc/manual/html_node/Reading_002fClosing-Directory.html
      
      There were lots of duplicate checks spread through the RocksDB codebase previously to skip `.` and `..`. This change removes the need for those at the source.
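      
      A hedged usage sketch (the helper is illustrative): callers no longer need to special-case `.` and `..` when listing a directory.
      
      ```cpp
      #include <rocksdb/env.h>
      #include <string>
      #include <vector>
      
      rocksdb::Status ListDbFiles(rocksdb::Env* env, const std::string& dir,
                                  std::vector<std::string>* files) {
        // After this change, the result contains only real entries, so no manual
        // filtering of "." and ".." is required.
        return env->GetChildren(dir, files);
      }
      ```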
      
      Also some minor fixes to `Env::GetChildren`:
      * Improve error handling in POSIX implementation
      * Remove unnecessary array allocation on Windows
      * Fix struct name for Windows Non-UTF-8 API
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7819
      
      Reviewed By: ajkr
      
      Differential Revision: D25837394
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 1e137e7218d38b450af9c083f73d5357abcbba2e
  27. 08 Jan 2021 (2 commits)
    • Treat File Scope Write IO Error the same as Retryable IO Error (#7840) · 48c0843e
      Zhichao Cao committed
      Summary:
      In RocksDB, when an IO error happens, flags on the IOStatus can be set. If the IOStatus is marked as a File Scope IO Error, it indicates that the error is constrained to the file level. Since RocksDB does not continue writing data to a file after any IO error happens, a File Scope IO Error can be treated the same as a Retryable IO Error. This change adds logic to ErrorHandler::SetBGError to include file scope IO errors in its error handling, the same as for retryable IO errors.
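      
      A hedged sketch of how a FileSystem implementation might flag such an error (the `IOStatus` setter names and enum spelling below are assumptions based on io_status.h, not taken from the PR):
      
      ```cpp
      #include <rocksdb/io_status.h>
      
      rocksdb::IOStatus MakeFileScopeWriteError() {
        rocksdb::IOStatus s = rocksdb::IOStatus::IOError("simulated short write");
        // Mark the error as constrained to a single file.
        s.SetScope(rocksdb::IOStatus::IOErrorScope::kIOErrorScopeFile);
        // After this change, ErrorHandler::SetBGError treats such an error the
        // same way it treats a retryable IO error.
        return s;
      }
      ```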
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7840
      
      Test Plan: added new unit tests in error_handler_fs_test. make check
      
      Reviewed By: anand1976
      
      Differential Revision: D25820481
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 69cabd3d010073e064d6142ce1cabf341b8a6806
    • Add more tests to the ASC pass list (#7834) · cc2a180d
      mrambacher committed
      Summary:
      Fixed the following tests so that they now pass ASC checks:
      * `ttl_test`
      * `blob_db_test`
      * `backupable_db_test`
      * `delete_scheduler_test`
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7834
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D25795398
      
      Pulled By: ajkr
      
      fbshipit-source-id: a10037817deda4fc7cbb353a2e00b62ed89b6476