1. 08 10月, 2021 1 次提交
    • Z
      Add file temperature related counter and bytes stats to and io_stats (#8710) · b632ed0c
      Zhichao Cao 提交于
      Summary:
      For tiered storage project, we need to know the block read count and read bytes of files with different temperature. Add FileIOByTemperature to IOStatsContext and collect the bytes read and read count from different temperature files through the RandomAccessFileReader.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8710
      
      Test Plan: make check, add the testing cases
      
      Reviewed By: siying
      
      Differential Revision: D30582400
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: d83173de594374fc8404af5ce93a6a9be72c7141
      b632ed0c
  2. 02 10月, 2021 1 次提交
    • Y
      Add additional checks for three existing unit tests (#8973) · 2cdaf5ca
      Yanqin Jin 提交于
      Summary:
      With test sync points, we can assert on the equality of iterator value in three existing
      unit tests.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8973
      
      Test Plan:
      ```
      gtest-parallel -r 1000 ./db_test2 --gtest_filter=DBTest2.IterRaceFlush2:DBTest2.IterRaceFlush1:DBTest2.IterRefreshRaceFlush
      ```
      
      make check
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D31256340
      
      Pulled By: riversand963
      
      fbshipit-source-id: a9440767ab383e0ec61bd43ffa8fbec4ba562ea2
      2cdaf5ca
  3. 11 9月, 2021 1 次提交
  4. 08 9月, 2021 1 次提交
    • M
      Make MemTableRepFactory into a Customizable class (#8419) · beed8647
      mrambacher 提交于
      Summary:
      This PR does the following:
      -> Makes the MemTableRepFactory into a Customizable class and creatable/configurable via CreateFromString
      -> Makes the existing implementations compatible with configurations
      -> Moves the "SpecialRepFactory" test class into testutil, accessible via the ObjectRegistry or a NewSpecial API
      
      New tests were added to validate the functionality and all existing tests pass.  db_bench and memtablerep_bench were hand-tested to verify the functionality in those tools.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8419
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D29558961
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 81b7229636e4e649a0c914e73ac7b0f8454c931c
      beed8647
  5. 02 9月, 2021 1 次提交
    • P
      Remove some unneeded code (#8736) · c9cd5d25
      Peter Dillinger 提交于
      Summary:
      * FullKey and ParseFullKey appear to serve no purpose in the public API
      (or anything else) so removed. Only use in one test updated.
      * NumberToString serves no purpose vs. ToString so removed, numerous
      calls updated
      * Remove unnecessary forward declarations in metadata.h by re-arranging
      class definitions.
      * Remove some unneeded semicolons
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8736
      
      Test Plan: existing tests
      
      Reviewed By: mrambacher
      
      Differential Revision: D30700039
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1e436a576f511a6ed8b4d97af7cc8216bc729af2
      c9cd5d25
  6. 21 8月, 2021 1 次提交
  7. 20 8月, 2021 1 次提交
    • M
      Add iterator's lower and upper bounds to `TraceRecord` (#8677) · ff895338
      Merlin Mao 提交于
      Summary:
      Trace file V2 added lower/upper bounds to `Iterator::Seek()` and `Iterator::SeekForPrev()`. They were not used anywhere during the execution of a `TraceRecord`. Now they are added to be used by `ReadOptions` during `Iterator::Seek()` and `Iterator::SeekForPrev()` if they are set.
      
      Added test cases in `DBTest2.TraceAndManualReplay`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8677
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D30438255
      
      Pulled By: autopear
      
      fbshipit-source-id: 82563006be0b69155990e506a74951c18af8d288
      ff895338
  8. 19 8月, 2021 1 次提交
    • M
      Allow Replayer to report the results of TraceRecords. (#8657) · d10801e9
      Merlin Mao 提交于
      Summary:
      `Replayer::Execute()` can directly returns the result (e.g, request latency, DB::Get() return code, returned value, etc.)
      `Replayer::Replay()` reports the results via a callback function.
      
      New interface:
      `TraceRecordResult` in "rocksdb/trace_record_result.h".
      
      `DBTest2.TraceAndReplay` and `DBTest2.TraceAndManualReplay` are updated accordingly.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8657
      
      Reviewed By: ajkr
      
      Differential Revision: D30290216
      
      Pulled By: autopear
      
      fbshipit-source-id: 3c8d4e6b180ec743de1a9d9dcaee86064c74f0d6
      d10801e9
  9. 16 8月, 2021 2 次提交
  10. 12 8月, 2021 1 次提交
    • M
      Make TraceRecord and Replayer public (#8611) · f58d2767
      Merlin Mao 提交于
      Summary:
      New public interfaces:
      `TraceRecord` and `TraceRecord::Handler`, available in "rocksdb/trace_record.h".
      `Replayer`, available in `rocksdb/utilities/replayer.h`.
      
      User can use `DB::NewDefaultReplayer()` to create a Replayer to auto/manual replay a trace file.
      
      Unit tests:
      - `./db_test2 --gtest_filter="DBTest2.TraceAndReplay"`: Updated with the internal API changes.
      - `./db_test2 --gtest_filter="DBTest2.TraceAndManualReplay"`: New for manual replay.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8611
      
      Reviewed By: ajkr
      
      Differential Revision: D30266329
      
      Pulled By: autopear
      
      fbshipit-source-id: 1ecb3cbbedae0f6a67c18f0cc82e002b4d81b6f8
      f58d2767
  11. 10 8月, 2021 1 次提交
  12. 31 7月, 2021 1 次提交
    • M
      Allow WAL dir to change with db dir (#8582) · ab7f7c9e
      mrambacher 提交于
      Summary:
      Prior to this change, the "wal_dir"  DBOption would always be set (defaults to dbname) when the DBOptions were sanitized.  Because of this setitng in the options file, it was not possible to rename/relocate a database directory after it had been created and use the existing options file.
      
      After this change, the "wal_dir" option is only set under specific circumstances.  Methods were added to the ImmutableDBOptions class to see if it is set and if it is set to something other than the dbname.  Additionally, a method was added to retrieve the effective value of the WAL dir (either the option or the dbname/path).
      
      Tests were added to the core and ldb to test that a database could be created and renamed without issue.  Additional tests for various permutations of wal_dir were also added.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8582
      
      Reviewed By: pdillinger, autopear
      
      Differential Revision: D29881122
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 67d3d033dc8813d59917b0a3fba2550c0efd6dfb
      ab7f7c9e
  13. 23 7月, 2021 1 次提交
  14. 02 7月, 2021 1 次提交
  15. 08 6月, 2021 1 次提交
    • D
      Cancel compact range (#8351) · 80a59a03
      David Devecsery 提交于
      Summary:
      Added the ability to cancel an in-progress range compaction by storing to an atomic "canceled" variable pointed to within the CompactRangeOptions structure.
      
      Tested via two tests added to db_tests2.cc.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8351
      
      Reviewed By: ajkr
      
      Differential Revision: D28808894
      
      Pulled By: ddevec
      
      fbshipit-source-id: cb321361c9e23b084b188bb203f11c375a22c2dd
      80a59a03
  16. 21 5月, 2021 1 次提交
    • S
      Compare memtable insert and flush count (#8288) · 2f1984dd
      sdong 提交于
      Summary:
      When a memtable is flushed, it will validate number of entries it reads, and compare the number with how many entries inserted into memtable. This serves as one sanity c\
      heck against memory corruption. This change will also allow more counters to be added in the future for better validation.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8288
      
      Test Plan: Pass all existing tests
      
      Reviewed By: ajkr
      
      Differential Revision: D28369194
      
      fbshipit-source-id: 7ff870380c41eab7f99eee508550dcdce32838ad
      2f1984dd
  17. 18 5月, 2021 1 次提交
    • S
      Write file temperature information to manifest (#8284) · 0ed8cb66
      sdong 提交于
      Summary:
      As a part of tiered storage, writing tempeature information to manifest is needed so that after DB recovery, RocksDB still has the tiering information, to implement some further necessary functionalities.
      
      Also fix some issues in simulated hybrid FS.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8284
      
      Test Plan: Add a new unit test to validate that the information is indeed written and read back.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D28335801
      
      fbshipit-source-id: 56aeb2e6ea090be0200181dd968c8a7278037def
      0ed8cb66
  18. 23 4月, 2021 1 次提交
    • Z
      Fix the false positive alert of CF consistency check in WAL recovery (#8207) · 09a9ec3a
      Zhichao Cao 提交于
      Summary:
      In current RocksDB, in recover the information form WAL, we do the consistency check for each column family when one WAL file is corrupted and PointInTimeRecovery is set. However, it will report a false positive alert on "SST file is ahead of WALs" when one of the CF current log number is greater than the corrupted WAL number (CF contains the data beyond the corrupted WAl) due to a new column family creation during flush. In this case, a new WAL is created (it is empty) during a flush. Also, due to some reason (e.g., storage issue or crash happens before SyncCloseLog is called), the old WAL is corrupted. The new CF has no data, therefore, it does not have the consistency issue.
      
      Fix: when checking cfd->GetLogNumber() > corrupted_wal_number also check cfd->GetLiveSstFilesSize() > 0. So the CFs with no SST file data will skip the check here.
      
      Note potential ignored inconsistency caused due to fix: empty CF can also be caused by write+delete. In this case, after flush, there is no SST files being generated. However, this CF still have the log in the WAL. When the WAL is corrupted, the DB might be inconsistent.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8207
      
      Test Plan: added unit test, make crash_test
      
      Reviewed By: riversand963
      
      Differential Revision: D27898839
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 931fc2d8b92dd00b4169bf84b94e712fd688a83e
      09a9ec3a
  19. 20 4月, 2021 1 次提交
    • Y
      Handle rename() failure in non-local FS (#8192) · a376c220
      Yanqin Jin 提交于
      Summary:
      In a distributed environment, a file `rename()` operation can succeed on server (remote)
      side, but the client can somehow return non-ok status to RocksDB. Possible reasons include
      network partition, connection issue, etc. This happens in `rocksdb::SetCurrentFile()`, which
      can be called in `LogAndApply() -> ProcessManifestWrites()` if RocksDB tries to switch to a
      new MANIFEST. We currently always delete the new MANIFEST if an error occurs.
      
      This is problematic in distributed world. If the server-side successfully updates the CURRENT
      file via renaming, then a subsequent `DB::Open()` will try to look for the new MANIFEST and fail.
      
      As a fix, we can track the execution result of IO operations on the new MANIFEST.
      - If IO operations on the new MANIFEST fail, then we know the CURRENT must point to the original
        MANIFEST. Therefore, it is safe to remove the new MANIFEST.
      - If IO operations on the new MANIFEST all succeed, but somehow we end up in the clean up
        code block, then we do not know whether CURRENT points to the new or old MANIFEST. (For local
        POSIX-compliant FS, it should still point to old MANIFEST, but it does not matter if we keep the
        new MANIFEST.) Therefore, we keep the new MANIFEST.
          - Any future `LogAndApply()` will switch to a new MANIFEST and update CURRENT.
          - If process reopens the db immediately after the failure, then the CURRENT file can point
            to either the new MANIFEST or the old one, both of which exist. Therefore, recovery can
            succeed and ignore the other.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8192
      
      Test Plan: make check
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D27804648
      
      Pulled By: riversand963
      
      fbshipit-source-id: 9c16f2a5ce41bc6aadf085e48449b19ede8423e4
      a376c220
  20. 08 4月, 2021 1 次提交
    • G
      Fix flush reason attribution (#8150) · 48cd7a3a
      Giuseppe Ottaviano 提交于
      Summary:
      Current flush reason attribution is misleading or incorrect (depending on what the original intention was):
      
      - Flush due to WAL reaching its maximum size is attributed to `kWriteBufferManager`
      - Flushes due to full write buffer and write buffer manager are not distinguishable, both are attributed to `kWriteBufferFull`
      
      This changes the first to a new flush reason `kWALFull`, and splits the second between `kWriteBufferManager` and `kWriteBufferFull`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8150
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D27569645
      
      Pulled By: ot
      
      fbshipit-source-id: 7e3c8ca186a6e71976e6b8e937297eebd4b769cc
      48cd7a3a
  21. 03 4月, 2021 1 次提交
  22. 09 3月, 2021 1 次提交
  23. 20 2月, 2021 1 次提交
    • A
      Limit buffering for collecting samples for compression dictionary (#7970) · d904233d
      Andrew Kryczka 提交于
      Summary:
      For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file.
      
      However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). It is not strict as we currently buffer more than just data blocks -- also keys are buffered. But it does make a step towards giving users predictable memory usage.
      
      Related changes include:
      
      - Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks
      - Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary
      - Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970
      
      Test Plan:
      - updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level
      - looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set.
      
      Reviewed By: pdillinger
      
      Differential Revision: D26467994
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465
      d904233d
  24. 07 2月, 2021 1 次提交
  25. 06 2月, 2021 1 次提交
  26. 10 1月, 2021 1 次提交
    • A
      Improvements to Env::GetChildren (#7819) · 4926b337
      Adam Retter 提交于
      Summary:
      The main improvement here is to not include `.` or `..` in the results of `Env::GetChildren`. The occurrence of `.` or `..`; it is non-portable, dependent on the Operating System and the File System. See: https://www.gnu.org/software/libc/manual/html_node/Reading_002fClosing-Directory.html
      
      There were lots of duplicate checks spread through the RocksDB codebase previously to skip `.` and `..`. This new removes the need for those at the source.
      
      Also some minor fixes to `Env::GetChildren`:
      * Improve error handling in POSIX implementation
      * Remove unnecessary array allocation on Windows
      * Fix struct name for Windows Non-UTF-8 API
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7819
      
      Reviewed By: ajkr
      
      Differential Revision: D25837394
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 1e137e7218d38b450af9c083f73d5357abcbba2e
      4926b337
  27. 07 1月, 2021 1 次提交
    • M
      Create a CustomEnv class; Add WinFileSystem; Make LegacyFileSystemWrapper private (#7703) · e628f59e
      mrambacher 提交于
      Summary:
      This PR does the following:
      -> Creates a WinFileSystem class.  This class is the Windows equivalent of the PosixFileSystem and will be used on Windows systems.
      -> Introduces a CustomEnv class.  A CustomEnv is an Env that takes a FileSystem as constructor argument.  I believe there will only ever be two implementations of this class (PosixEnv and WinEnv).  There is still a CustomEnvWrapper class that takes an Env and a FileSystem and wraps the Env calls with the input Env but uses the FileSystem for the FileSystem calls
      -> Eliminates the public uses of the LegacyFileSystemWrapper.
      
      With this change in place, there are effectively the following patterns of Env:
      - "Base Env classes" (PosixEnv, WinEnv).  These classes implement the core Env functions (e.g. Threads) and have a hard-coded input FileSystem.  These classes inherit from CompositeEnv, implement the core Env functions (threads) and delegate the FileSystem-like calls to the input file system.
      - Wrapped Composite Env classes (MemEnv).  These classes take in an Env and a FileSystem.  The core env functions are re-directed to the wrapped env.  The file system calls are redirected to the input file system
      - Legacy Wrapped Env classes.  These classes take in an Env input (but no FileSystem).  The core env functions are re-directed to the wrapped env.  A "Legacy File System" is created using this env and the file system calls directed to the env itself.
      
      With these changes in place, the PosixEnv becomes a singleton -- there is only ever one created.  Any other use of the PosixEnv is via another wrapped env.  This cleans up some of the issues with the env construction and destruction.
      
      Additionally, there were places in the code that required had an Env when they required a FileSystem.  Many of these places would wrap the Env with a LegacyFileSystemWrapper instead of using the env->GetFileSystem().  These places were changed, thereby removing layers of additional redirection (LegacyFileSystem --> Env --> Env::FileSystem).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7703
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D25762190
      
      Pulled By: anand1976
      
      fbshipit-source-id: 1a088e97fc916f28ac69c149cd1dcad0ab31704b
      e628f59e
  28. 08 12月, 2020 1 次提交
  29. 03 11月, 2020 2 次提交
    • Y
      Avoid skipping a test in db_test2 (#7629) · c992eb11
      Yanqin Jin 提交于
      Summary:
      Test report shows that this test has been skipped recently due to
      a condition that will never meet. `env_` is not equal to
      `Env::Default()` for DBTest2 that inherits from DBTestBase.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7629
      
      Test Plan:
      make check
      ./db_test2 --gtest_filter=DBTest2.PinnableSliceAndMmapReads
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D24693317
      
      Pulled By: riversand963
      
      fbshipit-source-id: b1bbd5c1e05a6fa57c1de0d74462b69e3c2d5215
      c992eb11
    • A
      Expand effect of dictionary settings in `ColumnFamilyOptions::compression_opts` (#7619) · 1adbceb5
      Andrew Kryczka 提交于
      Summary:
      In dictionary compression's initial implementation, in order to save CPU overhead, we only enabled it
      for bottom level under the assumption that the vast majority of data is
      stored there. At that time, there was no
      such thing as `ColumnFamilyOptions::bottommost_compression_opts`, so we just
      hardcoded disabling dictionary compression in flush and compactions to
      non-bottommost level. Now, we have users who generate all their files
      through flush and are considering using dictionary compression.
      
      To support such a use case, this PR expands the scope of `ColumnFamilyOptions::compression_opts` to
      additionally include flushed files and files generated by compaction to
      a non-bottommost level. Users can still get the old behavior by moving
      their dictionary settings to `ColumnFamilyOptions::bottommost_compression_opts`
      and explicitly enabling both that and `ColumnFamilyOptions::bottommost_compression`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7619
      
      Reviewed By: ltamasi
      
      Differential Revision: D24665610
      
      Pulled By: ajkr
      
      fbshipit-source-id: 656b90bce1033fe21c71e09af931ef5bde3e464c
      1adbceb5
  30. 28 10月, 2020 1 次提交
    • M
      Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) · f35f7f27
      mrambacher 提交于
      Summary:
      This PR does a few things:
      
      1.  The MockFileSystem class was split out from the MockEnv.  This change would theoretically allow a MockFileSystem to be used by other Environments as well (if we created a means of constructing one).  The MockFileSystem implements a FileSystem in its entirety and does not rely on any Wrapper implementation.
      
      2.  Make the RocksDB test suite work when MOCK_ENV=1 and ENCRYPTED_ENV=1 are set.  To accomplish this, a few things were needed:
      - The tests that tried to use the "wrong" environment (Env::Default() instead of env_) were updated
      - The MockFileSystem was changed to support the features it was missing or mishandled (such as recursively deleting files in a directory or supporting renaming of a directory).
      
      3.  Updated the test framework to have a ROCKSDB_GTEST_SKIP macro.  This can be used to flag tests that are skipped.  Currently, this defaults to doing nothing (marks the test as SUCCESS) but will mark the tests as SKIPPED when RocksDB is upgraded to a version of gtest that supports this (gtest-1.10).
      
      I have run a full "make check" with MEM_ENV, ENCRYPTED_ENV,  both, and neither under both MacOS and RedHat.  A few tests were disabled/skipped for the MEM/ENCRYPTED cases.  The error_handler_fs_test fails/hangs for MEM_ENV (presumably a timing problem) and I will introduce another PR/issue to track that problem.  (I will also push a change to disable those tests soon).  There is one more test in DBTest2 that also fails which I need to investigate or skip before this PR is merged.
      
      Theoretically, this PR should also allow the test suite to run against an Env loaded from the registry, though I do not have one to try it with currently.
      
      Finally, once this is accepted, it would be nice if there was a CircleCI job to run these tests on a checkin so this effort does not become stale.  I do not know how to do that, so if someone could write that job, it would be appreciated :)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7566
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D24408980
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 911b1554a4d0da06fd51feca0c090a4abdcb4a5f
      f35f7f27
  31. 23 10月, 2020 1 次提交
  32. 13 10月, 2020 1 次提交
  33. 15 9月, 2020 1 次提交
  34. 26 8月, 2020 1 次提交
    • S
      Get() to fail with underlying failures in PartitionIndexReader::CacheDependencies() (#7297) · 722814e3
      sdong 提交于
      Summary:
      Right now all I/O failures under PartitionIndexReader::CacheDependencies() is swallowed. This doesn't impact correctness but we've made a decision that any I/O error in read path now should be returned to users for awareness. Return errors in those cases instead.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7297
      
      Test Plan: Add a new unit test that ingest errors in this code path and see Get() fails. Only one I/O path is hit in PartitionIndexReader::CacheDependencies(). Several option changes are attempt but not able to got other pread paths triggered. Not sure whether other failure cases would be even possible. Would rely on continuous stress test to validate it.
      
      Reviewed By: anand1976
      
      Differential Revision: D23257950
      
      fbshipit-source-id: 859dbc92fa239996e1bb378329344d3d54168c03
      722814e3
  35. 18 8月, 2020 1 次提交
  36. 15 8月, 2020 1 次提交
    • A
      Disable manual compaction during `ReFitLevel()` (#7250) · a1aa3f83
      Andrew Kryczka 提交于
      Summary:
      Manual compaction with `CompactRangeOptions::change_levels` set could
      refit to a level targeted by another manual compaction. If
      force_consistency_checks were disabled, it could be possible for
      overlapping files to be written at that target level.
      
      This PR prevents the possibility by calling `DisableManualCompaction()`
      prior to `ReFitLevel()`. It also improves the manual compaction disabling
      mechanism to wait for pending manual compactions to complete before
      returning, and support disabling from multiple threads.
      
      Fixes https://github.com/facebook/rocksdb/issues/6432.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7250
      
      Test Plan:
      crash test command that repro'd the bug reliably:
      
      ```
      $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --simple -target_file_size_base=524288 -write_buffer_size=1048576 -clear_column_family_one_in=0 -reopen=0 -max_key=10000000 -column_families=1 -max_background_compactions=8 -compact_range_one_in=100000 -compression_type=none -compaction_style=1 -num_levels=5 -universal_min_merge_width=4 -universal_max_merge_width=8 -level0_file_num_compaction_trigger=12 -rate_limiter_bytes_per_sec=1048576000 -universal_max_size_amplification_percent=100 --duration=3600 --interval=60 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --enable_compaction_filter=0
      ```
      
      Reviewed By: ltamasi
      
      Differential Revision: D23090800
      
      Pulled By: ajkr
      
      fbshipit-source-id: afcbcd51b42ce76789fdb907d8b9ada790709c13
      a1aa3f83
  37. 13 8月, 2020 1 次提交
    • L
      Clean up CompressBlock/CompressBlockInternal a bit (#7249) · 9d6f48ec
      Levi Tamasi 提交于
      Summary:
      The patch cleans up and refactors `CompressBlock` and `CompressBlockInternal` a bit.
      In particular, it does the following:
      * It renames `CompressBlockInternal` to `CompressData` and moves it to `util/compression.h`,
      where other general compression-related utilities are located. This will facilitate reuse in the
      BlobDB write path.
      * The signature of the method is changed so it now takes `compression_format_version`
      (similarly to the compression library specific methods) instead of `format_version` (which is
      specific to the block based table).
      * `GetCompressionFormatForVersion` no longer takes `compression_type` as a parameter.
      This parameter was only used in a (not entirely up-to-date) assertion; also, removing it
      eliminates the need to ensure this precondition holds at all call sites.
      * Does some minor cleanup in `CompressBlock`, for instance, it is now possible to pass
      only one of `sampled_output_fast` and `sampled_output_slow`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7249
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D23087278
      
      Pulled By: ltamasi
      
      fbshipit-source-id: e6316e45baed8b4e7de7c1780c90501c2a3439b3
      9d6f48ec
  38. 12 8月, 2020 1 次提交
    • P
      Fix+clean up handling of mock sleeps (#7101) · 6ac1d25f
      Peter Dillinger 提交于
      Summary:
      We have a number of tests hanging on MacOS and windows due to
      mishandling of code for mock sleeps. In addition, the code was in
      terrible shape because the same variable (addon_time_) would sometimes
      refer to microseconds and sometimes to seconds. One test even assumed it
      was nanoseconds but was written to pass anyway.
      
      This has been cleaned up so that DB tests generally use a SpecialEnv
      function to mock sleep, for either some number of microseconds or seconds
      depending on the function called. But to call one of these, the test must first
      call SetMockSleep (precondition enforced with assertion), which also turns
      sleeps in RocksDB into mock sleeps. To also removes accounting for actual
      clock time, call SetTimeElapseOnlySleepOnReopen, which implies
      SetMockSleep (on DB re-open). This latter setting only works by applying
      on DB re-open, otherwise havoc can ensue if Env goes back in time with
      DB open.
      
      More specifics:
      
      Removed some unused test classes, and updated comments on the general
      problem.
      
      Fixed DBSSTTest.GetTotalSstFilesSize using a sync point callback instead
      of mock time. For this we have the only modification to production code,
      inserting a sync point callback in flush_job.cc, which is not a change to
      production behavior.
      
      Removed unnecessary resetting of mock times to 0 in many tests. RocksDB
      deals in relative time. Any behaviors relying on absolute date/time are likely
      a bug. (The above test DBSSTTest.GetTotalSstFilesSize was the only one
      clearly injecting a specific absolute time for actual testing convenience.) Just
      in case I misunderstood some test, I put this note in each replacement:
      // NOTE: Presumed unnecessary and removed: resetting mock time in env
      
      Strengthened some tests like MergeTestTime, MergeCompactionTimeTest, and
      FilterCompactionTimeTest in db_test.cc
      
      stats_history_test and blob_db_test are each their own beast, rather deeply
      dependent on MockTimeEnv. Each gets its own variant of a work-around for
      TimedWait in a mock time environment. (Reduces redundancy and
      inconsistency in stats_history_test.)
      
      Intended follow-up:
      
      Remove TimedWait from the public API of InstrumentedCondVar, and only
      make that accessible through Env by passing in an InstrumentedCondVar and
      a deadline. Then the Env implementations mocking time can fix this problem
      without using sync points. (Test infrastructure using sync points interferes
      with individual tests' control over sync points.)
      
      With that change, we can simplify/consolidate the scattered work-arounds.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7101
      
      Test Plan: make check on Linux and MacOS
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D23032815
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 7f33967ada8b83011fb54e8279365c008bd6610b
      6ac1d25f