1. 08 May, 2021 1 commit
    • S
      Cap automatic arena block size to 1 MB (#7907) · a4919d6b
      sdong committed
      Summary:
      A larger arena block size does reduce allocation overhead, but it may cause other problems. For example, the allocator is more likely not to back such blocks with physical memory, which triggers page faults. Weighing the risk, we cap the automatic arena block size at 1 MB. Users can always set a larger value explicitly if they want.
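
      The capping rule can be sketched as below. This is a minimal illustration, not the RocksDB code: the helper name `SketchArenaBlockSize`, the `write_buffer_size / 8` starting point, and the 4 KB alignment are assumptions made for the example; only the 1 MB cap and the "explicit values are left alone" behavior come from the summary above.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>

      // Hypothetical helper (not a RocksDB function). A user value of 0 is
      // treated as "pick the arena block size automatically".
      inline size_t SketchArenaBlockSize(size_t user_arena_block_size,
                                         size_t write_buffer_size) {
        if (user_arena_block_size > 0) {
          return user_arena_block_size;  // explicit values are not capped
        }
        const size_t kCap = size_t{1} << 20;   // 1 MB cap described in the commit
        const size_t kAlign = size_t{4} << 10; // assumed 4 KB alignment
        assert(write_buffer_size >= 8 * kAlign);  // keep the sketch's result non-zero
        // Assumed starting point: one eighth of the write buffer.
        const size_t size = std::min(kCap, write_buffer_size / 8);
        return ((size + kAlign - 1) / kAlign) * kAlign;  // round up to alignment
      }
      ```

      With a 64 MB write buffer the automatic value is capped at 1 MB, while an explicit 2 MB setting passes through unchanged.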
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7907
      
      Test Plan: Run all existing tests
      
      Reviewed By: pdillinger
      
      Differential Revision: D26135269
      
      fbshipit-source-id: b7f55afd03e6ee1d8715f90fa11b6c33944e9ea8
      a4919d6b
  2. 07 May, 2021 1 commit
  3. 06 May, 2021 5 commits
    • A
      Permit stdout "fail"/"error" in whitebox crash test (#8272) · b71b4597
      Andrew Kryczka committed
      Summary:
      In https://github.com/facebook/rocksdb/issues/8268, the `db_stress` stdout began containing both the strings
      "fail" and "error" (case-insensitive). The whitebox crash test
      failed upon seeing either of those strings.
      
      I checked that all other occurrences of "fail" and "error"
      (case-insensitive) that `db_stress` produces are printed to `stderr`. So
      this PR separates the handling of `db_stress`'s stdout and stderr, and
      only fails when one of those bad strings is found in stderr.
      
      The downside of this PR is that `db_stress`'s original interleaving of stdout/stderr is not preserved in `db_crashtest.py`'s output.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8272
      
      Test Plan:
      run it; see it succeeds for several runs until encountering a real error
      
      ```
      $ python3 tools/db_crashtest.py whitebox --simple --random_kill_odd=8887 --max_key=1000000 --value_size_mult=33
      ...
      db_stress: cache/clock_cache.cc:483: bool rocksdb::{anonymous}::ClockCacheShard::Unref(rocksdb::{anonymous}::CacheHandle*, bool, rocksdb::{anonymous}::CleanupContext*): Assertion `CountRefs(flags) > 0' failed.
      
      TEST FAILED. Output has 'fail'!!!
      ```
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D28239233
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3b8602a0d570466a7e2c81bb9c49468f7716091e
      b71b4597
    • S
      db_stress: wait for compaction to finish after open with failure injection (#8270) · 7f3a0f5b
      sdong committed
      Summary:
      When injecting failures during DB open, an error can happen in background threads, so DB open succeeds but the DB is soon made read-only and subsequent writes fail, which is not expected. To prevent this, wait for compaction to finish before serving traffic. If there is a failure, reopen.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8270
      
      Test Plan: Run the test.
      
      Reviewed By: ajkr
      
      Differential Revision: D28230537
      
      fbshipit-source-id: e2e97888904f9b9bb50c35ccf95b88c2319ef5c3
      7f3a0f5b
    • S
      Refactor kill point (#8241) · e19908cb
      sdong committed
      Summary:
      Refactor kill point into one single class, rather than several extern variables. The intention was to drop unflushed data before killing to simulate some job, and I tried to pass a pointer to the fault injection fs to the killing class, but it ended up being harder than I thought. Perhaps we'll need to do this in another way. But I think the refactoring itself is good, so I am sending it out.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8241
      
      Test Plan: make release and run crash test for a while.
      
      Reviewed By: anand1976
      
      Differential Revision: D28078486
      
      fbshipit-source-id: f9182c1455f52e6851c13f88a21bade63bcec45f
      e19908cb
    • M
      Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) · 8948dc85
      mrambacher committed
      Summary:
      The ImmutableCFOptions contained a bunch of fields that belonged to the ImmutableDBOptions.  This change cleans that up by introducing an ImmutableOptions struct.  Following the pattern of Options struct, this class inherits from the DB and CFOption structs (of the Immutable form).
      
      Only one structural change (the ImmutableCFOptions::fs was changed to a shared_ptr from a raw one) is in this PR.  All of the other changes involve moving the member variables from the ImmutableCFOptions into the ImmutableOptions and changing member variables or function parameters as required for compilation purposes.
      
      Follow-on PRs may do a further clean-up of the code, such as renaming variables (such as "ImmutableOptions cf_options") and potentially eliminating un-needed function parameters (there is no longer a need to pass both an ImmutableDBOptions and an ImmutableOptions to a function).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8262
      
      Reviewed By: pdillinger
      
      Differential Revision: D28226540
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 18ae71eadc879dedbe38b1eb8e6f9ff5c7147dbf
      8948dc85
    • A
      Fix `GetLiveFiles()` returning OPTIONS-000000 (#8268) · 0f42e50f
      Andrew Kryczka committed
      Summary:
      See release note in HISTORY.md.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8268
      
      Test Plan: unit test repro
      
      Reviewed By: siying
      
      Differential Revision: D28227901
      
      Pulled By: ajkr
      
      fbshipit-source-id: faf61d13b9e43a761e3d5dcf8203923126b51339
      0f42e50f
  4. 05 May, 2021 4 commits
    • P
      Fix use-after-free threading bug in ClockCache (#8261) · 3b981eaa
      Peter Dillinger committed
      Summary:
      In testing for https://github.com/facebook/rocksdb/issues/8225 I found cache_bench would crash with
      -use_clock_cache, as well as db_bench -use_clock_cache, but not
      single-threaded. Smaller cache size hits failure much faster. ASAN
      reported the failure as calling malloc_usable_size on the `key` pointer
      of a ClockCache handle after it was reportedly freed. On detailed
      inspection I found this bad sequence of operations for a cache entry:
      
      state=InCache=1,refs=1
      [thread 1] Start ClockCacheShard::Unref (from Release, no mutex)
      [thread 1] Decrement ref count
      state=InCache=1,refs=0
      [thread 1] Suspend before CalcTotalCharge (no mutex)
      
      [thread 2] Start UnsetInCache (from Insert, mutex held)
      [thread 2] clear InCache bit
      state=InCache=0,refs=0
      [thread 2] Calls RecycleHandle (based on pre-updated state)
      [thread 2] Returns to Insert which calls Cleanup which deletes `key`
      
      [thread 1] Resume ClockCacheShard::Unref
      [thread 1] Read `key` in CalcTotalCharge
      
      To fix this, I've added a field to the handle to store the metadata
      charge so that we can efficiently remember everything we need from
      the handle in Unref. We must not read from the handle again if we
      decrement the count to zero with InCache=1, which means we don't own
      the entry and someone else could eject/overwrite it immediately.
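
      The shape of the fix can be sketched as follows. This is a simplified model, not the ClockCache code: `SketchHandle`, `SketchUnref`, the bit layout of `flags`, and the `UnrefOutcome` struct are hypothetical names chosen for illustration. The point it shows is the ordering: everything needed from the handle is read before the decrement, and nothing is read from the handle afterwards.

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <cstddef>
      #include <cstdint>

      // Assumed layout for this sketch: bit 0 is the InCache flag, the
      // remaining bits hold the reference count.
      constexpr uint32_t kInCacheBit = 1u;
      constexpr uint32_t kOneRef = 2u;

      struct SketchHandle {
        std::atomic<uint32_t> flags{0};
        size_t charge = 0;
        uint32_t meta_charge = 0;  // stands in for the newly added field
      };

      struct UnrefOutcome {
        size_t total_charge;  // captured before the decrement
        bool last_ref;        // reference count dropped to zero
        bool still_in_cache;  // InCache bit was still set afterwards
      };

      inline UnrefOutcome SketchUnref(SketchHandle* h) {
        // Read everything we need while our reference still pins the entry.
        const size_t total = h->charge + h->meta_charge;
        const uint32_t before =
            h->flags.fetch_sub(kOneRef, std::memory_order_acq_rel);
        assert(before >= kOneRef);  // caller must hold a reference
        const uint32_t after = before - kOneRef;
        // If last_ref && still_in_cache, another thread may recycle the
        // handle at any moment, so h is not dereferenced past this point.
        return {total, after < kOneRef, (after & kInCacheBit) != 0};
      }
      ```

      A caller would use `total_charge` from the returned value for usage accounting instead of recomputing it from the handle.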
      
      Note before this change, on amd64 sizeof(Handle) == 56 even though there
      are only 48 bytes of data. Grouping together the uint32_t fields would
      cut it down to 48, but I've added another uint32_t, which takes it
      back up to 56. Not a big deal.
      
      Also fixed DisownData to cooperate with ASAN as in LRUCache.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8261
      
      Test Plan:
      Manual + adding use_clock_cache to db_crashtest.py
      
      Base performance
      ./cache_bench -use_clock_cache
      Complete in 17.060 s; QPS = 2458513
      New performance
      ./cache_bench -use_clock_cache
      Complete in 17.052 s; QPS = 2459695
      
      Any difference is easily buried in small noise.
      
      Crash test shows still more bug(s) in ClockCache, so I'm expecting to
      disable ClockCache from production code in a follow-up PR (if we
      can't find and fix the bug(s))
      
      Reviewed By: mrambacher
      
      Differential Revision: D28207358
      
      Pulled By: pdillinger
      
      fbshipit-source-id: aa7a9322afc6f18f30e462c75dbbe4a1206eb294
      3b981eaa
    • A
      Fix ConcurrentTaskLimiter token release for shutdown (#8253) · c70bae1b
      Andrew Kryczka committed
      Summary:
      Previously the shutdown process did not properly wait for all
      `compaction_thread_limiter` tokens to be released before proceeding to
      delete the DB's C++ objects. When this happened, we saw tests like
      "DBCompactionTest.CompactionLimiter" flake with the following error:
      
      ```
      virtual
      rocksdb::ConcurrentTaskLimiterImpl::~ConcurrentTaskLimiterImpl():
      Assertion `outstanding_tasks_ == 0' failed.
      ```
      
      There is a case where a token can still be alive even after the shutdown
      process has waited for BG work to complete. In particular, this happens
      because the shutdown process only waits for flush/compaction scheduled/unscheduled counters to all
      reach zero. These counters are decremented in `BackgroundCallCompaction()`
      functions. However, tokens are released in `BGWork*Compaction()` functions, which
      actually wrap the `BackgroundCallCompaction()` function.
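
      The window described above can be modeled without threads. The sketch below is a single-threaded illustration with hypothetical names (`SketchDb`, `InnerCall`, `ReleaseToken`); it only shows that the counter the shutdown path waits on reaches zero before the token is released, not how the PR closes the gap.

      ```cpp
      #include <cassert>

      struct SketchDb {
        int bg_compaction_scheduled = 0;  // what shutdown waits on
        int outstanding_tokens = 0;       // what the limiter destructor asserts on

        void Schedule() {
          ++bg_compaction_scheduled;
          ++outstanding_tokens;
        }
        // Stands in for BackgroundCallCompaction(): the counter drops here.
        void InnerCall() {
          assert(bg_compaction_scheduled > 0);
          --bg_compaction_scheduled;
        }
        // Stands in for the tail of the BGWork*Compaction() wrapper: the
        // token is only released after the inner call has returned.
        void ReleaseToken() {
          assert(outstanding_tokens > 0);
          --outstanding_tokens;
        }
        bool ShutdownThinksIdle() const { return bg_compaction_scheduled == 0; }
      };
      ```

      After `Schedule()` and `InnerCall()`, `ShutdownThinksIdle()` is already true while `outstanding_tokens` is still 1; destroying the limiter in that window is what trips the assertion.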
      
      A simple sleep could repro the race condition:
      
      ```
      $ diff --git a/db/db_impl/db_impl_compaction_flush.cc
      b/db/db_impl/db_impl_compaction_flush.cc
      index 806bc548a..ba59efa89 100644
       --- a/db/db_impl/db_impl_compaction_flush.cc
      +++ b/db/db_impl/db_impl_compaction_flush.cc
      @@ -2442,6 +2442,7 @@ void DBImpl::BGWorkCompaction(void* arg) {
             static_cast<PrepickedCompaction*>(ca.prepicked_compaction);
         static_cast_with_check<DBImpl>(ca.db)->BackgroundCallCompaction(
             prepicked_compaction, Env::Priority::LOW);
      +  sleep(1);
         delete prepicked_compaction;
       }
      
      $ ./db_compaction_test --gtest_filter=DBCompactionTest.CompactionLimiter
      db_compaction_test: util/concurrent_task_limiter_impl.cc:24: virtual rocksdb::ConcurrentTaskLimiterImpl::~ConcurrentTaskLimiterImpl(): Assertion `outstanding_tasks_ == 0' failed.
      Received signal 6 (Aborted)
      #0   /usr/local/fbcode/platform007/lib/libc.so.6(gsignal+0xcf) [0x7f02673c30ff] ??      ??:0
      #1   /usr/local/fbcode/platform007/lib/libc.so.6(abort+0x134) [0x7f02673ac934] ??       ??:0
      ...
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8253
      
      Test Plan: sleeps to expose race conditions
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D28168064
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9e5167c74398d323e7975980c5cc00f450631160
      c70bae1b
    • A
      Deflake DBTest.L0L1L2AndUpHitCounter (#8259) · c2a3424d
      Andrew Kryczka committed
      Summary:
      Previously we saw flakes on platforms like arm on CircleCI, such as the following:
      
      ```
      Note: Google Test filter = DBTest.L0L1L2AndUpHitCounter
      [==========] Running 1 test from 1 test case.
      [----------] Global test environment set-up.
      [----------] 1 test from DBTest
      [ RUN      ] DBTest.L0L1L2AndUpHitCounter
      db/db_test.cc:5345: Failure
      Expected: (TestGetTickerCount(options, GET_HIT_L0)) > (100), actual: 30 vs 100
      [  FAILED  ] DBTest.L0L1L2AndUpHitCounter (150 ms)
      [----------] 1 test from DBTest (150 ms total)
      
      [----------] Global test environment tear-down
      [==========] 1 test from 1 test case ran. (150 ms total)
      [  PASSED  ] 0 tests.
      [  FAILED  ] 1 test, listed below:
      [  FAILED  ] DBTest.L0L1L2AndUpHitCounter
      ```
      
      The test was totally non-deterministic, e.g., flush/compaction timing would affect how many files on each level. Furthermore, it depended heavily on platform-specific details, e.g., by having a 32KB memtable, it could become full with a very different number of entries depending on the platform.
      
      This PR rewrites the test to build a deterministic LSM with one file per level.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8259
      
      Reviewed By: mrambacher
      
      Differential Revision: D28178100
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0a03b26e8d23c29d8297c1bccb1b115dce33bdcd
      c2a3424d
    • J
      Update CircleCI MacOS Xcode version to 11.3.0 (#8256) · 8a92564a
      Jay Zhuang committed
      Summary:
      To fix CircleCI pyenv installation failure.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8256
      
      Reviewed By: ajkr
      
      Differential Revision: D28191772
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 2bbb1d5ded473e510c11c8ed27884c4ad073973f
      8a92564a
  5. 04 May, 2021 1 commit
    • S
      Hint temperature of bottommost level files to FileSystem (#8222) · c3ff14e2
      sdong committed
      Summary:
      As the first part of the effort to place different files on different storage types, this change introduces several things:
      (1) An experimental interface in FileSystem that specifies a temperature for a newly created file.
      (2) A test FileSystemWrapper, SimulatedHybridFileSystem, that simulates HDD for a file of "warm" temperature.
      (3) A simple experimental feature ColumnFamilyOptions.bottommost_temperature. RocksDB passes this value to FileSystem when creating any bottommost file.
      (4) A db_bench parameter that applies (2) and (3) to db_bench.
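
      Items (1) and (3) can be pictured with a small self-contained model. The types below (`SketchTemperature`, `SketchFileOptions`, `SketchCFOptions`) and the function `OptionsForNewFile` are stand-ins invented for this sketch, not the RocksDB declarations; only the option name `bottommost_temperature` and the "pass it when creating a bottommost file" rule come from the summary.

      ```cpp
      #include <cassert>

      enum class SketchTemperature { kUnknown, kHot, kWarm, kCold };

      // Stand-in for the per-file options handed to the file system.
      struct SketchFileOptions {
        SketchTemperature temperature = SketchTemperature::kUnknown;
      };

      // Stand-in for the column family options carrying the new setting.
      struct SketchCFOptions {
        SketchTemperature bottommost_temperature = SketchTemperature::kUnknown;
      };

      // Only bottommost files receive the configured temperature hint;
      // every other file keeps the default (unknown) temperature.
      inline SketchFileOptions OptionsForNewFile(const SketchCFOptions& cf,
                                                 bool is_bottommost) {
        SketchFileOptions fo;
        if (is_bottommost) {
          fo.temperature = cf.bottommost_temperature;
        }
        return fo;
      }
      ```

      A file system wrapper in the spirit of (2) could then inspect `temperature` and route "warm" files to slower storage.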
      
      The motivation of the change is to introduce minimal changes that allow us to evolve tiered storage development.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8222
      
      Test Plan:
      ./db_bench --benchmarks=fillrandom --write_buffer_size=2000000 -max_bytes_for_level_base=20000000  -level_compaction_dynamic_level_bytes --reads=100 -compaction_readahead_size=20000000 --reads=100000 -num=10000000
      
      followed by
      
      ./db_bench --benchmarks=readrandom,stats --write_buffer_size=2000000 -max_bytes_for_level_base=20000000 -simulate_hybrid_fs_file=/tmp/warm_file_list -level_compaction_dynamic_level_bytes -compaction_readahead_size=20000000 --reads=500 --threads=16 -use_existing_db --num=10000000
      
      and see results as expected.
      
      Reviewed By: ajkr
      
      Differential Revision: D28003028
      
      fbshipit-source-id: 4724896d5205730227ba2f17c3fecb11261744ce
      c3ff14e2
  6. 01 May, 2021 1 commit
    • P
      Add more LSM info to FilterBuildingContext (#8246) · d2ca04e3
      Peter Dillinger committed
      Summary:
      Add `num_levels`, `is_bottommost`, and table file creation
      `reason` to `FilterBuildingContext`, in anticipation of more powerful
      Bloom-like filter support.
      
      To support this, added `is_bottommost` and `reason` to
      `TableBuilderOptions`, which allowed removing `reason` parameter from
      `rocksdb::BuildTable`.
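
      To illustrate what a filter policy could do with the extra fields, here is a minimal sketch. `SketchFilterContext` mirrors only the three fields named above, and `SketchCreationReason`, `SketchBitsPerKey`, and the bits-per-key numbers are hypothetical choices for the example, not RocksDB behavior.

      ```cpp
      #include <cassert>

      // Hypothetical stand-in for the table file creation reason.
      enum class SketchCreationReason { kFlush, kCompaction, kRecovery, kMisc };

      // Mirrors only the fields this commit adds to the context.
      struct SketchFilterContext {
        int num_levels = 1;
        bool is_bottommost = false;
        SketchCreationReason reason = SketchCreationReason::kMisc;
      };

      // Returns the filter bits per key to use; 0 means "build no filter".
      // The values 12 and 10 are arbitrary illustration values.
      inline int SketchBitsPerKey(const SketchFilterContext& ctx) {
        assert(ctx.num_levels >= 1);
        if (ctx.is_bottommost && ctx.num_levels > 1) {
          return 0;  // similar in spirit to optimize_filters_for_hits
        }
        if (ctx.reason == SketchCreationReason::kFlush) {
          return 12;  // short-lived files that are queried often
        }
        return 10;
      }
      ```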
      
      I attempted to remove `skip_filters` from `TableBuilderOptions`, because
      filter construction decisions should arise from options, not one-off
      parameters. I could not completely remove it because the public API for
      SstFileWriter takes a `skip_filters` parameter, and translating this
      into an option change would mean awkwardly replacing the table_factory
      if it is BlockBasedTableFactory with new filter_policy=nullptr option.
      I marked this public skip_filters option as deprecated because of this
      oddity. (skip_filters on the read side probably makes sense.)
      
      At least `skip_filters` is now largely hidden for users of
      `TableBuilderOptions` and is no longer used for implementing the
      optimize_filters_for_hits option. Bringing the logic for that option
      closer to handling of FilterBuildingContext makes it more obvious that
      these two are using the same notion of "bottommost." (Planned:
      configuration options for Bloom-like filters that generalize
      `optimize_filters_for_hits`)
      
      Recommended follow-up: Try to get away from "bottommost level" naming of
      things, which is inaccurate (see
      VersionStorageInfo::RangeMightExistAfterSortedRun), and move to
      "bottommost run" or just "bottommost."
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8246
      
      Test Plan:
      extended an existing unit test to exercise and check various
      filter building contexts. Also, existing tests for
      optimize_filters_for_hits validate some of the "bottommost" handling,
      which is now closely connected to FilterBuildingContext::is_bottommost
      through TableBuilderOptions::is_bottommost
      
      Reviewed By: mrambacher
      
      Differential Revision: D28099346
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 2c1072e29c24d4ac404c761a7b7663292372600a
      d2ca04e3
  7. 29 April, 2021 5 commits
    • P
      Refactor: use TableBuilderOptions to reduce parameter lists (#8240) · 85becd94
      Peter Dillinger committed
      Summary:
      Greatly reduced the not-quite-copy-paste giant parameter lists
      of rocksdb::NewTableBuilder, rocksdb::BuildTable,
      BlockBasedTableBuilder::Rep ctor, and BlockBasedTableBuilder ctor.
      
      Moved weird separate parameter `uint32_t column_family_id` of
      TableFactory::NewTableBuilder into TableBuilderOptions.
      
      Re-ordered parameters to TableBuilderOptions ctor, so that `uint64_t
      target_file_size` is not randomly placed between uint64_t timestamps
      (was easy to mix up).
      
      Replaced a couple of fields of BlockBasedTableBuilder::Rep with a
      FilterBuildingContext. The motivation for this change is making it
      easier to pass along more data into new fields in FilterBuildingContext
      (follow-up PR).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8240
      
      Test Plan: ASAN make check
      
      Reviewed By: mrambacher
      
      Differential Revision: D28075891
      
      Pulled By: pdillinger
      
      fbshipit-source-id: fddb3dbb8260a0e8bdcbb51b877ebabf9a690d4f
      85becd94
    • A
      Improve BlockPrefetcher to prefetch only for sequential scans (#7394) · a0e0feca
      Akanksha Mahajan committed
      Summary:
      BlockPrefetcher is used by iterators to prefetch data if they
      anticipate more data to be used in the future, which is valid for forward sequential
      scans. But BlockPrefetcher tracks only num_file_reads_ and not whether reads
      are sequential. This presents a problem for MultiGet with a large number of
      keys when it reseeks the index iterator and data block. FilePrefetchBuffer
      can end up doing large readahead for reseeks, as readahead size
      increases exponentially once readahead is enabled. BlockBasedTableIterator
      has the same issue.
      
      Also track the previous read's length and offset in BlockPrefetcher (which creates
      FilePrefetchBuffer) and FilePrefetchBuffer (which does the prefetching of data) to
      determine whether reads are sequential, and prefetch only then.
      
      Update the last block read after a cache hit so that reads served from cache are also
      taken into account.
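
      The sequential-read check can be sketched as a small state machine. This is a simplified model under stated assumptions, not the BlockPrefetcher code: the class name `SketchPrefetcher`, the initial and maximum readahead sizes, and the "more than two reads" threshold are illustrative values chosen for the example.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <cstdint>

      // Assumed tuning values for this sketch.
      constexpr size_t kSketchInitReadahead = 8 * 1024;
      constexpr size_t kSketchMaxReadahead = 256 * 1024;
      constexpr int kSketchReadsBeforeReadahead = 2;

      class SketchPrefetcher {
       public:
        // Records a read and returns the readahead size to use for it,
        // or 0 if no prefetch should be issued.
        size_t OnRead(uint64_t offset, size_t len) {
          assert(len > 0);
          // A read is sequential if it starts where the previous one ended.
          const bool sequential =
              (prev_len_ == 0) || (prev_offset_ + prev_len_ == offset);
          prev_offset_ = offset;
          prev_len_ = len;
          if (!sequential) {
            // A reseek restarts the count and shrinks readahead again.
            num_reads_ = 1;
            readahead_ = kSketchInitReadahead;
            return 0;
          }
          ++num_reads_;
          if (num_reads_ <= kSketchReadsBeforeReadahead) {
            return 0;
          }
          const size_t current = readahead_;
          readahead_ = std::min(kSketchMaxReadahead, readahead_ * 2);
          return current;
        }

       private:
        uint64_t prev_offset_ = 0;
        size_t prev_len_ = 0;
        int num_reads_ = 0;
        size_t readahead_ = kSketchInitReadahead;
      };
      ```

      Three back-to-back 4 KB reads start readahead at 8 KB and double it on the next sequential read; a jump to an unrelated offset returns 0 and resets the growth, which is the behavior wanted for MultiGet-style reseeks.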
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7394
      
      Test Plan: Add new unit test case
      
      Reviewed By: anand1976
      
      Differential Revision: D23737617
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 8e6917c25ed87b285ee495d1b68dc623d71205a3
      a0e0feca
    • A
      Fix a memory leak in c_test (#8237) · 0db4cde6
      anand76 committed
      Summary:
      Don't call `rocksdb_cache_disown_data()` as it causes the memory allocated for `shards_` to be leaked.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8237
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28039061
      
      Pulled By: anand1976
      
      fbshipit-source-id: c3464efe2c006b93b4be87030116a12a124598c4
      0db4cde6
    • A
      Change CircleCI Windows to previous known good image (#8220) · 8fe33a0a
      anand76 committed
      Summary:
      This is to try to resolve the VS2015 install failure in CircleCI Windows builds.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8220
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28061834
      
      Pulled By: anand1976
      
      fbshipit-source-id: b2663eb60babee603669a2c2cb55f182df1cc7b1
      8fe33a0a
    • S
      db_stress to add --open_metadata_write_fault_one_in (#8235) · cde69a7c
      sdong committed
      Summary:
      db_stress adds --open_metadata_write_fault_one_in, which randomly fails some file metadata modification operations during DB open, including file creation, close, renaming and directory sync. Some operations can fail either before or after the operation takes place.
      If DB open fails, db_stress retries without the failure injection, and the DB is expected to open successfully.
      This option is enabled in the crash test half of the time.
      Some follow-up changes would allow write failures at open time, and injecting those failures in non-DB-open cases.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8235
      
      Test Plan: Run stress tests for a while and see failures got triggered. This can reproduce the bug fixed by https://github.com/facebook/rocksdb/pull/8192 and a similar one that fails when fsyncing parent directory.
      
      Reviewed By: anand1976
      
      Differential Revision: D28010944
      
      fbshipit-source-id: 36a96da4dc3633e5f7680cef3ea0a900fcdb5558
      cde69a7c
  8. 28 April, 2021 3 commits
  9. 27 April, 2021 3 commits
    • M
      Rename variables in ImmutableCFOptions to avoid conflicts with ImmutableDBOptions (#8227) · 0ca6d629
      mrambacher committed
      Summary:
      Renaming ImmutableCFOptions::info_log and statistics to logger and stats.  This is stage 2 in creating an ImmutableOptions class.  It is necessary because the names match those in ImmutableOptions and have different types.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8227
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28000967
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 3bf2aa04e8f1e8724d825b7deacf41080c14420b
      0ca6d629
    • M
      Fix cast-function-type warning (#8230) · c2c7d5e9
      Mr-Leshiy committed
      Summary:
      Fix the cast-function-type warning, which appears during the following build:
      ```bash
      cmake ..  -DFAIL_ON_WARNINGS=ON -DCMAKE_C_COMPILER=x86_64-w64-mingw32-gcc -DCMAKE_CXX_COMPILER=x86_64-w64-mingw32-g++ -DCMAKE_SYSTEM_NAME=Windows
      make rocksdb
      ```
      Here is the log:
      ```
      /home/leshiy/Work/rocksdb/port/win/env_win.cc: In constructor ‘rocksdb::port::WinClock::WinClock()’:
      /home/leshiy/Work/rocksdb/port/win/env_win.cc:92:9: error: cast between incompatible function types from ‘FARPROC’ {aka ‘long long int (*)()’} to ‘rocksdb::port::WinClock::FnGetSystemTimePreciseAsFileTime’ {aka ‘void (*)(_FILETIME*)’} [-Werror=cast-function-type]
         92 |         (FnGetSystemTimePreciseAsFileTime)GetProcAddress(
            |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         93 |             module, "GetSystemTimePreciseAsFileTime");
            |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      cc1plus: all warnings being treated as errors
      make[2]: *** [CMakeFiles/rocksdb.dir/build.make:4337: CMakeFiles/rocksdb.dir/port/win/env_win.cc.obj] Error 1
      make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/rocksdb.dir/all] Error 2
      make: *** [Makefile:91: all] Error 2
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8230
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D28000215
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 874782cf48f70470e3fbd9097585bf42e810ca61
      c2c7d5e9
    • A
      WBWI Internal Move implementation from .h into .cpp (#8229) · 2760c2ae
      Adam Retter committed
      Summary:
      Moves some of the structural refactoring from https://github.com/facebook/rocksdb/pull/8135 into this PR.
      This just cleans up the code by moving implementation out of the .h file and into the .cc file.
      
      Should be considered for merge before both https://github.com/facebook/rocksdb/pull/7214 and https://github.com/facebook/rocksdb/pull/8135
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8229
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D27999669
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 6eccecbf1f11bb9f5a173e86d1e7bc448bc96071
      2760c2ae
  10. 26 April, 2021 2 commits
  11. 24 April, 2021 1 commit
    • S
      Eliminate double-buffering of keys in block_based_table_builder (#8219) · cc1c3ee5
      Saketh Are committed
      Summary:
      The block_based_table_builder buffers some blocks in memory to construct a good compression dictionary. Before this commit, the keys from each block were buffered separately for convenience. However, the buffered block data implicitly contains all keys. This commit eliminates the redundant key buffers and reduces memory usage.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8219
      
      Reviewed By: ajkr
      
      Differential Revision: D27945851
      
      Pulled By: saketh-are
      
      fbshipit-source-id: caf3cac1217201e080a1e24b542bedf20973afee
      cc1c3ee5
  12. 23 April, 2021 5 commits
    • S
      Expose JemallocNodumpAllocator to C API (#8178) · d65d7d65
      Sahir Hoda committed
      Summary:
      Add new C APIs to create the JemallocNodumpAllocator and set it on a Cache object.
      
      `make test` passes with and without `DISABLE_JEMALLOC=1`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8178
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D27944631
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2531729aa285a8985c58f22f093c4d53029c4a7b
      d65d7d65
    • M
      Make types of Immutable/Mutable Options fields match that of the underlying Option (#8176) · 01e460d5
      mrambacher committed
      Summary:
      This PR is a first step at attempting to clean up some of the Mutable/Immutable Options code.  With this change, a DBOption and a ColumnFamilyOption can be reconstructed from their Mutable and Immutable equivalents, respectively.
      
      readrandom tests do not show any performance degradation versus master (though both are slightly slower than the current 6.19 release).
      
      There are still fields in the ImmutableCFOptions that are not CF options but DB options.  Eventually, I would like to move those into an ImmutableOptions (= ImmutableDBOptions+ImmutableCFOptions).  But that will be part of a future PR to minimize changes and disruptions.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8176
      
      Reviewed By: pdillinger
      
      Differential Revision: D27954339
      
      Pulled By: mrambacher
      
      fbshipit-source-id: ec6b805ba9afe6e094bffdbd76246c2d99aa9fad
      01e460d5
    • J
      Add internal compaction API for Secondary instance (#8171) · f0fca2b1
      Jay Zhuang committed
      Summary:
      Add a compaction API for secondary instances, which compacts the files to a secondary DB path without installing them into the LSM tree.
      The API will be used for remote compaction.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8171
      
      Test Plan: `make check`
      
      Reviewed By: ajkr
      
      Differential Revision: D27694545
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 8ff3ec1bffdb2e1becee994918850c8902caf731
      f0fca2b1
    • H
      Add ZenFS to plugin list (#8218) · e85d8a65
      Hans Holmberg committed
      Summary:
      Add ZenFS, a file system for zoned block devices, to PLUGINS.md
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8218
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D27944376
      
      Pulled By: ajkr
      
      fbshipit-source-id: c9ea2e9814001ccd7c56d7ef4d38e20dfeb48d1e
      e85d8a65
    • Z
      Fix the false positive alert of CF consistency check in WAL recovery (#8207) · 09a9ec3a
      Zhichao Cao committed
      Summary:
      In current RocksDB, when recovering information from the WAL, we do the consistency check for each column family when one WAL file is corrupted and PointInTimeRecovery is set. However, it will report a false positive alert on "SST file is ahead of WALs" when one CF's current log number is greater than the corrupted WAL number (the CF contains data beyond the corrupted WAL) due to a new column family being created during flush. In this case, a new (empty) WAL is created during the flush. Also, for some reason (e.g., a storage issue, or a crash before SyncCloseLog is called), the old WAL is corrupted. The new CF has no data; therefore, it does not have the consistency issue.
      
      Fix: when checking cfd->GetLogNumber() > corrupted_wal_number also check cfd->GetLiveSstFilesSize() > 0. So the CFs with no SST file data will skip the check here.
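
      The adjusted condition can be written as a small predicate. This is a sketch with hypothetical names (`SketchCF`, `SstAheadOfWal`); the two comparisons are taken from the fix description, while the struct simply stands in for the values returned by cfd->GetLogNumber() and cfd->GetLiveSstFilesSize().

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Stand-in for the two per-column-family values the check looks at.
      struct SketchCF {
        uint64_t log_number;           // plays the role of cfd->GetLogNumber()
        uint64_t live_sst_files_size;  // plays the role of cfd->GetLiveSstFilesSize()
      };

      // Reports "SST file is ahead of WALs" only if the CF's log number is
      // past the corrupted WAL and the CF actually has SST data on disk.
      inline bool SstAheadOfWal(const SketchCF& cf, uint64_t corrupted_wal_number) {
        assert(corrupted_wal_number > 0);
        return cf.log_number > corrupted_wal_number && cf.live_sst_files_size > 0;
      }
      ```

      A freshly created, empty column family with a newer log number no longer triggers the alert, while a column family that holds SST data still does.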
      
      Note a potential inconsistency ignored because of this fix: an empty CF can also be caused by write+delete. In this case, after flush, no SST files are generated. However, this CF still has log entries in the WAL. When the WAL is corrupted, the DB might be inconsistent.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8207
      
      Test Plan: added unit test, make crash_test
      
      Reviewed By: riversand963
      
      Differential Revision: D27898839
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 931fc2d8b92dd00b4169bf84b94e712fd688a83e
      09a9ec3a
  13. 22 April, 2021 4 commits
    • M
      Add check to cmake to see if we need to link against -latomic (#8183) · 47b424f4
      mrambacher committed
      Summary:
      For some compilers/environments (e.g. Clang, riscv64), we need to link against -latomic.  Check if this is a requirement and add the library to the third-party libs if it is.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8183
      
      Reviewed By: pdillinger
      
      Differential Revision: D27773564
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 68e15d823144f83fb02221c7bf5b1e43323419bf
      47b424f4
    • Y
      Ignore comparator name mismatch in ldb manifest dump (#8216) · 31435276
      Yanqin Jin committed
      Summary:
      RocksDB allows user-specified custom comparators which may not be known to `ldb`,
      a built-in tool for checking/mutating the database. Therefore, column family comparator
      names mismatch encountered during manifest dump should not prevent the dumping from
      proceeding.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8216
      
      Test Plan:
      ```
      make check
      ```
      
      Also manually do the following
      ```
      KEEP_DB=1 ./db_with_timestamp_basic_test
      ./ldb --db=<db> manifest_dump --verbose
      ```
      The ldb should succeed and print something like:
      ```
      ...
      --------------- Column family "default"  (ID 0) --------------
      log number: 6
      comparator: <TestComparator>, but the comparator object is not available.
      ...
      ```
      
      Reviewed By: ltamasi
      
      Differential Revision: D27927581
      
      Pulled By: riversand963
      
      fbshipit-source-id: f610b2c842187d17f575362070209ee6b74ec6d4
      31435276
    • S
      Add comment to DisableManualCompaction() (#8186) · 4985cea1
      sdong committed
      Summary:
      Add the missing comment to DisableManualCompaction().
      Also explicitly return from DBImpl::CompactRange() to avoid a memtable flush when manual compaction is disabled.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8186
      
      Test Plan: Run existing unit tests.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D27744517
      
      fbshipit-source-id: 449548a48905903b888dc9612bd17480f6596a71
      4985cea1
    • A
      Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) · 596e9008
      Akanksha Mahajan committed
      Summary:
      When WriteBufferManager is shared across DBs and column families
      to keep memory usage under a limit, OOMs have been observed when flush cannot
      finish but writes continue to insert into memtables.
      To avoid OOMs, when memory usage goes beyond buffer_limit_ and a DB tries to write,
      this change stalls incoming writers until flush completes and memory_usage
      drops.
      
      Design: Stall condition: When total memory usage exceeds WriteBufferManager::buffer_size_
      (memory_usage() >= buffer_size_), WriteBufferManager::ShouldStall() returns true.
      
      DBImpl first blocks incoming/future writers by calling write_thread_.BeginWriteStall()
      (which adds a dummy stall object to the writer's queue).
      Then the DB is blocked in state State::Blocked (the current write doesn't go
      through). The WBMStallInterface object maintained by every DB instance is added to the queue of
      WriteBufferManager.
      
      If multiple DBs try to write during this stall, they will also be
      blocked when WriteBufferManager::ShouldStall() returns true.
      
      End Stall condition: When flush is finished and memory usage goes down, the stall will end only if memory
      waiting to be flushed is less than buffer_size/2. This lower limit gives flush time
      to complete and avoids continuous stalling if memory usage remains close to buffer_size.
      
      WriteBufferManager::EndWriteStall() is called,
      which removes all instances from its queue and signals them to continue.
      Their state is changed to State::Running and they are unblocked. DBImpl
      then signals all incoming writers of that DB to continue by calling
      write_thread_.EndWriteStall() (which removes the dummy stall object from the
      queue).
      
      DB instance creates WBMStallInterface which is an interface to block and
      signal DBs during stall.
      When DB needs to be blocked or signalled by WriteBufferManager,
      state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED).
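
      The stall and end-stall thresholds form a simple hysteresis, sketched below. This is a single-threaded model with hypothetical names (`SketchWriteBufferManager`, `Reserve`, `Free`); it uses total memory usage as a stand-in for "memory waiting to be flushed" and leaves out the queueing, blocking, and signalling of DB instances described above.

      ```cpp
      #include <cassert>
      #include <cstddef>

      class SketchWriteBufferManager {
       public:
        explicit SketchWriteBufferManager(size_t buffer_size)
            : buffer_size_(buffer_size) {}

        // Memtable memory being allocated.
        void Reserve(size_t n) { memory_usage_ += n; }

        // Memory released after a flush; may end an active stall.
        void Free(size_t n) {
          assert(n <= memory_usage_);
          memory_usage_ -= n;
          // End the stall only once usage falls below half the limit, so a
          // DB hovering near buffer_size does not stall over and over.
          if (stall_active_ && memory_usage_ < buffer_size_ / 2) {
            stall_active_ = false;
          }
        }

        // Writers call this before writing; true means "block for now".
        bool ShouldStall() {
          if (!stall_active_ && memory_usage_ >= buffer_size_) {
            stall_active_ = true;
          }
          return stall_active_;
        }

       private:
        size_t buffer_size_;
        size_t memory_usage_ = 0;
        bool stall_active_ = false;
      };
      ```

      With a limit of 100, reaching 100 starts the stall, freeing down to 60 keeps it active, and only dropping below 50 lets writers proceed again.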
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7898
      
      Test Plan: Added a new test db/db_write_buffer_manager_test.cc
      
      Reviewed By: anand1976
      
      Differential Revision: D26093227
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 2bbd982a3fb7033f6de6153aa92a221249861aae
      596e9008
  14. 21 April, 2021 4 commits