1. 05 January 2022, 1 commit
  2. 31 December 2021, 1 commit
    • Fix a bug in C-binding causing iterator to return incorrect result (#9343) · 677d2b4a
      Committed by Yanqin Jin
      Summary:
      Fixes https://github.com/facebook/rocksdb/issues/9339
      
      When writing an SST file, the prefix extractor's name, computed as `prefix_extractor->GetId()`, is written to the
      properties block. When the SST file is opened again later, `CreateFromString()` takes that name as an argument and
      tries to create a prefix extractor object. Without this fix, the C API passes a `Wrapper` pointer to the underlying
      DB's `prefix_extractor`. `Wrapper::GetId()`, in this case, is missing the prefix length component, causing a
      prefix extractor of length 0 to be silently created and used. (A sketch of the name round trip follows this entry.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9343
      
      Test Plan:
      ```
      make c_test
      ./c_test
      ```
      
      Reviewed By: mrambacher
      
      Differential Revision: D33355549
      
      Pulled By: riversand963
      
      fbshipit-source-id: c92c3acd8be262c3bff8794b4229e42b9ee31203
      677d2b4a
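
      A minimal sketch of the name round trip described above; the exact id string (something like "rocksdb.FixedPrefix.4") and the `ConfigOptions`/`CreateFromString()` signature are assumptions based on recent headers, not taken from this change:

      ```cpp
      #include <cassert>
      #include <memory>
      #include <string>

      #include "rocksdb/convenience.h"      // ConfigOptions (assumed location)
      #include "rocksdb/slice.h"
      #include "rocksdb/slice_transform.h"

      int main() {
        // A 4-byte fixed-prefix extractor; its id is expected to encode the length.
        std::shared_ptr<const rocksdb::SliceTransform> original(
            rocksdb::NewFixedPrefixTransform(4));
        const std::string id = original->GetId();  // the name written to SST properties

        // Recreate the extractor from the stored id, as a table reader would.
        rocksdb::ConfigOptions cfg;
        std::shared_ptr<const rocksdb::SliceTransform> recreated;
        rocksdb::Status s =
            rocksdb::SliceTransform::CreateFromString(cfg, id, &recreated);
        assert(s.ok());

        // With the fix, the recreated extractor keeps the 4-byte prefix length
        // instead of silently degrading to a length-0 extractor.
        assert(recreated->InDomain(rocksdb::Slice("abcdef")));
        assert(recreated->Transform(rocksdb::Slice("abcdef")) ==
               rocksdb::Slice("abcd"));
        return 0;
      }
      ```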
  3. 30 December 2021, 1 commit
    • Improve SimulatedHybridFileSystem (#9301) · a931bacf
      Committed by sdong
      Summary:
      Several improvements to SimulatedHybridFileSystem:
      (1) Allow a mode where all I/Os to all files simulate HDD behavior. This can be enabled in db_bench using -simulate_hdd.
      (2) Make the latency calculation slightly more accurate.
      (3) Allow simulating more than one HDD spindle.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9301
      
      Test Plan: Run db_bench and observe the results are reasonable.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33141662
      
      fbshipit-source-id: b736e58c4ba910d06899cc9ccec79b628275f4fa
      a931bacf
  4. 29 December 2021, 5 commits
    • Remove/Reduce use of Regex in ObjectRegistry/Library (#9264) · 1c39b795
      Committed by mrambacher
      Summary:
      Added new ObjectLibrary::Entry classes to replace/reduce the use of Regex.  For simple factories that only do name matching, there are "StringEntry" and "AltStringEntry" classes.  For classes that use some semblance of regular expressions, there is a PatternEntry class that can match a name and prefixes.  There is also a class for Customizable::IndividualId format matches.
      
      Added tests for the new derivative classes and got all unit tests to pass.
      
      Resolves https://github.com/facebook/rocksdb/issues/9225.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9264
      
      Reviewed By: pdillinger
      
      Differential Revision: D33062001
      
      Pulled By: mrambacher
      
      fbshipit-source-id: c2d2143bd2d38bdf522705c8280c35381b135c03
      1c39b795
    • Change GTEST_SKIP to BYPASS for MemoryAllocatorTest (#9340) · 0a563ae2
      Committed by mrambacher
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9340
      
      Reviewed By: riversand963
      
      Differential Revision: D33344152
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 283637625b86c33497571c5f52cac3ddf910b6f3
      0a563ae2
    • New blog post for Ribbon filter (#8992) · 26a238f5
      Committed by Peter Dillinger
      Summary:
      new blog post for Ribbon filter
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8992
      
      Test Plan: markdown render in GitHub, Pages on my fork
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33342496
      
      Pulled By: pdillinger
      
      fbshipit-source-id: a0a7c19100abdf8755f8a618eb4dead755dfddae
      26a238f5
    • Added `TraceOptions::preserve_write_order` (#9334) · aa2b3bf6
      Committed by Andrew Kryczka
      Summary:
      This option causes trace records to be written in the serialized write thread. That way, the write records in the
      trace follow the same order as the writes that are logged to the WAL and applied to the DB.

      By default I left it disabled to match existing behavior. I enabled it in `db_stress`, though, as that use case
      requires that the order of write records in the trace match the order in the WAL. (A sketch of enabling the option
      follows this entry.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9334
      
      Test Plan:
      - See if the unsynced data loss crash test below can run for 24 hours straight. It used to crash after a few hours upon reaching an unlucky trace ordering.
      
      ```
      DEBUG_LEVEL=0 TEST_TMPDIR=/dev/shm /usr/local/bin/python3 -u tools/db_crashtest.py blackbox --interval=10 --max_key=100000 --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152 --value_size_mult=33 --sync_fault_injection=1 --test_batches_snapshots=0 --duration=86400
      ```
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D33301990
      
      Pulled By: ajkr
      
      fbshipit-source-id: 82d97559727adb4462a7af69758449c8725b22d3
      aa2b3bf6
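
      A minimal sketch of enabling the new option when starting a trace; it assumes `NewFileTraceWriter()` from `rocksdb/trace_reader_writer.h` and `DB::StartTrace()`, with only the `preserve_write_order` field coming from this change:

      ```cpp
      #include <memory>
      #include <string>

      #include "rocksdb/db.h"
      #include "rocksdb/env.h"
      #include "rocksdb/trace_reader_writer.h"

      // Start a trace whose write records follow WAL order.
      rocksdb::Status StartWriteOrderedTrace(rocksdb::DB* db,
                                             const std::string& trace_path) {
        std::unique_ptr<rocksdb::TraceWriter> writer;
        rocksdb::Status s = rocksdb::NewFileTraceWriter(
            db->GetEnv(), rocksdb::EnvOptions(), trace_path, &writer);
        if (!s.ok()) {
          return s;
        }
        rocksdb::TraceOptions trace_opts;
        trace_opts.preserve_write_order = true;  // serialize records in the write thread
        return db->StartTrace(trace_opts, std::move(writer));
      }
      ```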
    • Extend trace filtering to more operation types (#9335) · 2ee20a66
      Committed by Andrew Kryczka
      Summary:
      - Extended trace filtering to cover `MultiGet()`, `Seek()`, and `SeekForPrev()`. Now all user operations that can be traced support filtering.
      - Enabled the new filter masks in `db_stress`, since it only cares to trace writes. (A sketch of the combined mask follows this entry.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9335
      
      Test Plan:
      - trace-heavy `db_stress` command: elapsed time reduced 30% (79.21 -> 55.47 seconds)
      
      Benchmark command:
      ```
      $ /usr/bin/time ./db_stress -ops_per_thread=100000 -sync_fault_injection=1 --db=/dev/shm/rocksdb_stress_db/ --expected_values_dir=/dev/shm/rocksdb_stress_expected/ --clear_column_family_one_in=0
      ```
      
      - replay-heavy `db_stress` command: elapsed time reduced 12.4% (23.69 -> 20.75 seconds)
      
      Setup command:
      ```
      $  ./db_stress -ops_per_thread=100000000 -sync_fault_injection=1 -db=/dev/shm/rocksdb_stress_db/ -expected_values_dir=/dev/shm/rocksdb_stress_expected --clear_column_family_one_in=0 & sleep 120; pkill -9 db_stress
      ```
      
      Benchmark command:
      ```
      $ /usr/bin/time ./db_stress -ops_per_thread=1 -reopen=0 -expected_values_dir=/dev/shm/rocksdb_stress_expected/ -db=/dev/shm/rocksdb_stress_db/ --clear_column_family_one_in=0 --destroy_db_initially=0
      ```
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D33304580
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0df10f87c1fc506e9484b6b42cea2ef96c7ecd65
      2ee20a66
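
      A sketch of the write-only filter mask described above; the new flag names (`kTraceFilterMultiGet`, `kTraceFilterIteratorSeek`, `kTraceFilterIteratorSeekForPrev`) are assumed to follow the existing `kTraceFilterGet`/`kTraceFilterWrite` naming:

      ```cpp
      #include "rocksdb/options.h"

      // Build TraceOptions that exclude all read operations, keeping only writes.
      rocksdb::TraceOptions WriteOnlyTraceOptions() {
        rocksdb::TraceOptions trace_opts;
        trace_opts.filter = rocksdb::kTraceFilterGet |
                            rocksdb::kTraceFilterMultiGet |
                            rocksdb::kTraceFilterIteratorSeek |
                            rocksdb::kTraceFilterIteratorSeekForPrev;
        return trace_opts;
      }
      ```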
  5. 24 December 2021, 1 commit
    • Make IncreaseFullHistoryTsLow a public API (#9221) · 2e5f7642
      Committed by slk
      Summary:
      As discussed in https://github.com/facebook/rocksdb/issues/9210, **full_history_ts_low** is currently a member of CompactRangeOptions, which means a CF's full_history_ts_low is advanced only when users submit a CompactRange request.
      However, users may want to advance full_history_ts_low without an immediate compaction.
      This change makes IncreaseFullHistoryTsLow a public API so users can advance each CF's full_history_ts_low separately. (A usage sketch follows this entry.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9221
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D33201106
      
      Pulled By: riversand963
      
      fbshipit-source-id: 9cb1d013ba93260f72e16353e693ffee167b47ee
      2e5f7642
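
      A minimal usage sketch, assuming the DB was opened with an 8-byte user-defined timestamp comparator and that the API takes `(ColumnFamilyHandle*, std::string)`; the timestamp encoding helper is illustrative only:

      ```cpp
      #include <cstdint>
      #include <cstring>
      #include <string>

      #include "rocksdb/db.h"

      // Advance a column family's full_history_ts_low without issuing CompactRange().
      rocksdb::Status AdvanceFullHistoryTsLow(rocksdb::DB* db,
                                              rocksdb::ColumnFamilyHandle* cf,
                                              uint64_t ts) {
        std::string ts_low(sizeof(ts), '\0');
        std::memcpy(&ts_low[0], &ts, sizeof(ts));  // encode the u64 timestamp
        return db->IncreaseFullHistoryTsLow(cf, ts_low);
      }
      ```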
  6. 23 December 2021, 5 commits
    • Fix race condition in BackupEngineTest.ChangeManifestDuringBackupCreation (#9327) · 538d2365
      Committed by Andrew Kryczka
      Summary:
      The failure looked like this:
      
      ```
      utilities/backupable/backupable_db_test.cc:3161: Failure
      Value of: db_chroot_env_->FileExists(prev_manifest_path).IsNotFound()
        Actual: false
      Expected: true
      ```
      
      The failure could be coerced consistently with the following patch:
      
      ```
       diff --git a/db/db_impl/db_impl_compaction_flush.cc b/db/db_impl/db_impl_compaction_flush.cc
      index 80410f671..637636791 100644
       --- a/db/db_impl/db_impl_compaction_flush.cc
      +++ b/db/db_impl/db_impl_compaction_flush.cc
      @@ -2772,6 +2772,8 @@ void DBImpl::BackgroundCallFlush(Env::Priority thread_pri) {
           if (job_context.HaveSomethingToClean() ||
               job_context.HaveSomethingToDelete() || !log_buffer.IsEmpty()) {
             mutex_.Unlock();
      +      bg_cv_.SignalAll();
      +      sleep(1);
             TEST_SYNC_POINT("DBImpl::BackgroundCallFlush:FilesFound");
             // Have to flush the info logs before bg_flush_scheduled_--
             // because if bg_flush_scheduled_ becomes 0 and the lock is
      ```
      
      The cause was a familiar problem: manual flush/compaction may return
      before the files they obsoleted are removed. The solution is simply to
      wait for the "scheduled" work to complete, which includes all phases,
      cleanup included.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9327
      
      Test Plan:
      after this PR, even the above patch to coerce the bug cannot
      cause the test to fail.
      
      Reviewed By: riversand963
      
      Differential Revision: D33252208
      
      Pulled By: ajkr
      
      fbshipit-source-id: 720a7eaca58c7247d221911fffe3d5e1dbf581e9
      538d2365
    • Expose locktree's wait count in RangeLockManagerHandle::Counters (#9289) · 1b076e82
      Committed by Sergei Petrunia
      Summary:
      locktree is a module providing Range Locking. It has a counter for
      the number of times a lock acquisition request was blocked by an
      existing conflicting lock and had to wait for it to be released.
      
      Expose this counter as RangeLockManagerHandle::Counters::lock_wait_count. (A usage sketch follows this entry.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9289
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33079182
      
      Pulled By: riversand963
      
      fbshipit-source-id: 25b1a362d9da247536ab5007bd15900b319f139e
      1b076e82
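
      A hedged sketch of reading the new counter. It assumes the range lock manager was created with `NewRangeLockManager()` and installed via `TransactionDBOptions::lock_mgr_handle`, and that counters are fetched with `GetStatus()`; only the `lock_wait_count` field name comes from this change:

      ```cpp
      #include <cstdint>
      #include <memory>

      #include "rocksdb/utilities/transaction_db.h"

      // Return how many lock acquisitions had to wait on a conflicting range lock.
      uint64_t LockWaitCount(
          const std::shared_ptr<rocksdb::RangeLockManagerHandle>& range_lock_mgr) {
        rocksdb::RangeLockManagerHandle::Counters counters =
            range_lock_mgr->GetStatus();
        return counters.lock_wait_count;
      }
      ```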
    • Filter `Get()`s from `db_stress` traces (#9315) · dfff1cec
      Committed by Andrew Kryczka
      Summary:
      `db_stress` traces are used for tracking unsynced changes. For that purpose, we
      only need to track writes and not reads. Currently `TraceOptions` only
      supports excluding `Get()`s from the trace, so this PR only excludes
      `Get()`s. In the future it would be good to exclude `MultiGet()`s and
      iterator operations too.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9315
      
      Test Plan:
      - trace-heavy `db_stress` command elapsed time reduced 37%
      
      Benchmark:
      ```
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_stress -ops_per_thread=100000 -sync_fault_injection=1 -expected_values_dir=/dev/shm/dbstress_expected --clear_column_family_one_in=0
      ```
      
      - replay-heavy `db_stress` command elapsed time reduced 38%
      
      Setup:
      ```
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_stress -ops_per_thread=100000000 -sync_fault_injection=1 -expected_values_dir=/dev/shm/dbstress_expected --clear_column_family_one_in=0 & sleep 120; pkill -9 db_stress
      ```
      Benchmark:
      ```
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_stress -ops_per_thread=1 -reopen=0 -expected_values_dir=/dev/shm/dbstress_expected --clear_column_family_one_in=0 --destroy_db_initially=0
      ```
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D33229900
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0e4251c674d236ddbc4548e9bbfdd608bf3cdc93
      dfff1cec
    • Fixes for building RocksJava on s390x (#9321) · 65996dd7
      Committed by Adam Retter
      Summary:
      * Added Docker build environment for RocksJava on s390x
      * Cache alignment size for s390x was incorrectly calculated on gcc 6.4.0
      * Tighter control over which installed version of Java is used is required - build now correctly adheres to `JAVA_HOME` if it is set
      * Alpine build scripts should be used on Alpine (previously CentOS script worked by falling through to minimal gcc version)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9321
      
      Reviewed By: mrambacher
      
      Differential Revision: D33259624
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: d791a5150581344925c3c3f9cbb9a3622d63b3b6
      65996dd7
    • Enable core dumps in ASAN crash tests (#9330) · 2d3c626b
      Committed by Andrew Kryczka
      Summary:
      There are some crashes we couldn't debug or repro and couldn't find a core dump. For ASAN the default is `disable_coredump=1` as the doc mentions core dumps can be 16TB+. However I've tried generating them for our `db_stress` commands and they've been in the 1.4-1.6GB range, which is fine. So we can try enabling it in CI.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9330
      
      Test Plan:
      - create a test job. (It's internal infra so I put the link in the Phabricator test plan only)
      - ran the same command locally, `kill -6 $(pidof db_stress)`, verified core dump showed up
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33271841
      
      Pulled By: ajkr
      
      fbshipit-source-id: 93b853fa763d5708d078771960ba36854c4be55a
      2d3c626b
  7. 22 December 2021, 3 commits
    • Fix a bug that occurs when plugin pkg-config requirements are empty (#9238) · 2e51b33d
      Committed by Andreas Hindborg
      Summary:
      Fix a bug introduced by https://github.com/facebook/rocksdb/issues/9198. The bug is triggered when a plugin does not provide any pkg-config requirements.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9238
      
      Reviewed By: riversand963
      
      Differential Revision: D32771406
      
      Pulled By: ajkr
      
      fbshipit-source-id: 79301871a8bf4e624d5e5eb9d219d7f13948c64d
      2e51b33d
    • More asserts in listener_test for debuggability (#9320) · 393fc231
      Committed by Andrew Kryczka
      Summary:
      We ran into a flake I could not debug, so I instead added assertions in
      case it happens again.
      
      Command was:
      
      ```
      TEST_TMPDIR=/dev/shm/rocksdb COMPILE_WITH_UBSAN=1 USE_CLANG=1 OPT=-g SKIP_FORMAT_BUCK_CHECKS=1 make J=80 -j80 ubsan_check
      ```
      
      Failure output was:
      
      ```
      [==========] Running 1 test from 1 test case.
      [----------] Global test environment set-up.
      [----------] 1 test from EventListenerTest
      [ RUN      ] EventListenerTest.DisableBGCompaction
      UndefinedBehaviorSanitizer:DEADLYSIGNAL
      ==1558126==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000031 (pc 0x7fd9c04dda22 bp 0x7fd9bf8aa580 sp 0x7fd9bf8aa540 T1558147)
      ==1558126==The signal is caused by a READ memory access.
      ==1558126==Hint: address points to the zero page.
          #0 0x7fd9c04dda21 in __dynamic_cast /home/engshare/third-party2/libgcc/9.x/src/gcc-9.x/x86_64-facebook-linux/libstdc++-v3/libsupc++/../../.././libstdc++-v3/libsupc++/dyncast.cc:49:3
          #1 0x510c53 in __ubsan::checkDynamicType(void*, void*, unsigned long) (/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/listener_test+0x510c53)
          #2 0x50fb32 in HandleDynamicTypeCacheMiss(__ubsan::DynamicTypeCacheMissData*, unsigned long, unsigned long, __ubsan::ReportOptions) (/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/listener_test+0x50fb32)
          #3 0x510230 in __ubsan_handle_dynamic_type_cache_miss_abort (/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/listener_test+0x510230)
          #4 0x63221a in rocksdb::ColumnFamilyHandleImpl* rocksdb::static_cast_with_check<rocksdb::ColumnFamilyHandleImpl, rocksdb::ColumnFamilyHandle>(rocksdb::ColumnFamilyHandle*) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/./util/cast_util.h:19:20
          #5 0x71cafa in rocksdb::DBImpl::TEST_GetFilesMetaData(rocksdb::ColumnFamilyHandle*, std::vector<std::vector<rocksdb::FileMetaData, std::allocator<rocksdb::FileMetaData> >, std::allocator<std::vector<rocksdb::FileMetaData, std::allocator<rocksdb::FileMetaData> > > >*, std::vector<std::shared_ptr<rocksdb::BlobFileMetaData>, std::allocator<std::shared_ptr<rocksdb::BlobFileMetaData> > >*) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_debug.cc:63:14
          #6 0x53f6b4 in rocksdb::TestFlushListener::OnFlushCompleted(rocksdb::DB*, rocksdb::FlushJobInfo const&) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/listener_test.cc:277:24
          #7 0x6e2f7d in rocksdb::DBImpl::NotifyOnFlushCompleted(rocksdb::ColumnFamilyData*, rocksdb::MutableCFOptions const&, std::__cxx11::list<std::unique_ptr<rocksdb::FlushJobInfo, std::default_delete<rocksdb::FlushJobInfo> >, std::allocator<std::unique_ptr<rocksdb::FlushJobInfo, std::default_delete<rocksdb::FlushJobInfo> > > >*) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:863:19
          #8 0x6e1074 in rocksdb::DBImpl::FlushMemTableToOutputFile(rocksdb::ColumnFamilyData*, rocksdb::MutableCFOptions const&, bool*, rocksdb::JobContext*, rocksdb::SuperVersionContext*, std::vector<unsigned long, std::allocator<unsigned long> >&, unsigned long, rocksdb::SnapshotChecker*, rocksdb::LogBuffer*, rocksdb::Env::Priority) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:314:5
          #9 0x6e3412 in rocksdb::DBImpl::FlushMemTablesToOutputFiles(rocksdb::autovector<rocksdb::DBImpl::BGFlushArg, 8ul> const&, bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::Env::Priority) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:359:14
          #10 0x700df6 in rocksdb::DBImpl::BackgroundFlush(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::FlushReason*, rocksdb::Env::Priority) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2703:14
          #11 0x6fe1f0 in rocksdb::DBImpl::BackgroundCallFlush(rocksdb::Env::Priority) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2742:16
          #12 0x6fc732 in rocksdb::DBImpl::BGWorkFlush(void*) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2569:44
          #13 0xb3a820 in void std::_Bind<void (* (void*))(void*)>::operator()<void>() /mnt/gvfs/third-party2/libgcc/4959b39cfbe5965a37c861c4c327fa7c5c759b87/9.x/platform009/9202ce7/include/c++/9.x/functional:482:17
          #14 0xb3a820 in std::_Function_handler<void (), std::_Bind<void (* (void*))(void*)> >::_M_invoke(std::_Any_data const&) /mnt/gvfs/third-party2/libgcc/4959b39cfbe5965a37c861c4c327fa7c5c759b87/9.x/platform009/9202ce7/include/c++/9.x/bits/std_function.h:300:2
          #15 0xb347cc in rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/util/threadpool_imp.cc:266:5
          #16 0xb34a2f in rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/util/threadpool_imp.cc:307:7
          #17 0x7fd9c051a660 in execute_native_thread_routine /home/engshare/third-party2/libgcc/9.x/src/gcc-9.x/x86_64-facebook-linux/libstdc++-v3/src/c++11/../../../.././libstdc++-v3/src/c++11/thread.cc:80:18
          #18 0x7fd9c041e20b in start_thread /home/engshare/third-party2/glibc/2.30/src/glibc-2.30/nptl/pthread_create.c:479:8
          #19 0x7fd9c01dd16e in clone /home/engshare/third-party2/glibc/2.30/src/glibc-2.30/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9320
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33242185
      
      Pulled By: ajkr
      
      fbshipit-source-id: 741984b10a610e0509e0d4e54c42cdbac03f5285
      393fc231
    • Add NewMetaDataIterator method (#8692) · 9a116ab4
      Committed by mrambacher
      Summary:
      Fixes a problem where keys returned by the metadata iterator were being treated as non-user keys when in fact they were user keys.  This led to a problem where the property keys could not be searched for correctly.

      The main exposure of this problem was that the HashIndexReader could not get the "prefixes" property correctly, resulting in the failure of retrieval/creation of the BlockPrefixIndex.

      Added the BlockBasedTableTest.SeekMetaBlocks test to validate this condition.

      Fixing this condition exposed two other tests (SeekWithPrefixLongerThanKey, MultiGetPrefixFilter) that previously passed incorrectly and now failed.  Updated those two tests so they pass.  It is not certain the tests are still functionally correct/appropriate, but they now pass.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8692
      
      Reviewed By: riversand963
      
      Differential Revision: D33119539
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 658969fe9265f73dc184dab97cc3f4eaed2d881a
      9a116ab4
  8. 21 December 2021, 6 commits
    • Minor Javadoc fixes (#9203) · 7ae213f7
      Committed by stefan-zobel
      Summary:
      Added two missing parameter tags with descriptions, and added some descriptions for parameter/return tags.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9203
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D32990607
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 10aea4c4cf1c28d5e97d19722ee835a965d1eb55
      7ae213f7
    • db_stress print hex key for MultiGet() inconsistency (#9324) · 82670fb1
      Committed by Andrew Kryczka
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9324
      
      Reviewed By: riversand963
      
      Differential Revision: D33248178
      
      Pulled By: ajkr
      
      fbshipit-source-id: c8a7382ed613f9ac3a0a2e3fa7d3c6fe9c95ef85
      82670fb1
    • Fix race condition in `error_handler_fs_test` (#9325) · 782fcc44
      Committed by Andrew Kryczka
      Summary:
      We saw the below assertion failure in `error_handler_fs_test`:
      
      ```
      db/error_handler_fs_test.cc:2471: Failure
      Expected equality of these values:
        listener->new_bg_error()
          Which is: 16-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>
        Status::Aborted()
          Which is: 16-byte object <0A-00 00-00 60-61 00-00 00-00 00-00 00-00 00-00>
      terminate called after throwing an instance of 'testing::internal::GoogleTestFailureException'
        what():  db/error_handler_fs_test.cc:2471: Failure
      Expected equality of these values:
        listener->new_bg_error()
          Which is: 16-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>
        Status::Aborted()
          Which is: 16-byte object <0A-00 00-00 60-61 00-00 00-00 00-00 00-00 00-00>
      Received signal 6 (Aborted)
      ```
      
      The problem was that completing `OnErrorRecoveryCompleted()` would
      wake up the main thread and allow it to proceed to that assertion. But
      that assertion assumes `OnErrorRecoveryEnd()` has completed, since
      only `OnErrorRecoveryEnd()` affects `new_bg_error()`.

      The fix is simply to not implement `OnErrorRecoveryCompleted()`, so it
      no longer wakes up the main thread. (A listener sketch follows this entry.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9325
      
      Test Plan:
      - ran `while TEST_TMPDIR=/dev/shm ./error_handler_fs_test ; do : ; done` for a while
      - injected sleep between `OnErrorRecovery{Completed,End}()` callbacks, which guaranteed repro before this PR
      
      Reviewed By: anand1976
      
      Differential Revision: D33249200
      
      Pulled By: ajkr
      
      fbshipit-source-id: 1659ee183cd09f90d4dbd898f65103473fcf84a8
      782fcc44
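
      A hedged sketch of the listener pattern the fixed test relies on: record the recovery outcome only in `OnErrorRecoveryEnd()`, since `OnErrorRecoveryCompleted()` can fire before `new_bg_error()` is meaningful. The `BackgroundErrorRecoveryInfo` field names are assumed from recent headers, not from this change:

      ```cpp
      #include <mutex>

      #include "rocksdb/listener.h"

      class RecoveryEndListener : public rocksdb::EventListener {
       public:
        // Intentionally do not override OnErrorRecoveryCompleted(): it can fire
        // before the recovery outcome below has been recorded.
        void OnErrorRecoveryEnd(
            const rocksdb::BackgroundErrorRecoveryInfo& info) override {
          std::lock_guard<std::mutex> guard(mu_);
          new_bg_error_ = info.new_bg_error;
        }

        rocksdb::Status new_bg_error() const {
          std::lock_guard<std::mutex> guard(mu_);
          return new_bg_error_;
        }

       private:
        mutable std::mutex mu_;
        rocksdb::Status new_bg_error_;
      };
      ```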
    • `db_stress` tolerate incomplete tail records in trace file (#9316) · b448b712
      Committed by Andrew Kryczka
      Summary:
      I saw the following error when running crash test for a while with
      unsynced data loss:
      
      ```
      Error restoring historical expected values: Corruption: Corrupted trace file.
      ```
      
      The trace file turned out to have an incomplete tail record. This is
      normal, considering the blackbox test kills `db_stress` while tracing can
      be ongoing.
      
      In the case where the trace file is not otherwise corrupted, there
      should be enough records already seen to sync up the expected state with
      the recovered DB. This PR ignores any `Status::Corruption` the
      `Replayer` returns when that happens.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9316
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33230579
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9814af4e39e57f00d85be7404363211762f9b41b
      b448b712
    • Fix race condition in db_stress thread setup (#9314) · 791723c1
      Committed by Andrew Kryczka
      Summary:
      We need to grab `SharedState`'s mutex while calling `IncThreads()` or `IncBgThreads()`. Otherwise the newly launched threads can simultaneously access the thread counters to check whether every thread has finished initializing. (The locking pattern is sketched after this entry.)
      
      Repro command:
      
      ```
      $ rm -rf /dev/shm/rocksdb/rocksdb_crashtest_{whitebox,expected}/ && mkdir -p /dev/shm/rocksdb/rocksdb_crashtest_{whitebox,expected}/ && ./db_stress --acquire_snapshot_one_in=10000 --atomic_flush=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=131.8094496796033 --bottommost_compression_type=zlib --cache_index_and_filter_blocks=1 --cache_size=1048576 --checkpoint_one_in=1000000 --checksum_type=kCRC32c --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_style=1 --compaction_ttl=0 --compression_max_dict_buffer_bytes=134217727 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=65536 --continuous_verification_interval=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_whitebox --db_write_buffer_size=8388608 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --disable_wal=1 --enable_compaction_filter=0 --enable_pipelined_write=0 --fail_if_options_file_error=1 --file_checksum_impl=crc32c --flush_one_in=1000000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=15 --index_type=3 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --log2_keys_per_lock=22 --long_running_snapshots=0 --mark_for_compaction_one_file_in=10 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=1000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=4194304 --memtablerep=skip_list --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --open_files=500000 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=0 --ops_per_thread=20000 --optimize_filters_for_memory=1 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=0 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefixpercent=5 --prepopulate_block_cache=1 --progress_reports=0 --read_fault_one_in=1000 --readpercent=45 --recycle_log_file_num=1 --reopen=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=32 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --subcompactions=2 --sync=0 --sync_fault_injection=False --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=1 --test_cf_consistency=1 --top_level_index_pinning=0 --unpartitioned_pinning=0 --use_block_based_filter=1 --use_clock_cache=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=1 --use_merge=0 --use_multiget=1 --user_timestamp_size=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --write_buffer_size=1048576 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=35
      ```
      
      TSAN error:
      
      ```
      WARNING: ThreadSanitizer: data race (pid=2750142)
        Read of size 4 at 0x7ffc21d7f58c by thread T39 (mutexes: write M670895590377780496):
          #0 rocksdb::SharedState::AllInitialized() const db_stress_tool/db_stress_shared_state.h:204 (db_stress+0x4fd307)
          #1 rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:26 (db_stress+0x4fd307)
          #2 StartThreadWrapper env/env_posix.cc:454 (db_stress+0x84472f)
      
        Previous write of size 4 at 0x7ffc21d7f58c by main thread:
          #0 rocksdb::SharedState::IncThreads() db_stress_tool/db_stress_shared_state.h:194 (db_stress+0x4fd779)
          #1 rocksdb::RunStressTest(rocksdb::StressTest*) db_stress_tool/db_stress_driver.cc:78 (db_stress+0x4fd779)
          #2 rocksdb::db_stress_tool(int, char**) db_stress_tool/db_stress_tool.cc:348 (db_stress+0x4b97dc)
          #3 main db_stress_tool/db_stress.cc:21 (db_stress+0x47a351)
      
        Location is stack of main thread.
      
        Location is global '<null>' at 0x000000000000 ([stack]+0x00000001d58c)
      
        Mutex M670895590377780496 is already destroyed.
      
        Thread T39 (tid=2750211, running) created by main thread at:
          #0 pthread_create /home/engshare/third-party2/gcc/9.x/src/gcc-10.x/libsanitizer/tsan/tsan_interceptors.cc:964 (libtsan.so.0+0x613c3)
          #1 StartThread env/env_posix.cc:464 (db_stress+0x8463c2)
          #2 rocksdb::CompositeEnvWrapper::StartThread(void (*)(void*), void*) env/composite_env_wrapper.h:288 (db_stress+0x4bcd20)
          #3 rocksdb::EnvWrapper::StartThread(void (*)(void*), void*) include/rocksdb/env.h:1475 (db_stress+0x4bb950)
          #4 rocksdb::RunStressTest(rocksdb::StressTest*) db_stress_tool/db_stress_driver.cc:80 (db_stress+0x4fd9d2)
          #5 rocksdb::db_stress_tool(int, char**) db_stress_tool/db_stress_tool.cc:348 (db_stress+0x4b97dc)
          #6 main db_stress_tool/db_stress.cc:21 (db_stress+0x47a351)
      
       ThreadSanitizer: data race db_stress_tool/db_stress_shared_state.h:204 in rocksdb::SharedState::AllInitialized() const
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9314
      
      Test Plan: verified repro command works after this PR.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33217698
      
      Pulled By: ajkr
      
      fbshipit-source-id: 79358fe5adb779fc9dcf80643cc102d4b467fc38
      791723c1
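
      The principle behind the fix, as a standalone sketch (the names are illustrative, not the actual `SharedState` code): the launching thread and the launched threads must hold the same mutex when touching the shared thread counters.

      ```cpp
      #include <mutex>

      // Simplified stand-in for SharedState's thread bookkeeping.
      class ThreadCounter {
       public:
        explicit ThreadCounter(int expected) : expected_(expected) {}

        // Called by the launcher for every thread it starts, under the mutex.
        void IncThreads() {
          std::lock_guard<std::mutex> guard(mu_);
          ++num_threads_;
        }

        // Called by the launched threads; reads the counter under the same
        // mutex, so it cannot race with IncThreads().
        bool AllInitialized() const {
          std::lock_guard<std::mutex> guard(mu_);
          return num_threads_ == expected_;
        }

       private:
        mutable std::mutex mu_;
        int num_threads_ = 0;
        const int expected_;
      };
      ```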
    • Skip MemoryAllocatorTest in LITE mode (#9318) · 48b53441
      Committed by Andrew Kryczka
      Summary:
      The tests rely on `CreateFromString()`, which returns
      `Status::NotSupported()` when these tests attempt to create non-default
      allocators.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9318
      
      Reviewed By: riversand963
      
      Differential Revision: D33238405
      
      Pulled By: ajkr
      
      fbshipit-source-id: d2974e2341f1494f5f7cd07b73f2dbd0d502fc7c
      48b53441
  9. 18 December 2021, 6 commits
    • Fix unsynced data loss correctness test with mixed `-test_batches_snapshots` (#9302) · 863c78d2
      Committed by Andrew Kryczka
      Summary:
      This fixes two bugs in the recently committed DB verification following
      crash-recovery with unsynced data loss (https://github.com/facebook/rocksdb/issues/8966):
      
      The first bug was in crash test runs involving mixed values for
      `-test_batches_snapshots`. The problem was we were neither restoring
      expected values nor enabling tracing when `-test_batches_snapshots=1`.
      This caused a future `-test_batches_snapshots=0` run to not find enough
      trace data to restore expected values. The fix is to restore expected
      values at the start of `-test_batches_snapshots=1` runs, but still leave
      tracing disabled as we do not need to track those KVs.
      
      The second bug was in `db_stress` runs that restore the expected values
      file and use compaction filter. The compaction filter was initialized to use
      the pre-restore expected values, which would be `munmap()`'d during
      `FileExpectedStateManager::Restore()`. Then compaction filter would run
      into a segfault. The fix is just to reorder compaction filter init after expected
      values restore.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9302
      
      Test Plan:
      - To verify the first problem, the below sequence used to fail; now it passes.
      
      ```
      $ ./db_stress --db=./test-db/ --expected_values_dir=./test-db-expected/ --max_key=100000 --ops_per_thread=1000 --sync_fault_injection=1 --clear_column_family_one_in=0 --destroy_db_initially=0 -reopen=0 -test_batches_snapshots=0
      $ ./db_stress --db=./test-db/ --expected_values_dir=./test-db-expected/ --max_key=100000 --ops_per_thread=1000 --sync_fault_injection=1 --clear_column_family_one_in=0 --destroy_db_initially=0 -reopen=0 -test_batches_snapshots=1
      $ ./db_stress --db=./test-db/ --expected_values_dir=./test-db-expected/ --max_key=100000 --ops_per_thread=1000 --sync_fault_injection=1 --clear_column_family_one_in=0 --destroy_db_initially=0 -reopen=0 -test_batches_snapshots=0
      ```
      
      - The second problem occurred rarely in the form of a SIGSEGV on a file that was `munmap()`d. I have not seen it after this PR though this doesn't prove much.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33155283
      
      Pulled By: ajkr
      
      fbshipit-source-id: 66fd0f0edf34015a010c30015f14f104734e964e
      863c78d2
    • Fix shutdown in db_stress with `-test_batches_snapshots=1` (#9313) · 84228e21
      Committed by Andrew Kryczka
      Summary:
      The `SharedState` constructor had an early return in case of
      `-test_batches_snapshots=1`. This early return caused `num_bg_threads_`
      to never be incremented. Consequently, the driver thread could cleanup
      objects like the `SharedState` while BG threads were still running and
      accessing it, leading to crash.
      
      The fix is to move the logic for counting threads (both FG and BG) to
      the place they are launched. That way we can be sure the counts are
      consistent, at least for now.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9313
      
      Test Plan:
      below command used to fail, now it passes.
      
      ```
      $ ./db_stress --db=./test-db/ --expected_values_dir=./test-db-expected/ --max_key=100000 --ops_per_thread=1000 --sync_fault_injection=1 --clear_column_family_one_in=0 --destroy_db_initially=0 -reopen=0 -test_batches_snapshots=1
      ```
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33198670
      
      Pulled By: ajkr
      
      fbshipit-source-id: 126592dc1eb31998bc8f82ffbf5a0d4eb8dec317
      84228e21
    • gcc-11 and cmake related cleanup (#9286) · cc1d4e3d
      Committed by Kefu Chai
      Summary:
      In the hope of getting RocksDB to compile with GCC 11 without warnings:

      * util/bloom_test: initialize a variable before using it
        to silence a GCC warning like the following (the fix pattern is sketched after this entry):
        ```
        util/bloom_test.cc:1253:31: error: ‘<anonymous>’ may be used uninitialized [-Werror=maybe-uninitialized]
         1253 |   Slice key_slice{key_bytes, 8};
              |                               ^
        ...
        include/rocksdb/slice.h:41:3: note: by argument 2 of type ‘const char*’ to ‘rocksdb::Slice::Slice(const char*, size_t)’ declared here
           41 |   Slice(const char* d, size_t n) : data_(d), size_(n) {}
              |   ^~~~~
        util/bloom_test.cc:1249:3: note: ‘<anonymous>’ declared here
         1249 |   };
              |   ^
        cc1plus: all warnings being treated as errors
        ```
      * cmake: add find_package(uring ...)
        This finds liburing in a more consistent way, and it is also the encouraged way to locate a library.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9286
      
      Reviewed By: mrambacher
      
      Differential Revision: D33165241
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 9f3487e11b4e40fd8f1c97c8facb24a190e5ce31
      cc1d4e3d
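
      The warning fix pattern referenced above, as a standalone sketch (the variable names are illustrative, not the actual test code): zero-initialize the buffer before constructing a `Slice` over it, so GCC 11's `-Wmaybe-uninitialized` has nothing to flag.

      ```cpp
      #include "rocksdb/slice.h"

      void BuildKeySlice() {
        char key_bytes[8] = {};  // initialized before any use
        // ... fill key_bytes with the encoded key ...
        rocksdb::Slice key_slice{key_bytes, sizeof(key_bytes)};
        (void)key_slice;  // use the slice
      }
      ```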
    • Update to version 6.28 (#9312) · 7bfad071
      Committed by Akanksha Mahajan
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9312
      
      Reviewed By: ajkr
      
      Differential Revision: D33196324
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 471da75eaedc54d3151672adc28643bc1d6fdf23
      7bfad071
    • Fix unity build with SUPPORT_CLOCK_CACHE (#9309) · 0d9b2568
      Committed by Peter Dillinger
      Summary:
      After https://github.com/facebook/rocksdb/issues/9126
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9309
      
      Test Plan: CI
      
      Reviewed By: ajkr
      
      Differential Revision: D33188902
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 54bf34e33c2b30b1b8dc2a0229e84c194321b606
      0d9b2568
    • Update TARGETS and related scripts (#9310) · 6b5e28a4
      Committed by Yanqin Jin
      Summary:
      As title. Remove 'unexported_deps_by_default', replace 'deps' and
      'external_deps' with 'exported_deps' and 'exported_external_deps'
      respectively.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9310
      
      Test Plan: Github action and internal jobs.
      
      Reviewed By: DrMarcII
      
      Differential Revision: D33190092
      
      Pulled By: riversand963
      
      fbshipit-source-id: 64200e5331d822f88f8d122a55b7a29bfd1f9553
      6b5e28a4
  10. 17 December 2021, 5 commits
    • Make MemoryAllocator into a Customizable class (#8980) · 423538a8
      Committed by mrambacher
      Summary:
      - Make MemoryAllocator and its implementations into a Customizable class.
      - Added a "DefaultMemoryAllocator" that uses new and delete.
      - Added a "CountedMemoryAllocator" that counts the number of allocations and frees.
      - Updated the existing tests to use these new allocators.
      - Changed the memkind allocator test into a generic test that can exercise the various allocators.
      - Added tests for creating all of the allocators.
      - Added tests to verify/create the JemallocNodumpAllocator using its options.
      (A creation sketch using the new Customizable support follows this entry.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8980
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D32990403
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 6fdfe8218c10dd8dfef34344a08201be1fa95c76
      423538a8
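
      A hedged sketch of creating one of the new allocators by name now that `MemoryAllocator` is Customizable; the `CreateFromString()` signature is assumed to follow other Customizable classes, and only the "DefaultMemoryAllocator" name comes from this change:

      ```cpp
      #include <memory>

      #include "rocksdb/convenience.h"       // ConfigOptions (assumed location)
      #include "rocksdb/memory_allocator.h"

      std::shared_ptr<rocksdb::MemoryAllocator> MakeDefaultAllocator() {
        rocksdb::ConfigOptions cfg;
        std::shared_ptr<rocksdb::MemoryAllocator> allocator;
        rocksdb::Status s = rocksdb::MemoryAllocator::CreateFromString(
            cfg, "DefaultMemoryAllocator", &allocator);
        return s.ok() ? allocator : nullptr;
      }
      ```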
    • fix java doc issues (#9253) · 9828b6d5
      Committed by Jermy Li
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9253
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D32990516
      
      Pulled By: mrambacher
      
      fbshipit-source-id: c7cdb6562ac6871bca6ea0d9efa454f3a902a137
      9828b6d5
    • New stable, fixed-length cache keys (#9126) · 0050a73a
      Committed by Peter Dillinger
      Summary:
      This change standardizes on a new 16-byte cache key format for
      block cache (incl compressed and secondary) and persistent cache (but
      not table cache and row cache).
      
      The goal is a really fast cache key with practically ideal stability and
      uniqueness properties without external dependencies (e.g. from FileSystem).
      A fixed key size of 16 bytes should enable future optimizations to the
      concurrent hash table for block cache, which is a heavy CPU user /
      bottleneck, but there appears to be measurable performance improvement
      even with no changes to LRUCache.
      
      This change replaces a lot of disjointed and ugly code handling cache
      keys with calls to a simple, clean new internal API (cache_key.h).
      (Preserving the old cache key logic under an option would be very ugly
      and likely negate the performance gain of the new approach. Complete
      replacement carries some inherent risk, but I think that's acceptable
      with sufficient analysis and testing.)
      
      The scheme for encoding new cache keys is complicated but explained
      in cache_key.cc.
      
      Also: EndianSwapValue is moved to math.h to be next to other bit
      operations. (Explains some new include "math.h".) ReverseBits operation
      added and unit tests added to hash_test for both.
      
      Fixes https://github.com/facebook/rocksdb/issues/7405 (presuming a root cause)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9126
      
      Test Plan:
      ### Basic correctness
      Several tests needed updates to work with the new functionality, mostly
      because we are no longer relying on filesystem for stable cache keys
      so table builders & readers need more context info to agree on cache
      keys. This functionality is so core, a huge number of existing tests
      exercise the cache key functionality.
      
      ### Performance
      Create db with
      `TEST_TMPDIR=/dev/shm ./db_bench -bloom_bits=10 -benchmarks=fillrandom -num=3000000 -partition_index_and_filters`
      And test performance with
      `TEST_TMPDIR=/dev/shm ./db_bench -readonly -use_existing_db -bloom_bits=10 -benchmarks=readrandom -num=3000000 -duration=30 -cache_index_and_filter_blocks -cache_size=250000 -threads=4`
      using DEBUG_LEVEL=0 and simultaneous before & after runs.
      Before ops/sec, avg over 100 runs: 121924
      After ops/sec, avg over 100 runs: 125385 (+2.8%)
      
      ### Collision probability
      I have built a tool, ./cache_bench -stress_cache_key to broadly simulate host-wide cache activity
      over many months, by making some pessimistic simplifying assumptions:
      * Every generated file has a cache entry for every byte offset in the file (contiguous range of cache keys)
      * All of every file is cached for its entire lifetime
      
      We use a simple table with skewed address assignment and replacement on address collision
      to simulate files coming & going, with quite a variance (super-Poisson) in ages. Some output
      with `./cache_bench -stress_cache_key -sck_keep_bits=40`:
      
      ```
      Total cache or DBs size: 32TiB  Writing 925.926 MiB/s or 76.2939TiB/day
      Multiply by 9.22337e+18 to correct for simulation losses (but still assume whole file cached)
      ```
      
      These come from default settings of 2.5M files per day of 32 MB each, and
      `-sck_keep_bits=40` means that to represent a single file, we are only keeping 40 bits of
      the 128-bit cache key.  With file size of 2\*\*25 contiguous keys (pessimistic), our simulation
      is about 2\*\*(128-40-25) or about 9 billion billion times more prone to collision than reality.
      (This arithmetic is spelled out at the end of this entry.)
      
      More default assumptions, relatively pessimistic:
      * 100 DBs in same process (doesn't matter much)
      * Re-open DB in same process (new session ID related to old session ID) on average
      every 100 files generated
      * Restart process (all new session IDs unrelated to old) 24 times per day
      
      After enough data, we get a result at the end:
      
      ```
      (keep 40 bits)  17 collisions after 2 x 90 days, est 10.5882 days between (9.76592e+19 corrected)
      ```
      
      If we believe the (pessimistic) simulation and the mathematical generalization, we would need to run a billion machines all for 97 billion days to expect a cache key collision. To help verify that our generalization ("corrected") is robust, we can make our simulation more precise with `-sck_keep_bits=41` and `42`, which takes more running time to get enough data:
      
      ```
      (keep 41 bits)  16 collisions after 4 x 90 days, est 22.5 days between (1.03763e+20 corrected)
      (keep 42 bits)  19 collisions after 10 x 90 days, est 47.3684 days between (1.09224e+20 corrected)
      ```
      
      The generalized prediction still holds. With the `-sck_randomize` option, we can see that we are beating "random" cache keys (except offsets still non-randomized) by a modest amount (roughly 20x less collision prone than random), which should make us reasonably comfortable even in "degenerate" cases:
      
      ```
      197 collisions after 1 x 90 days, est 0.456853 days between (4.21372e+18 corrected)
      ```
      
      I've run other tests to validate other conditions behave as expected, never behaving "worse than random" unless we start chopping off structured data.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D33171746
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f16a57e369ed37be5e7e33525ace848d0537c88f
      0050a73a
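
      As a quick arithmetic check of the simulation-loss correction factor quoted above (keeping 40 of the 128 cache-key bits and assuming 2^25 contiguous keys per file):

      ```latex
      2^{128 - 40 - 25} = 2^{63} \approx 9.223 \times 10^{18}
      ```

      which matches the reported "Multiply by 9.22337e+18 to correct for simulation losses" figure.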
    • Set KeyMayExist fields visibility to public (#9285) · 9918e1ee
      Committed by Andrea Cavalli
      Summary:
      Fixes https://github.com/facebook/rocksdb/issues/9284
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9285
      
      Reviewed By: pdillinger
      
      Differential Revision: D33062006
      
      Pulled By: mrambacher
      
      fbshipit-source-id: c3471c2db717fa5bc2337cf996ce744af0ed877d
      9918e1ee
    • Verify recovery correctness in multi-CF blackbox crash test (#9303) · 5383f1ee
      Committed by Andrew Kryczka
      Summary:
      db_crashtest.py uses multiple CFs only when run without flag `--simple`.
      The previous config set `-test_batches_snapshots=1` in that case for
      blackbox mode. But `-test_batches_snapshots=1` cannot verify recovery
      correctness, so it should not always be set for multi-CF blackbox tests.
      We can instead randomly toggle it.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9303
      
      Reviewed By: riversand963
      
      Differential Revision: D33155229
      
      Pulled By: ajkr
      
      fbshipit-source-id: 4a6fdc4eddccc8ece664063baf6393ce1c5de6b7
      5383f1ee
  11. 16 December 2021, 4 commits
    • java / jni io_uring support (#9224) · c1ec0b28
      Committed by Alan Paxton
      Summary:
      Existing multiGet() in Java calls multi_get_helper(), which then calls the std::vector-returning DB::MultiGet() overload. This doesn't take advantage of io_uring.

      This change adds another JNI-level method that runs a parallel code path using the void-returning DB::MultiGet() overload, with ByteBuffers at the JNI level. We call it multiGetDirect(). In addition to using the io_uring path, this code internally returns pinned slices that we can copy into our direct byte buffers; this should reduce the overall number of copies in the code path to/from Java. Some jmh benchmark runs (100k keys, 1000-key multiGet) suggest that for value sizes > 1k we see about a 20% performance improvement, although performance is slightly reduced for small value sizes because there is a little more overhead in the JNI methods.
      
      Closes https://github.com/facebook/rocksdb/issues/8407
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9224
      
      Reviewed By: mrambacher
      
      Differential Revision: D32951754
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 1f70df7334be2b6c42a9c8f92725f67c71631690
      c1ec0b28
    • ReadOptions - Add missing java API. (#9248) · 7ac3a5d4
      Committed by Radek Hubner
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9248
      
      Reviewed By: mrambacher
      
      Differential Revision: D33011237
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: b6544ad40cb722e327bac60a0af711db253e36d7
      7ac3a5d4
    • Update prepopulate_block_cache logic to support block-based filter (#9300) · 96d0773a
      Committed by Akanksha Mahajan
      Summary:
      Update the prepopulate_block_cache logic to support block-based
      filters when inserting blocks into the block cache. (A configuration sketch follows this entry.)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9300
      
      Test Plan:
      CircleCI tests,
      make crash_test -j64
      
      Reviewed By: pdillinger
      
      Differential Revision: D33132018
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 241deabab8645bda704728e572d6de6354df18b2
      96d0773a
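
      A hedged configuration sketch combining the legacy block-based filter with `prepopulate_block_cache`, the pairing this change enables; the option and enum names are taken from recent `BlockBasedTableOptions` headers rather than from this change:

      ```cpp
      #include <memory>

      #include "rocksdb/filter_policy.h"
      #include "rocksdb/options.h"
      #include "rocksdb/table.h"

      rocksdb::Options BlockBasedFilterWithPrepopulate() {
        rocksdb::BlockBasedTableOptions table_opts;
        // Legacy block-based bloom filter (second argument selects it).
        table_opts.filter_policy.reset(
            rocksdb::NewBloomFilterPolicy(10, /*use_block_based_builder=*/true));
        // Insert blocks produced by flush into the block cache as they are written.
        table_opts.prepopulate_block_cache =
            rocksdb::BlockBasedTableOptions::PrepopulateBlockCache::kFlushOnly;

        rocksdb::Options options;
        options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_opts));
        return options;
      }
      ```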
    • db_stress verify with lost unsynced operations (#8966) · c9818b33
      Committed by Andrew Kryczka
      Summary:
      When a previous run left behind historical state/trace files (implying it was run with --sync_fault_injection set), this PR uses them to restore the expected state according to the DB's recovered sequence number. That way, a tail of latest unsynced operations are permitted to be dropped, as is the case when data in page cache or certain `Env`s is lost. The point of the verification in this scenario is just to ensure there is no hole in the recovered data.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8966
      
      Test Plan:
      - ran it a while, made sure it is restoring expected values using the historical state/trace files:
      ```
      $ rm -rf ./tmp-db/ ./exp/ && mkdir -p ./tmp-db/ ./exp/ && while ./db_stress -compression_type=none -clear_column_family_one_in=0 -expected_values_dir=./exp -sync_fault_injection=1 -destroy_db_initially=0 -db=./tmp-db -max_key=1000000 -ops_per_thread=10000 -reopen=0 -threads=32 ; do : ; done
      ```
      
      Reviewed By: pdillinger
      
      Differential Revision: D31219445
      
      Pulled By: ajkr
      
      fbshipit-source-id: f0e1d51fe5b35465b00565c33331190ea38ba0ad
      c9818b33
  12. 15 December 2021, 2 commits
    • SimulatedHybridFileSystem to simulate HDD behavior more accurately (#9259) · 806d8916
      Committed by sdong
      Summary:
      SimulatedHybridFileSystem now performs a more thorough simulation of an HDD:
      1. It covers writes too, not just reads.
      2. Latency and throughput are now simulated as seek + read time, using a rate limiter.
      This implementation can be modified to simulate full HDD behavior, which is not yet done.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9259
      
      Test Plan: Run db_bench and observe the desired behavior.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D32903039
      
      fbshipit-source-id: a83f5d72143e114d5e75edf39d647bf0b71978e1
      806d8916
    • Stress test for RocksDB transactions (#8936) · e05c2bb5
      Committed by Yanqin Jin
      Summary:
      Current db_stress does not cover complex read-write transactions. Therefore, this PR adds
      coverage for emulated MyRocks-style transactions in `MultiOpsTxnsStressTest`. To achieve this, we need to:
      
      - Add a new operation type 'customops' so that we can add new complex groups of operations, e.g. transactions involving multiple read-write operations.
      - Implement three read-write transactions and two read-only ones to emulate MyRocks-style transactions.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8936
      
      Test Plan:
      ```
      make check
      ./db_stress -test_multi_ops_txns -use_txn -clear_column_family_one_in=0 -column_families=1 -writepercent=0 -delpercent=0 -delrangepercent=0 -customopspercent=60 -readpercent=20 -prefixpercent=0 -iterpercent=20 -reopen=0 -ops_per_thread=100000
      ```
      
      Next step is to add more configurability and refine input generation and result reporting, which will be done in separate follow-up PRs.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31071795
      
      Pulled By: riversand963
      
      fbshipit-source-id: 50d7c828346ec643311336b904848a1588a37006
      e05c2bb5