1. 08 Jan 2022, 3 commits
    • Add filename to several Corruption messages (#9239) · 255aefb6
      Committed by jsteemann
      Summary:
      This change adds the filename of the offending file to several places that produce Status objects with code `kCorruption`.
      This is not an attempt to extend every Corruption message in the codebase with the filename, but it is a start.
      The motivation for the change was to quickly diagnose which file is corrupted when a large database is opened and there is no option to copy it offsite for analysis, run strace, or install the ldb tool.
      In the particular case in question, the error message improved from a mere
      ```
      Corruption: checksum mismatch
      ```
      to
      ```
      Corruption: checksum mismatch in file /path/to/db/engine-rocksdb/MANIFEST-000171
      ```
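The shape of the change can be sketched like this (a minimal standalone illustration; `CorruptionMessage` is a hypothetical helper, not RocksDB's actual `Status` code):

```cpp
#include <string>

// Hypothetical sketch: build a Corruption-style message that names the
// offending file, mirroring the improved error output above.
std::string CorruptionMessage(const std::string& reason,
                              const std::string& fname) {
  std::string msg = "Corruption: " + reason;
  if (!fname.empty()) {
    msg += " in file " + fname;  // the filename makes diagnosis immediate
  }
  return msg;
}
```

The caller that detects the mismatch already knows which file it is reading, so threading the name into the message costs nothing at the error site.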
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9239
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33237742
      
      Pulled By: riversand963
      
      fbshipit-source-id: bd42559cfbf786a0a674d091671d1a2bf07bdd31
    • Remove obsolete function declaration (#8724) · 3dfee770
      Committed by Youngjae Lee
      Summary:
      Function `Version::UpdateFilesByCompactionPri()` is never called and not implemented.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8724
      
      Reviewed By: ajkr
      
      Differential Revision: D30643943
      
      Pulled By: riversand963
      
      fbshipit-source-id: 174b2d9a2a42e286222909a035cc74a7b5602335
    • Release cache reservation of hash entries of the fall-back Ribbon Filter earlier (#9345) · 9110685e
      Committed by Hui Xiao
      Summary:
      Note: rebase on and merge after https://github.com/facebook/rocksdb/pull/9349, as part of https://github.com/facebook/rocksdb/pull/9342
      **Context:**
      https://github.com/facebook/rocksdb/pull/9073 charged the hash entries' memory in block cache with `CacheReservationHandle`. However, in the edge case where Ribbon Filter falls back to Bloom Filter and swaps its hash entries to the embedded bloom filter object, the handles associated with those entries are not swapped and thus not released as soon as those entries are cleared during Bloom Filter's finish process.
      
      Although this is a minor issue, since RocksDB internally calls `FilterBitsBuilder->Reset()` right after `FilterBitsBuilder->Finish()` on the main path, which releases all the cache reservations related to both the Ribbon Filter and its embedded Bloom Filter, this fix is still worthwhile to avoid confusion.
      
      **Summary:**
      - Swapped the `CacheReservationHandle` associated with the hash entries on Ribbon Filter's fallback
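The essence of the fix can be sketched as follows (illustrative types, not the real `FilterBitsBuilder` internals): whatever owns the hash entries must also own the reservation handles that charge for them, so the two are swapped together on fallback.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

struct ReservationHandle {};  // stand-in for CacheReservationHandle

struct FilterBuilderState {
  std::vector<uint64_t> hash_entries;
  std::vector<std::unique_ptr<ReservationHandle>> reservation_handles;
};

// Hypothetical sketch of the fix: when the Ribbon builder falls back to
// the embedded Bloom builder, the cache-reservation handles move together
// with the hash entries they charge for; otherwise the handles would be
// released only later, after the entries were already cleared.
void SwapEntriesAndReservations(FilterBuilderState& ribbon,
                                FilterBuilderState& bloom) {
  std::swap(ribbon.hash_entries, bloom.hash_entries);
  std::swap(ribbon.reservation_handles, bloom.reservation_handles);
}
```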
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9345
      
      Test Plan: - Added a unit test to verify the number of cache reservation after clearing hash entries, which failed before the change and now succeeds
      
      Reviewed By: pdillinger
      
      Differential Revision: D33377225
      
      Pulled By: hx235
      
      fbshipit-source-id: 7487f4c40dfb6ee7928232021f93ef2c5329cffa
  2. 07 Jan 2022, 2 commits
  3. 06 Jan 2022, 3 commits
    • Add checking for `DB::DestroyColumnFamilyHandle()` (#9347) · b2e53ab2
      Committed by Yanqin Jin
      Summary:
      Closing https://github.com/facebook/rocksdb/issues/5006
      
      Calling `DB::DestroyColumnFamilyHandle(column_family)` with `column_family` being the return value of
      `DB::DefaultColumnFamily()` will return `Status::InvalidArgument()`.
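The check can be sketched like this (illustrative types, not RocksDB's real `DBImpl` code; in RocksDB the default column family does have id 0, but `SimpleStatus` here is a stand-in):

```cpp
#include <string>

struct ColumnFamilyHandle { int id; };

struct SimpleStatus {
  std::string msg;
  bool ok() const { return msg.empty(); }
};

constexpr int kDefaultColumnFamilyId = 0;

// Sketch of the new guard: destroying the handle returned by
// DefaultColumnFamily() is rejected instead of silently deleting a
// handle the DB itself owns.
SimpleStatus DestroyColumnFamilyHandle(ColumnFamilyHandle* h) {
  if (h != nullptr && h->id == kDefaultColumnFamilyId) {
    return {"InvalidArgument: cannot destroy default column family handle"};
  }
  // ... delete the user-created handle here ...
  return {};
}
```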
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9347
      
      Test Plan: make check
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D33369675
      
      Pulled By: riversand963
      
      fbshipit-source-id: a8266a4daddf2b7a773c2dc7f3eb9a4adfb6b6dd
    • Test correctness with WAL disabled in non-txn blackbox crash tests (#9338) · 6892f19b
      Committed by Andrew Kryczka
      Summary:
      Recently we added the ability to verify that some prefix of operations is recovered (AKA no "hole" in the recovered data) (https://github.com/facebook/rocksdb/issues/8966). Besides testing unsynced data loss scenarios, it is also useful to test WAL-disabled use cases, where unflushed writes are expected to be lost. Note RocksDB only offers the prefix-recovery guarantee to WAL-disabled use cases that use atomic flush, so the crash test always enables atomic flush when WAL is disabled.
      
      Verifying WAL-disabled crash-recovery correctness globally, i.e., also in whitebox and blackbox transaction tests, is possible but requires further changes. I added TODOs in db_crashtest.py.
      
      Depends on https://github.com/facebook/rocksdb/issues/9305.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9338
      
      Test Plan: Running all crash tests and many instances of blackbox. Sandcastle links are in Phabricator diff test plan.
      
      Reviewed By: riversand963
      
      Differential Revision: D33345333
      
      Pulled By: ajkr
      
      fbshipit-source-id: f56dd7d2e5a78d59301bf4fc3fedb980eb31e0ce
    • Recover to exact latest seqno of data committed to MANIFEST (#9305) · b860a421
      Committed by Andrew Kryczka
      Summary:
      The LastSequence field in the MANIFEST file is the baseline seqno for a recovered DB. Recovering WAL entries might cause the recovered DB's seqno to advance above this baseline, but the recovered DB will never use a smaller seqno.
      
      Before this PR, we were writing the DB's seqno at the time of LogAndApply() as the LastSequence value. This works in the sense that it is a large enough baseline for the recovered DB that it'll never overwrite any records in existing SST files. At the same time, it's arbitrarily larger than what's needed. This behavior comes from LevelDB, where there was no tracking of largest seqno in an SST file.
      
      Now we know the largest seqno of newly written SST files, so we can write an exact value in LastSequence that actually reflects the largest seqno in any file referred to by the MANIFEST. This is primarily useful for correctness testing with unsynced data loss, where the recovered DB's seqno needs to indicate what records were recovered.
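The computation described above reduces to a maximum over the files the MANIFEST refers to; a minimal sketch under illustrative types (not the real `VersionSet` code):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct FileMeta { uint64_t largest_seqno; };

// Sketch: instead of stamping the DB's current seqno into the MANIFEST,
// record the exact largest seqno stored in any referenced file, so the
// recovered DB's baseline reflects precisely what was recovered.
uint64_t ExactLastSequence(const std::vector<FileMeta>& files) {
  uint64_t last = 0;
  for (const auto& f : files) last = std::max(last, f.largest_seqno);
  return last;
}
```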
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9305
      
      Test Plan:
      - https://github.com/facebook/rocksdb/issues/9338 adds crash-recovery correctness testing coverage for WAL disabled use cases
      - https://github.com/facebook/rocksdb/issues/9357 will extend that testing to cover file ingestion
      - Added assertion at end of LogAndApply() for `VersionSet::descriptor_last_sequence_` consistency with files
      - Manually tested upgrade/downgrade compatibility with a custom crash test that randomly picks between a `db_stress` built with and without this PR (for old code it must run with `-disable_wal=0`)
      
      Reviewed By: riversand963
      
      Differential Revision: D33182770
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0bfafaf685f347cc8cb0e1d62e0186340a738f7d
  4. 05 Jan 2022, 1 commit
  5. 31 Dec 2021, 1 commit
    • Fix a bug in C-binding causing iterator to return incorrect result (#9343) · 677d2b4a
      Committed by Yanqin Jin
      Summary:
      Fixes https://github.com/facebook/rocksdb/issues/9339
      
      When writing an SST file, the extractor's name, computed as `prefix_extractor->GetId()`, is written to the properties block.
      When the SST is opened again in the future, `CreateFromString()` takes that name as an argument and tries
      to create a prefix extractor object. Without this fix, the C API passes a `Wrapper` pointer to the underlying
      DB's `prefix_extractor`. `Wrapper::GetId()`, in this case, is missing the prefix length component, causing a
      prefix extractor of length 0 to be silently created and used.
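Why the length component matters can be shown in one line: RocksDB's fixed-prefix extractor id has the form `rocksdb.FixedPrefix.<N>`, and `CreateFromString()` parses the trailing number to recreate an equivalent extractor. A sketch:

```cpp
#include <string>

// Sketch of the id format: the prefix length is part of the name, so an
// id forwarded without it would be parsed back as a length-0 extractor.
std::string FixedPrefixExtractorId(size_t prefix_len) {
  return "rocksdb.FixedPrefix." + std::to_string(prefix_len);
}
```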
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9343
      
      Test Plan:
      ```
      make c_test
      ./c_test
      ```
      
      Reviewed By: mrambacher
      
      Differential Revision: D33355549
      
      Pulled By: riversand963
      
      fbshipit-source-id: c92c3acd8be262c3bff8794b4229e42b9ee31203
  6. 30 Dec 2021, 1 commit
    • Improve SimulatedHybridFileSystem (#9301) · a931bacf
      Committed by sdong
      Summary:
      Several improvements to SimulatedHybridFileSystem:
      (1) Allow a mode where all I/Os to all files simulate HDD. This can be enabled in db_bench using -simulate_hdd.
      (2) Slightly more accurate latency calculation.
      (3) Allow simulating more than one HDD spindle.
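A latency model of the kind such a simulation uses can be sketched as follows (the constants are invented for illustration and are not SimulatedHybridFileSystem's actual numbers): a fixed per-request seek cost plus a transfer cost proportional to the request size.

```cpp
#include <cstdint>

// Hypothetical HDD latency model: seek time plus sequential transfer
// time for the requested byte count.
double SimulatedHddLatencyMs(uint64_t bytes, double seek_ms = 8.0,
                             double mb_per_sec = 100.0) {
  double transfer_ms = (bytes / (mb_per_sec * 1024.0 * 1024.0)) * 1000.0;
  return seek_ms + transfer_ms;
}
```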
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9301
      
      Test Plan: Run db_bench and observe the results are reasonable.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33141662
      
      fbshipit-source-id: b736e58c4ba910d06899cc9ccec79b628275f4fa
  7. 29 Dec 2021, 5 commits
    • Remove/Reduce use of Regex in ObjectRegistry/Library (#9264) · 1c39b795
      Committed by mrambacher
      Summary:
      Added new ObjectLibrary::Entry classes to replace/reduce the use of Regex.  For simple factories that only do name matching, there are "StringEntry" and "AltStringEntry" classes.  For classes that use some semblance of regular expressions, there is a PatternEntry class that can match a name and prefixes.  There is also a class for Customizable::IndividualId format matches.
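The two simple cases described above need no regex machinery at all; a sketch of what name-only matching looks like (illustrative functions, not the actual `ObjectLibrary::Entry` classes):

```cpp
#include <string>

// Exact-name matching, as a "StringEntry"-style factory would use.
bool MatchesExact(const std::string& target, const std::string& name) {
  return target == name;
}

// Prefix matching with a required non-empty suffix, as a
// PatternEntry-style factory with one wildcard part might use.
bool MatchesPrefix(const std::string& target, const std::string& prefix) {
  return target.size() > prefix.size() &&
         target.compare(0, prefix.size(), prefix) == 0;
}
```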
      
      Added tests for the new derivative classes and got all unit tests to pass.
      
      Resolves https://github.com/facebook/rocksdb/issues/9225.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9264
      
      Reviewed By: pdillinger
      
      Differential Revision: D33062001
      
      Pulled By: mrambacher
      
      fbshipit-source-id: c2d2143bd2d38bdf522705c8280c35381b135c03
    • Change GTEST_SKIP to BYPASS for MemoryAllocatorTest (#9340) · 0a563ae2
      Committed by mrambacher
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9340
      
      Reviewed By: riversand963
      
      Differential Revision: D33344152
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 283637625b86c33497571c5f52cac3ddf910b6f3
    • New blog post for Ribbon filter (#8992) · 26a238f5
      Committed by Peter Dillinger
      Summary:
      new blog post for Ribbon filter
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8992
      
      Test Plan: markdown render in GitHub, Pages on my fork
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33342496
      
      Pulled By: pdillinger
      
      fbshipit-source-id: a0a7c19100abdf8755f8a618eb4dead755dfddae
    • Added `TraceOptions::preserve_write_order` (#9334) · aa2b3bf6
      Committed by Andrew Kryczka
      Summary:
      This option causes trace records to be written in the serialized write thread. That way, the write records in the trace must follow the same order as writes that are logged to WAL and writes that are applied to the DB.
      
      By default I left it disabled to match existing behavior. I enabled it in `db_stress`, though, as that use case requires that the order of write records in the trace match the order in the WAL.
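Why emitting the trace record from the serialized write path preserves order can be sketched like this (a conceptual illustration, not the RocksDB tracer API): both the WAL append and the trace append happen inside the same critical section, so the two sequences cannot interleave differently.

```cpp
#include <cstdint>
#include <mutex>
#include <vector>

struct WriteLog {
  std::mutex mu;  // stands in for the serialized write thread
  std::vector<uint64_t> wal;
  std::vector<uint64_t> trace;

  void Write(uint64_t key) {
    std::lock_guard<std::mutex> lock(mu);
    wal.push_back(key);    // logged to WAL
    trace.push_back(key);  // traced under the same ordering
  }
};
```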
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9334
      
      Test Plan:
      - See if below unsynced data loss crash test can run  for 24h straight. It used to crash after a few hours when reaching an unlucky trace ordering.
      
      ```
      DEBUG_LEVEL=0 TEST_TMPDIR=/dev/shm /usr/local/bin/python3 -u tools/db_crashtest.py blackbox --interval=10 --max_key=100000 --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152 --value_size_mult=33 --sync_fault_injection=1 --test_batches_snapshots=0 --duration=86400
      ```
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D33301990
      
      Pulled By: ajkr
      
      fbshipit-source-id: 82d97559727adb4462a7af69758449c8725b22d3
    • Extend trace filtering to more operation types (#9335) · 2ee20a66
      Committed by Andrew Kryczka
      Summary:
      - Extended trace filtering to cover `MultiGet()`, `Seek()`, and `SeekForPrev()`. Now all user ops that can be traced support filtering.
      - Enabled the new filter masks in `db_stress` since it only cares to trace writes.
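Mask-based filtering of this kind can be sketched as follows (the flag names and values are illustrative, not RocksDB's actual trace filter constants): each operation type gets a bit, and an operation is traced unless its bit is set in the filter mask.

```cpp
#include <cstdint>

enum TraceFilterFlag : uint64_t {
  kFilterGet         = 1u << 0,
  kFilterMultiGet    = 1u << 1,
  kFilterSeek        = 1u << 2,
  kFilterSeekForPrev = 1u << 3,
};

// An op is traced unless its bit appears in the filter mask.
bool ShouldTrace(uint64_t filter_mask, TraceFilterFlag op) {
  return (filter_mask & op) == 0;
}
```

A write-only tracer like `db_stress`'s would set all the read-operation bits, which is what makes the trace-heavy runs above cheaper.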
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9335
      
      Test Plan:
      - trace-heavy `db_stress` command reduced 30% elapsed time  (79.21 -> 55.47 seconds)
      
      Benchmark command:
      ```
      $ /usr/bin/time ./db_stress -ops_per_thread=100000 -sync_fault_injection=1 --db=/dev/shm/rocksdb_stress_db/ --expected_values_dir=/dev/shm/rocksdb_stress_expected/ --clear_column_family_one_in=0
      ```
      
      - replay-heavy `db_stress` command reduced 12.4% elapsed time (23.69 -> 20.75 seconds)
      
      Setup command:
      ```
      $  ./db_stress -ops_per_thread=100000000 -sync_fault_injection=1 -db=/dev/shm/rocksdb_stress_db/ -expected_values_dir=/dev/shm/rocksdb_stress_expected --clear_column_family_one_in=0 & sleep 120; pkill -9 db_stress
      ```
      
      Benchmark command:
      ```
      $ /usr/bin/time ./db_stress -ops_per_thread=1 -reopen=0 -expected_values_dir=/dev/shm/rocksdb_stress_expected/ -db=/dev/shm/rocksdb_stress_db/ --clear_column_family_one_in=0 --destroy_db_initially=0
      ```
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D33304580
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0df10f87c1fc506e9484b6b42cea2ef96c7ecd65
  8. 24 Dec 2021, 1 commit
    • Make IncreaseFullHistoryTsLow to a public API (#9221) · 2e5f7642
      Committed by slk
      Summary:
      As discussed in https://github.com/facebook/rocksdb/issues/9210, **full_history_ts_low** is currently a member of CompactRangeOptions, which means a CF's fullHistoryTsLow is advanced only when users submit a CompactRange request.
      However, users may want to advance the fullHistoryTsLow without an immediate compaction.
      This PR makes IncreaseFullHistoryTsLow a public API so users can advance each CF's fullHistoryTsLow separately.
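The essential invariant of such an API is that the cutoff only moves forward; a minimal sketch (illustrative types, with an integer standing in for an encoded user timestamp):

```cpp
#include <algorithm>
#include <cstdint>

// Sketch: full_history_ts_low is monotonically non-decreasing; a request
// to lower it has no effect.
struct ColumnFamilyTsState {
  uint64_t full_history_ts_low = 0;

  void IncreaseFullHistoryTsLow(uint64_t ts) {
    full_history_ts_low = std::max(full_history_ts_low, ts);
  }
};
```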
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9221
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D33201106
      
      Pulled By: riversand963
      
      fbshipit-source-id: 9cb1d013ba93260f72e16353e693ffee167b47ee
  9. 23 Dec 2021, 5 commits
    • Fix race condition in BackupEngineTest.ChangeManifestDuringBackupCreation (#9327) · 538d2365
      Committed by Andrew Kryczka
      Summary:
      The failure looked like this:
      
      ```
      utilities/backupable/backupable_db_test.cc:3161: Failure
      Value of: db_chroot_env_->FileExists(prev_manifest_path).IsNotFound()
        Actual: false
      Expected: true
      ```
      
      The failure could be coerced consistently with the following patch:
      
      ```
       diff --git a/db/db_impl/db_impl_compaction_flush.cc b/db/db_impl/db_impl_compaction_flush.cc
      index 80410f671..637636791 100644
       --- a/db/db_impl/db_impl_compaction_flush.cc
      +++ b/db/db_impl/db_impl_compaction_flush.cc
      @@ -2772,6 +2772,8 @@ void DBImpl::BackgroundCallFlush(Env::Priority thread_pri) {
           if (job_context.HaveSomethingToClean() ||
               job_context.HaveSomethingToDelete() || !log_buffer.IsEmpty()) {
             mutex_.Unlock();
      +      bg_cv_.SignalAll();
      +      sleep(1);
             TEST_SYNC_POINT("DBImpl::BackgroundCallFlush:FilesFound");
             // Have to flush the info logs before bg_flush_scheduled_--
             // because if bg_flush_scheduled_ becomes 0 and the lock is
      ```
      
      The cause was a familiar problem: manual flush/compaction may
      return before the files they obsoleted are removed. The solution is just to
      wait for "scheduled" work to complete, which includes all phases,
      including cleanup.
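The "wait for scheduled work" idea can be sketched as follows (illustrative structure, not `DBImpl`'s real background scheduling): the waiter blocks until the count of scheduled jobs, including their cleanup phase, drops to zero, rather than returning as soon as results are installed.

```cpp
#include <condition_variable>
#include <mutex>

struct BackgroundWork {
  std::mutex mu;
  std::condition_variable cv;
  int scheduled = 0;

  void Schedule() {
    std::lock_guard<std::mutex> l(mu);
    ++scheduled;
  }
  // Called only after ALL phases, including file deletion/cleanup.
  void Finish() {
    { std::lock_guard<std::mutex> l(mu); --scheduled; }
    cv.notify_all();
  }
  void WaitForAll() {
    std::unique_lock<std::mutex> l(mu);
    cv.wait(l, [this] { return scheduled == 0; });
  }
};
```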
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9327
      
      Test Plan:
      after this PR, even the above patch to coerce the bug cannot
      cause the test to fail.
      
      Reviewed By: riversand963
      
      Differential Revision: D33252208
      
      Pulled By: ajkr
      
      fbshipit-source-id: 720a7eaca58c7247d221911fffe3d5e1dbf581e9
    • Expose locktree's wait count in RangeLockManagerHandle::Counters (#9289) · 1b076e82
      Committed by Sergei Petrunia
      Summary:
      locktree is a module providing Range Locking. It has a counter for
      the number of times a lock acquisition request was blocked by an
      existing conflicting lock and had to wait for it to be released.
      
      Expose this counter in RangeLockManagerHandle::Counters::lock_wait_count.
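The counter's semantics can be sketched in a few lines (illustrative structure, not the real locktree internals): the count is bumped each time an acquisition conflicts with a held lock and must wait.

```cpp
#include <cstdint>

struct RangeLockCounters {
  uint64_t lock_wait_count = 0;
};

// Sketch: record the wait before blocking; a non-conflicting request
// acquires immediately and leaves the counter untouched.
bool TryAcquire(bool conflicting, RangeLockCounters& counters) {
  if (conflicting) {
    ++counters.lock_wait_count;  // caller would now wait for release
    return false;
  }
  return true;
}
```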
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9289
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33079182
      
      Pulled By: riversand963
      
      fbshipit-source-id: 25b1a362d9da247536ab5007bd15900b319f139e
    • Filter `Get()`s from `db_stress` traces (#9315) · dfff1cec
      Committed by Andrew Kryczka
      Summary:
      `db_stress` traces are used for tracking unsynced changes. For that purpose, we
      only need to track writes and not reads. Currently `TraceOptions` only
      supports excluding `Get()`s from the trace, so this PR only excludes
      `Get()`s. In the future it would be good to exclude `MultiGet()`s and
      iterator operations too.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9315
      
      Test Plan:
      - trace-heavy `db_stress` command elapsed time reduced 37%
      
      Benchmark:
      ```
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_stress -ops_per_thread=100000 -sync_fault_injection=1 -expected_values_dir=/dev/shm/dbstress_expected --clear_column_family_one_in=0
      ```
      
      - replay-heavy `db_stress` command elapsed time reduced 38%
      
      Setup:
      ```
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_stress -ops_per_thread=100000000 -sync_fault_injection=1 -expected_values_dir=/dev/shm/dbstress_expected --clear_column_family_one_in=0 & sleep 120; pkill -9 db_stress
      ```
      Benchmark:
      ```
      TEST_TMPDIR=/dev/shm /usr/bin/time ./db_stress -ops_per_thread=1 -reopen=0 -expected_values_dir=/dev/shm/dbstress_expected --clear_column_family_one_in=0 --destroy_db_initially=0
      ```
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D33229900
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0e4251c674d236ddbc4548e9bbfdd608bf3cdc93
    • Fixes for building RocksJava builds on s390x (#9321) · 65996dd7
      Committed by Adam Retter
      Summary:
      * Added Docker build environment for RocksJava on s390x
      * Cache alignment size for s390x was incorrectly calculated on gcc 6.4.0
      * Tighter control over which installed version of Java is used is required - build now correctly adheres to `JAVA_HOME` if it is set
      * Alpine build scripts should be used on Alpine (previously CentOS script worked by falling through to minimal gcc version)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9321
      
      Reviewed By: mrambacher
      
      Differential Revision: D33259624
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: d791a5150581344925c3c3f9cbb9a3622d63b3b6
    • Enable core dumps in ASAN crash tests (#9330) · 2d3c626b
      Committed by Andrew Kryczka
      Summary:
      There are some crashes we couldn't debug or repro and couldn't find a core dump. For ASAN the default is `disable_coredump=1` as the doc mentions core dumps can be 16TB+. However I've tried generating them for our `db_stress` commands and they've been in the 1.4-1.6GB range, which is fine. So we can try enabling it in CI.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9330
      
      Test Plan:
      - create a test job. (It's internal infra so I put the link in the Phabricator test plan only)
      - ran the same command locally, `kill -6 $(pidof db_stress)`, verified core dump showed up
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33271841
      
      Pulled By: ajkr
      
      fbshipit-source-id: 93b853fa763d5708d078771960ba36854c4be55a
  10. 22 Dec 2021, 3 commits
    • Fix a bug that occur when plugin pkg-config requirements are empty (#9238) · 2e51b33d
      Committed by Andreas Hindborg
      Summary:
      Fix a bug introduced by https://github.com/facebook/rocksdb/issues/9198. The bug is triggered when a plugin does not provide any pkg-config requirements.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9238
      
      Reviewed By: riversand963
      
      Differential Revision: D32771406
      
      Pulled By: ajkr
      
      fbshipit-source-id: 79301871a8bf4e624d5e5eb9d219d7f13948c64d
    • More asserts in listener_test for debuggability (#9320) · 393fc231
      Committed by Andrew Kryczka
      Summary:
      We ran into a flake I could not debug, so I instead added assertions in
      case it happens again.
      
      Command was:
      
      ```
      TEST_TMPDIR=/dev/shm/rocksdb COMPILE_WITH_UBSAN=1 USE_CLANG=1 OPT=-g SKIP_FORMAT_BUCK_CHECKS=1 make J=80 -j80 ubsan_check
      ```
      
      Failure output was:
      
      ```
      [==========] Running 1 test from 1 test case.
      [----------] Global test environment set-up.
      [----------] 1 test from EventListenerTest
      [ RUN      ] EventListenerTest.DisableBGCompaction
      UndefinedBehaviorSanitizer:DEADLYSIGNAL
      ==1558126==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000031 (pc 0x7fd9c04dda22 bp 0x7fd9bf8aa580 sp 0x7fd9bf8aa540 T1558147)
      ==1558126==The signal is caused by a READ memory access.
      ==1558126==Hint: address points to the zero page.
          #0 0x7fd9c04dda21 in __dynamic_cast /home/engshare/third-party2/libgcc/9.x/src/gcc-9.x/x86_64-facebook-linux/libstdc++-v3/libsupc++/../../.././libstdc++-v3/libsupc++/dyncast.cc:49:3
          #1 0x510c53 in __ubsan::checkDynamicType(void*, void*, unsigned long) (/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/listener_test+0x510c53)
          #2 0x50fb32 in HandleDynamicTypeCacheMiss(__ubsan::DynamicTypeCacheMissData*, unsigned long, unsigned long, __ubsan::ReportOptions) (/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/listener_test+0x50fb32)
          #3 0x510230 in __ubsan_handle_dynamic_type_cache_miss_abort (/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/listener_test+0x510230)
          #4 0x63221a in rocksdb::ColumnFamilyHandleImpl* rocksdb::static_cast_with_check<rocksdb::ColumnFamilyHandleImpl, rocksdb::ColumnFamilyHandle>(rocksdb::ColumnFamilyHandle*) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/./util/cast_util.h:19:20
          #5 0x71cafa in rocksdb::DBImpl::TEST_GetFilesMetaData(rocksdb::ColumnFamilyHandle*, std::vector<std::vector<rocksdb::FileMetaData, std::allocator<rocksdb::FileMetaData> >, std::allocator<std::vector<rocksdb::FileMetaData, std::allocator<rocksdb::FileMetaData> > > >*, std::vector<std::shared_ptr<rocksdb::BlobFileMetaData>, std::allocator<std::shared_ptr<rocksdb::BlobFileMetaData> > >*) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_debug.cc:63:14
          #6 0x53f6b4 in rocksdb::TestFlushListener::OnFlushCompleted(rocksdb::DB*, rocksdb::FlushJobInfo const&) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/listener_test.cc:277:24
          #7 0x6e2f7d in rocksdb::DBImpl::NotifyOnFlushCompleted(rocksdb::ColumnFamilyData*, rocksdb::MutableCFOptions const&, std::__cxx11::list<std::unique_ptr<rocksdb::FlushJobInfo, std::default_delete<rocksdb::FlushJobInfo> >, std::allocator<std::unique_ptr<rocksdb::FlushJobInfo, std::default_delete<rocksdb::FlushJobInfo> > > >*) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:863:19
          #8 0x6e1074 in rocksdb::DBImpl::FlushMemTableToOutputFile(rocksdb::ColumnFamilyData*, rocksdb::MutableCFOptions const&, bool*, rocksdb::JobContext*, rocksdb::SuperVersionContext*, std::vector<unsigned long, std::allocator<unsigned long> >&, unsigned long, rocksdb::SnapshotChecker*, rocksdb::LogBuffer*, rocksdb::Env::Priority) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:314:5
          #9 0x6e3412 in rocksdb::DBImpl::FlushMemTablesToOutputFiles(rocksdb::autovector<rocksdb::DBImpl::BGFlushArg, 8ul> const&, bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::Env::Priority) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:359:14
          #10 0x700df6 in rocksdb::DBImpl::BackgroundFlush(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::FlushReason*, rocksdb::Env::Priority) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2703:14
          #11 0x6fe1f0 in rocksdb::DBImpl::BackgroundCallFlush(rocksdb::Env::Priority) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2742:16
          #12 0x6fc732 in rocksdb::DBImpl::BGWorkFlush(void*) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2569:44
          #13 0xb3a820 in void std::_Bind<void (* (void*))(void*)>::operator()<void>() /mnt/gvfs/third-party2/libgcc/4959b39cfbe5965a37c861c4c327fa7c5c759b87/9.x/platform009/9202ce7/include/c++/9.x/functional:482:17
          #14 0xb3a820 in std::_Function_handler<void (), std::_Bind<void (* (void*))(void*)> >::_M_invoke(std::_Any_data const&) /mnt/gvfs/third-party2/libgcc/4959b39cfbe5965a37c861c4c327fa7c5c759b87/9.x/platform009/9202ce7/include/c++/9.x/bits/std_function.h:300:2
          #15 0xb347cc in rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/util/threadpool_imp.cc:266:5
          #16 0xb34a2f in rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*) /data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/util/threadpool_imp.cc:307:7
          #17 0x7fd9c051a660 in execute_native_thread_routine /home/engshare/third-party2/libgcc/9.x/src/gcc-9.x/x86_64-facebook-linux/libstdc++-v3/src/c++11/../../../.././libstdc++-v3/src/c++11/thread.cc:80:18
          #18 0x7fd9c041e20b in start_thread /home/engshare/third-party2/glibc/2.30/src/glibc-2.30/nptl/pthread_create.c:479:8
          #19 0x7fd9c01dd16e in clone /home/engshare/third-party2/glibc/2.30/src/glibc-2.30/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9320
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33242185
      
      Pulled By: ajkr
      
      fbshipit-source-id: 741984b10a610e0509e0d4e54c42cdbac03f5285
    • Add NewMetaDataIterator method (#8692) · 9a116ab4
      Committed by mrambacher
      Summary:
      Fixes a problem where the iterator for metadata was being treated as a non-user key when in fact it was a user key.  This led to a problem where the property keys could not be searched for correctly.
      
      The main exposure of this problem was that the HashIndexReader could not get the "prefixes" property correctly, resulting in the failure of retrieval/creation of the BlockPrefixIndex.
      
      Added BlockBasedTableTest.SeekMetaBlocks test to validate this condition.
      
      Fixing this condition exposed two other tests (SeekWithPrefixLongerThanKey, MultiGetPrefixFilter) that passed incorrectly previously and now failed.  Updated those two tests to pass.  Not sure if the tests are functionally correct/still appropriate, but made them pass...
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8692
      
      Reviewed By: riversand963
      
      Differential Revision: D33119539
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 658969fe9265f73dc184dab97cc3f4eaed2d881a
  11. 21 Dec 2021, 6 commits
    • Minor Javadoc fixes (#9203) · 7ae213f7
      Committed by stefan-zobel
      Summary:
      Added two missing parameter tags with description and added some descriptions for parameter / return tags
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9203
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D32990607
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 10aea4c4cf1c28d5e97d19722ee835a965d1eb55
    • db_stress print hex key for MultiGet() inconsistency (#9324) · 82670fb1
      Committed by Andrew Kryczka
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9324
      
      Reviewed By: riversand963
      
      Differential Revision: D33248178
      
      Pulled By: ajkr
      
      fbshipit-source-id: c8a7382ed613f9ac3a0a2e3fa7d3c6fe9c95ef85
    • Fix race condition in `error_handler_fs_test` (#9325) · 782fcc44
      Committed by Andrew Kryczka
      Summary:
      We saw the below assertion failure in `error_handler_fs_test`:
      
      ```
      db/error_handler_fs_test.cc:2471: Failure
      Expected equality of these values:
        listener->new_bg_error()
          Which is: 16-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>
        Status::Aborted()
          Which is: 16-byte object <0A-00 00-00 60-61 00-00 00-00 00-00 00-00 00-00>
      terminate called after throwing an instance of 'testing::internal::GoogleTestFailureException'
        what():  db/error_handler_fs_test.cc:2471: Failure
      Expected equality of these values:
        listener->new_bg_error()
          Which is: 16-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>
        Status::Aborted()
          Which is: 16-byte object <0A-00 00-00 60-61 00-00 00-00 00-00 00-00 00-00>
      Received signal 6 (Aborted)
      ```
      
      The problem was that completing `OnErrorRecoveryCompleted()` would
      wake up the main thread and allow it to proceed to that assertion. But
      that assertion assumes `OnErrorRecoveryEnd()` has completed, since
      only `OnErrorRecoveryEnd()` affects `new_bg_error()`.
      
      The fix is just to make `OnErrorRecoveryCompleted()` not wake up the
      main thread, by means of not implementing it.
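
The shape of this fix can be sketched outside of RocksDB. The code below is an illustrative model (the callback names mirror the test's listeners, but this is not the actual `EventListener` API): waking the main thread from an intermediate callback races with the later callback that sets the state under test, so the waiter should be released only from the final callback.

```python
import threading
import time

class RecoveryListener:
    """Hypothetical stand-in for the test's EventListener."""
    def __init__(self):
        self.new_bg_error = None
        self.recovery_done = threading.Event()

    def on_error_recovery_completed(self, old_error):
        # Intentionally does NOT wake the main thread (the fix):
        # new_bg_error has not been updated yet at this point.
        pass

    def on_error_recovery_end(self, new_error):
        self.new_bg_error = new_error   # state the test asserts on
        self.recovery_done.set()        # only now release the waiter

def background_recovery(listener):
    listener.on_error_recovery_completed(old_error="Aborted")
    time.sleep(0.05)  # models the window that made the race reproducible
    listener.on_error_recovery_end(new_error="Aborted")

listener = RecoveryListener()
t = threading.Thread(target=background_recovery, args=(listener,))
t.start()
listener.recovery_done.wait()           # cannot return before _end has run
assert listener.new_bg_error == "Aborted"
t.join()
```

Waiting on an event set only in the final callback makes the assertion race-free by construction, which is exactly what not implementing `OnErrorRecoveryCompleted()` achieves in the test.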
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9325
      
      Test Plan:
      - ran `while TEST_TMPDIR=/dev/shm ./error_handler_fs_test ; do : ; done` for a while
      - injected sleep between `OnErrorRecovery{Completed,End}()` callbacks, which guaranteed repro before this PR
      
      Reviewed By: anand1976
      
      Differential Revision: D33249200
      
      Pulled By: ajkr
      
      fbshipit-source-id: 1659ee183cd09f90d4dbd898f65103473fcf84a8
      782fcc44
    • A
      `db_stress` tolerate incomplete tail records in trace file (#9316) · b448b712
      Andrew Kryczka committed
      Summary:
      I saw the following error when running crash test for a while with
      unsynced data loss:
      
      ```
      Error restoring historical expected values: Corruption: Corrupted trace file.
      ```
      
      The trace file turned out to have an incomplete tail record. This is
      expected, considering the blackbox crash test kills `db_stress` while
      tracing can be ongoing.
      
      In the case where the trace file is not otherwise corrupted, there
      should be enough records already seen to sync up the expected state with
      the recovered DB. This PR ignores any `Status::Corruption` the
      `Replayer` returns when that happens.
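
As a rough model of why ignoring a tail corruption is safe, consider a length-prefixed trace format (hypothetical; RocksDB's actual trace format differs): a record cut off mid-write is indistinguishable from the end of useful data, and every complete record before it can still be replayed.

```python
import struct

def replay(trace: bytes):
    """Parse length-prefixed records; treat a truncated tail as end-of-trace.

    Hypothetical format: 4-byte big-endian length, then payload. A record
    cut off mid-write (e.g. the process was killed) raises no error; we
    simply stop, since the records already seen are enough to sync up the
    expected state.
    """
    records, pos = [], 0
    while pos + 4 <= len(trace):
        (n,) = struct.unpack_from(">I", trace, pos)
        if pos + 4 + n > len(trace):
            break  # incomplete tail record: ignore, like ignoring the Corruption status
        records.append(trace[pos + 4 : pos + 4 + n])
        pos += 4 + n
    return records

full = struct.pack(">I", 3) + b"abc" + struct.pack(">I", 2) + b"xy"
truncated = full + struct.pack(">I", 5) + b"zz"  # killed mid-record
assert replay(truncated) == [b"abc", b"xy"]
```

The truncated trace yields the same records as the intact prefix, which is why tolerating the tail error loses nothing.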
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9316
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33230579
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9814af4e39e57f00d85be7404363211762f9b41b
      b448b712
    • A
      Fix race condition in db_stress thread setup (#9314) · 791723c1
      Andrew Kryczka committed
      Summary:
      We need to grab `SharedState`'s mutex while calling `IncThreads()` or `IncBgThreads()`. Otherwise the newly launched threads can simultaneously access the thread counters to check if every thread has finished initializing.
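
The pattern of the fix, sketched as a minimal model (names like `inc_threads` are illustrative, not db_stress's real API): increments of the shared thread counters happen under the same mutex the polling threads take, so a reader never observes a half-updated registration.

```python
import threading

class SharedState:
    """Minimal sketch of mutex-guarded thread counters."""
    def __init__(self):
        self.mu = threading.Lock()
        self.cv = threading.Condition(self.mu)
        self.num_threads = 0
        self.num_initialized = 0

    def inc_threads(self):
        with self.mu:               # the fix: increment under the mutex...
            self.num_threads += 1

    def mark_initialized(self):
        with self.mu:
            self.num_initialized += 1
            self.cv.notify_all()

    def wait_all_initialized(self):
        with self.mu:               # ...that readers also hold when checking
            self.cv.wait_for(lambda: self.num_initialized == self.num_threads)

shared = SharedState()

def worker():
    shared.mark_initialized()

threads = []
for _ in range(8):
    shared.inc_threads()            # registered before the thread starts
    t = threading.Thread(target=worker)
    t.start()
    threads.append(t)
shared.wait_all_initialized()
for t in threads:
    t.join()
assert shared.num_initialized == shared.num_threads == 8
```

Without the lock in `inc_threads()`, a freshly launched worker could read the counters mid-update, which is exactly the data race TSAN reported below.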
      
      Repro command:
      
      ```
      $ rm -rf /dev/shm/rocksdb/rocksdb_crashtest_{whitebox,expected}/ && mkdir -p /dev/shm/rocksdb/rocksdb_crashtest_{whitebox,expected}/ && ./db_stress --acquire_snapshot_one_in=10000 --atomic_flush=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=131.8094496796033 --bottommost_compression_type=zlib --cache_index_and_filter_blocks=1 --cache_size=1048576 --checkpoint_one_in=1000000 --checksum_type=kCRC32c --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_style=1 --compaction_ttl=0 --compression_max_dict_buffer_bytes=134217727 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=65536 --continuous_verification_interval=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_whitebox --db_write_buffer_size=8388608 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --disable_wal=1 --enable_compaction_filter=0 --enable_pipelined_write=0 --fail_if_options_file_error=1 --file_checksum_impl=crc32c --flush_one_in=1000000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=15 --index_type=3 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --log2_keys_per_lock=22 --long_running_snapshots=0 --mark_for_compaction_one_file_in=10 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=1000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=4194304 --memtablerep=skip_list --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --open_files=500000 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=0 --ops_per_thread=20000 --optimize_filters_for_memory=1 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=0 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefixpercent=5 --prepopulate_block_cache=1 --progress_reports=0 --read_fault_one_in=1000 --readpercent=45 --recycle_log_file_num=1 --reopen=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=32 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --subcompactions=2 --sync=0 --sync_fault_injection=False --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=1 --test_cf_consistency=1 --top_level_index_pinning=0 --unpartitioned_pinning=0 --use_block_based_filter=1 --use_clock_cache=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=1 --use_merge=0 --use_multiget=1 --user_timestamp_size=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --write_buffer_size=1048576 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=35
      ```
      
      TSAN error:
      
      ```
      WARNING: ThreadSanitizer: data race (pid=2750142)
        Read of size 4 at 0x7ffc21d7f58c by thread T39 (mutexes: write M670895590377780496):
          #0 rocksdb::SharedState::AllInitialized() const db_stress_tool/db_stress_shared_state.h:204 (db_stress+0x4fd307)
          #1 rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:26 (db_stress+0x4fd307)
          #2 StartThreadWrapper env/env_posix.cc:454 (db_stress+0x84472f)
      
        Previous write of size 4 at 0x7ffc21d7f58c by main thread:
          #0 rocksdb::SharedState::IncThreads() db_stress_tool/db_stress_shared_state.h:194 (db_stress+0x4fd779)
          #1 rocksdb::RunStressTest(rocksdb::StressTest*) db_stress_tool/db_stress_driver.cc:78 (db_stress+0x4fd779)
          #2 rocksdb::db_stress_tool(int, char**) db_stress_tool/db_stress_tool.cc:348 (db_stress+0x4b97dc)
          #3 main db_stress_tool/db_stress.cc:21 (db_stress+0x47a351)
      
        Location is stack of main thread.
      
        Location is global '<null>' at 0x000000000000 ([stack]+0x00000001d58c)
      
        Mutex M670895590377780496 is already destroyed.
      
        Thread T39 (tid=2750211, running) created by main thread at:
          #0 pthread_create /home/engshare/third-party2/gcc/9.x/src/gcc-10.x/libsanitizer/tsan/tsan_interceptors.cc:964 (libtsan.so.0+0x613c3)
          #1 StartThread env/env_posix.cc:464 (db_stress+0x8463c2)
          #2 rocksdb::CompositeEnvWrapper::StartThread(void (*)(void*), void*) env/composite_env_wrapper.h:288 (db_stress+0x4bcd20)
          #3 rocksdb::EnvWrapper::StartThread(void (*)(void*), void*) include/rocksdb/env.h:1475 (db_stress+0x4bb950)
          #4 rocksdb::RunStressTest(rocksdb::StressTest*) db_stress_tool/db_stress_driver.cc:80 (db_stress+0x4fd9d2)
          #5 rocksdb::db_stress_tool(int, char**) db_stress_tool/db_stress_tool.cc:348 (db_stress+0x4b97dc)
          #6 main db_stress_tool/db_stress.cc:21 (db_stress+0x47a351)
      
       ThreadSanitizer: data race db_stress_tool/db_stress_shared_state.h:204 in rocksdb::SharedState::AllInitialized() const
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9314
      
      Test Plan: verified repro command works after this PR.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33217698
      
      Pulled By: ajkr
      
      fbshipit-source-id: 79358fe5adb779fc9dcf80643cc102d4b467fc38
      791723c1
    • A
      Skip MemoryAllocatorTest in LITE mode (#9318) · 48b53441
      Andrew Kryczka committed
      Summary:
      The tests rely on `CreateFromString()`, which returns
      `Status::NotSupported()` when these tests attempt to create non-default
      allocators.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9318
      
      Reviewed By: riversand963
      
      Differential Revision: D33238405
      
      Pulled By: ajkr
      
      fbshipit-source-id: d2974e2341f1494f5f7cd07b73f2dbd0d502fc7c
      48b53441
  12. 18 Dec 2021, 6 commits
    • A
      Fix unsynced data loss correctness test with mixed `-test_batches_snapshots` (#9302) · 863c78d2
      Andrew Kryczka committed
      Summary:
      This fixes two bugs in the recently committed DB verification following
      crash-recovery with unsynced data loss (https://github.com/facebook/rocksdb/issues/8966):
      
      The first bug was in crash test runs involving mixed values for
      `-test_batches_snapshots`. The problem was we were neither restoring
      expected values nor enabling tracing when `-test_batches_snapshots=1`.
      This caused a future `-test_batches_snapshots=0` run to not find enough
      trace data to restore expected values. The fix is to restore expected
      values at the start of `-test_batches_snapshots=1` runs, but still leave
      tracing disabled as we do not need to track those KVs.
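
The resulting per-run behavior can be summarized in a tiny decision helper (hypothetical code that merely restates the fix): both run modes restore expected values, but only non-batched runs enable tracing.

```python
def expected_state_plan(test_batches_snapshots: bool):
    """Return (restore_expected_values, enable_tracing) for a db_stress run.

    Per the fix: batched-ops runs now restore expected values too, so a
    later non-batched run starts from a consistent state, but they still
    do not trace, since their KVs are not tracked individually.
    """
    restore = True                       # both run modes restore
    trace = not test_batches_snapshots   # only non-batched runs trace
    return restore, trace

assert expected_state_plan(False) == (True, True)
assert expected_state_plan(True) == (True, False)
```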
      
      The second bug was in `db_stress` runs that restore the expected values
      file and use compaction filter. The compaction filter was initialized to use
      the pre-restore expected values, which would be `munmap()`'d during
      `FileExpectedStateManager::Restore()`. Then compaction filter would run
      into a segfault. The fix is just to reorder compaction filter init after expected
      values restore.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9302
      
      Test Plan:
      - To verify the first problem, the below sequence used to fail; now it passes.
      
      ```
      $ ./db_stress --db=./test-db/ --expected_values_dir=./test-db-expected/ --max_key=100000 --ops_per_thread=1000 --sync_fault_injection=1 --clear_column_family_one_in=0 --destroy_db_initially=0 -reopen=0 -test_batches_snapshots=0
      $ ./db_stress --db=./test-db/ --expected_values_dir=./test-db-expected/ --max_key=100000 --ops_per_thread=1000 --sync_fault_injection=1 --clear_column_family_one_in=0 --destroy_db_initially=0 -reopen=0 -test_batches_snapshots=1
      $ ./db_stress --db=./test-db/ --expected_values_dir=./test-db-expected/ --max_key=100000 --ops_per_thread=1000 --sync_fault_injection=1 --clear_column_family_one_in=0 --destroy_db_initially=0 -reopen=0 -test_batches_snapshots=0
      ```
      
      - The second problem occurred rarely in the form of a SIGSEGV on a file that was `munmap()`d. I have not seen it after this PR though this doesn't prove much.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33155283
      
      Pulled By: ajkr
      
      fbshipit-source-id: 66fd0f0edf34015a010c30015f14f104734e964e
      863c78d2
    • A
      Fix shutdown in db_stress with `-test_batches_snapshots=1` (#9313) · 84228e21
      Andrew Kryczka committed
      Summary:
      The `SharedState` constructor had an early return in case of
      `-test_batches_snapshots=1`. This early return caused `num_bg_threads_`
      to never be incremented. Consequently, the driver thread could clean up
      objects like the `SharedState` while BG threads were still running and
      accessing it, leading to a crash.
      
      The fix is to move the logic for counting threads (both FG and BG) to
      the place they are launched. That way we can be sure the counts are
      consistent, at least for now.
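
A minimal sketch of "count where launched" (illustrative names, not the real `SharedState`): because the counter is bumped at the launch site itself, no early return in a constructor can skip it, and the driver's shutdown wait is guaranteed to cover every BG thread.

```python
import queue
import threading

class SharedState:
    """Sketch: BG-thread count maintained where threads are launched."""
    def __init__(self):
        self.mu = threading.Lock()
        self.cv = threading.Condition(self.mu)
        self.bg_threads = 0

    def bg_thread_finished(self):
        with self.mu:
            self.bg_threads -= 1
            self.cv.notify_all()

    def wait_for_bg_threads(self):
        with self.mu:
            self.cv.wait_for(lambda: self.bg_threads == 0)

def launch_bg(shared, fn):
    # Count at the launch site, so the count can never be skipped by an
    # early return elsewhere (the bug fixed in this PR).
    with shared.mu:
        shared.bg_threads += 1
    t = threading.Thread(target=fn)
    t.start()
    return t

shared = SharedState()
results = queue.Queue()

def bg_work():
    results.put("done")
    shared.bg_thread_finished()

ts = [launch_bg(shared, bg_work) for _ in range(4)]
shared.wait_for_bg_threads()  # only now is it safe to destroy shared objects
for t in ts:
    t.join()
assert results.qsize() == 4 and shared.bg_threads == 0
```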
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9313
      
      Test Plan:
      below command used to fail, now it passes.
      
      ```
      $ ./db_stress --db=./test-db/ --expected_values_dir=./test-db-expected/ --max_key=100000 --ops_per_thread=1000 --sync_fault_injection=1 --clear_column_family_one_in=0 --destroy_db_initially=0 -reopen=0 -test_batches_snapshots=1
      ```
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D33198670
      
      Pulled By: ajkr
      
      fbshipit-source-id: 126592dc1eb31998bc8f82ffbf5a0d4eb8dec317
      84228e21
    • K
      gcc-11 and cmake related cleanup (#9286) · cc1d4e3d
      Kefu Chai committed
      Summary:
      In the hope of getting RocksDB to compile with GCC-11 without warnings:
      
      * util/bloom_test: initialize a variable before using it
        to silence GCC warnings like
        ```
        util/bloom_test.cc:1253:31: error: ‘<anonymous>’ may be used uninitialized [-Werror=maybe-uninitialized]
         1253 |   Slice key_slice{key_bytes, 8};
              |                               ^
        ...
        include/rocksdb/slice.h:41:3: note: by argument 2 of type ‘const char*’ to ‘rocksdb::Slice::Slice(const char*, size_t)’ declared here
           41 |   Slice(const char* d, size_t n) : data_(d), size_(n) {}
              |   ^~~~~
        util/bloom_test.cc:1249:3: note: ‘<anonymous>’ declared here
         1249 |   };
              |   ^
        cc1plus: all warnings being treated as errors
        ```
      * cmake: add find_package(uring ...)
        Find liburing in a more consistent way; this is also the encouraged way of locating a library.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9286
      
      Reviewed By: mrambacher
      
      Differential Revision: D33165241
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 9f3487e11b4e40fd8f1c97c8facb24a190e5ce31
      cc1d4e3d
    • A
      Update to version 6.28 (#9312) · 7bfad071
      Akanksha Mahajan committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9312
      
      Reviewed By: ajkr
      
      Differential Revision: D33196324
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 471da75eaedc54d3151672adc28643bc1d6fdf23
      7bfad071
    • P
      Fix unity build with SUPPORT_CLOCK_CACHE (#9309) · 0d9b2568
      Peter Dillinger committed
      Summary:
      After https://github.com/facebook/rocksdb/issues/9126
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9309
      
      Test Plan: CI
      
      Reviewed By: ajkr
      
      Differential Revision: D33188902
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 54bf34e33c2b30b1b8dc2a0229e84c194321b606
      0d9b2568
    • Y
      Update TARGETS and related scripts (#9310) · 6b5e28a4
      Yanqin Jin committed
      Summary:
      As title. Remove 'unexported_deps_by_default', replace 'deps' and
      'external_deps' with 'exported_deps' and 'exported_external_deps'
      respectively.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9310
      
      Test Plan: Github action and internal jobs.
      
      Reviewed By: DrMarcII
      
      Differential Revision: D33190092
      
      Pulled By: riversand963
      
      fbshipit-source-id: 64200e5331d822f88f8d122a55b7a29bfd1f9553
      6b5e28a4
  13. 17 Dec 2021, 3 commits
    • M
      Make MemoryAllocator into a Customizable class (#8980) · 423538a8
      mrambacher committed
      Summary:
      - Make MemoryAllocator and its implementations into a Customizable class.
      - Added a "DefaultMemoryAllocator", which uses `new` and `delete`
      - Added a "CountedMemoryAllocator", which counts the number of allocations and frees
      - Updated the existing tests to use these new allocators
      - Changed the memkind allocator test into a generic test that can test the various allocators.
      - Added tests for creating all of the allocators
      - Added tests to verify/create the JemallocNodumpAllocator using its options.
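
As a language-neutral illustration of what a counting allocator buys tests (a Python analogue, not the C++ `CountedMemoryAllocator` itself): by tallying allocate/free calls, a test can assert that everything allocated was eventually freed.

```python
class CountingAllocator:
    """Illustrative analogue of a counting allocator wrapper: it delegates
    to an underlying allocation mechanism and tallies calls, letting tests
    assert that every allocation is matched by a free."""
    def __init__(self):
        self.num_allocs = 0
        self.num_frees = 0
        self._live = set()

    def allocate(self, size: int) -> bytearray:
        self.num_allocs += 1
        block = bytearray(size)      # stands in for new[] / malloc
        self._live.add(id(block))
        return block

    def free(self, block: bytearray) -> None:
        self.num_frees += 1
        self._live.discard(id(block))

allocator = CountingAllocator()
blocks = [allocator.allocate(64) for _ in range(3)]
for b in blocks:
    allocator.free(b)
assert allocator.num_allocs == 3
assert allocator.num_frees == 3
assert not allocator._live       # no leaks
```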
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8980
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D32990403
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 6fdfe8218c10dd8dfef34344a08201be1fa95c76
      423538a8
    • J
      fix java doc issues (#9253) · 9828b6d5
      Jermy Li committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9253
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D32990516
      
      Pulled By: mrambacher
      
      fbshipit-source-id: c7cdb6562ac6871bca6ea0d9efa454f3a902a137
      9828b6d5
    • P
      New stable, fixed-length cache keys (#9126) · 0050a73a
      Peter Dillinger committed
      Summary:
      This change standardizes on a new 16-byte cache key format for
      block cache (including compressed and secondary) and persistent cache (but
      not table cache or row cache).
      
      The goal is a really fast cache key with practically ideal stability and
      uniqueness properties without external dependencies (e.g. from FileSystem).
      A fixed key size of 16 bytes should enable future optimizations to the
      concurrent hash table for block cache, which is a heavy CPU user /
      bottleneck, but there appears to be measurable performance improvement
      even with no changes to LRUCache.
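
The core idea, a fixed-width key packed from stable identifiers, can be illustrated with a toy encoder. The layout below is hypothetical; RocksDB's real scheme, described in cache_key.cc, derives and mixes its fields quite differently:

```python
import struct

def make_cache_key(session_id: int, file_number: int, offset: int) -> bytes:
    """Pack (session, file, offset) into a fixed 16-byte key.

    Hypothetical layout (8 + 4 + 4 bytes, little-endian); the point is
    only that a fixed width with no external dependencies enables a
    simpler, faster concurrent hash table for the block cache.
    """
    return struct.pack("<QII", session_id, file_number, offset)

k1 = make_cache_key(0xDEADBEEF, 171, 4096)
k2 = make_cache_key(0xDEADBEEF, 171, 8192)
assert len(k1) == 16   # fixed width, unlike variable-length file-name keys
assert k1 != k2        # distinct offsets give distinct keys
```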
      
      This change replaces a lot of disjointed and ugly code handling cache
      keys with calls to a simple, clean new internal API (cache_key.h).
      (Preserving the old cache key logic under an option would be very ugly
      and likely negate the performance gain of the new approach. Complete
      replacement carries some inherent risk, but I think that's acceptable
      with sufficient analysis and testing.)
      
      The scheme for encoding new cache keys is complicated but explained
      in cache_key.cc.
      
      Also: EndianSwapValue is moved to math.h to be next to other bit
      operations (this explains some new includes of "math.h"). A ReverseBits
      operation is added, and unit tests for both are added to hash_test.
      
      Fixes https://github.com/facebook/rocksdb/issues/7405 (presuming a root cause)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9126
      
      Test Plan:
      ### Basic correctness
      Several tests needed updates to work with the new functionality, mostly
      because we are no longer relying on filesystem for stable cache keys
      so table builders & readers need more context info to agree on cache
      keys. This functionality is so core, a huge number of existing tests
      exercise the cache key functionality.
      
      ### Performance
      Create db with
      `TEST_TMPDIR=/dev/shm ./db_bench -bloom_bits=10 -benchmarks=fillrandom -num=3000000 -partition_index_and_filters`
      And test performance with
      `TEST_TMPDIR=/dev/shm ./db_bench -readonly -use_existing_db -bloom_bits=10 -benchmarks=readrandom -num=3000000 -duration=30 -cache_index_and_filter_blocks -cache_size=250000 -threads=4`
      using DEBUG_LEVEL=0 and simultaneous before & after runs.
      Before ops/sec, avg over 100 runs: 121924
      After ops/sec, avg over 100 runs: 125385 (+2.8%)
      
      ### Collision probability
      I have built a tool, ./cache_bench -stress_cache_key to broadly simulate host-wide cache activity
      over many months, by making some pessimistic simplifying assumptions:
      * Every generated file has a cache entry for every byte offset in the file (contiguous range of cache keys)
      * All of every file is cached for its entire lifetime
      
      We use a simple table with skewed address assignment and replacement on address collision
      to simulate files coming & going, with quite a variance (super-Poisson) in ages. Some output
      with `./cache_bench -stress_cache_key -sck_keep_bits=40`:
      
      ```
      Total cache or DBs size: 32TiB  Writing 925.926 MiB/s or 76.2939TiB/day
      Multiply by 9.22337e+18 to correct for simulation losses (but still assume whole file cached)
      ```
      
      These come from default settings of 2.5M files per day of 32 MB each, and
      `-sck_keep_bits=40` means that to represent a single file, we are only keeping 40 bits of
      the 128-bit cache key. With file size of 2**25 contiguous keys (pessimistic), our simulation
      is about 2**(128-40-25) or about 9 billion billion times more prone to collision than reality.
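
The quoted correction factor checks out arithmetically: keeping 40 of the 128 key bits and dedicating 2**25 offsets to each file leaves 128 - 40 - 25 = 63 unconstrained bits, i.e. a factor of 2**63, matching the "Multiply by 9.22337e+18" line in the sample output.

```python
KEY_BITS = 128         # full cache key width
KEEP_BITS = 40         # -sck_keep_bits=40
FILE_OFFSET_BITS = 25  # 2**25 contiguous keys per 32 MB file (pessimistic)

correction = 2 ** (KEY_BITS - KEEP_BITS - FILE_OFFSET_BITS)
assert correction == 2 ** 63
# matches "Multiply by 9.22337e+18" from the simulator output
assert f"{correction:.5e}" == "9.22337e+18"
```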
      
      More default assumptions, relatively pessimistic:
      * 100 DBs in same process (doesn't matter much)
      * Re-open DB in same process (new session ID related to old session ID) on average
      every 100 files generated
      * Restart process (all new session IDs unrelated to old) 24 times per day
      
      After enough data, we get a result at the end:
      
      ```
      (keep 40 bits)  17 collisions after 2 x 90 days, est 10.5882 days between (9.76592e+19 corrected)
      ```
      
      If we believe the (pessimistic) simulation and the mathematical generalization, we would need to run a billion machines all for 97 billion days to expect a cache key collision. To help verify that our generalization ("corrected") is robust, we can make our simulation more precise with `-sck_keep_bits=41` and `42`, which takes more running time to get enough data:
      
      ```
      (keep 41 bits)  16 collisions after 4 x 90 days, est 22.5 days between (1.03763e+20 corrected)
      (keep 42 bits)  19 collisions after 10 x 90 days, est 47.3684 days between (1.09224e+20 corrected)
      ```
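
Each "corrected" figure is the observed mean days-between-collisions scaled by that run's unconstrained-bits factor 2**(128 - keep_bits - 25); the three quoted lines are mutually consistent under that formula:

```python
def corrected(days_between: float, keep_bits: int) -> float:
    # unconstrained key bits for this run: 128 total - kept - per-file offsets
    return days_between * 2 ** (128 - keep_bits - 25)

# (keep bits, est days between, corrected value quoted in the output)
runs = [(40, 10.5882, 9.76592e19),
        (41, 22.5,    1.03763e20),
        (42, 47.3684, 1.09224e20)]
for keep, days, quoted in runs:
    assert abs(corrected(days, keep) - quoted) / quoted < 1e-3
```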
      
      The generalized prediction still holds. With the `-sck_randomize` option, we can see that we are beating "random" cache keys (except offsets still non-randomized) by a modest amount (roughly 20x less collision prone than random), which should make us reasonably comfortable even in "degenerate" cases:
      
      ```
      197 collisions after 1 x 90 days, est 0.456853 days between (4.21372e+18 corrected)
      ```
      
      I've run other tests to validate other conditions behave as expected, never behaving "worse than random" unless we start chopping off structured data.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D33171746
      
      Pulled By: pdillinger
      
      fbshipit-source-id: f16a57e369ed37be5e7e33525ace848d0537c88f
      0050a73a