1. 01 Jun 2022 (1 commit)
    • Deflake unit test BackupEngineTest.Concurrency (#10069) · 5ab5537d
      Yanqin Jin committed
      Summary:
      After https://github.com/facebook/rocksdb/issues/9984, BackupEngineTest.Concurrency becomes flaky.
      
      During DB::Open(), another thread or process can rename/remove the LOG file, causing
      this thread's `CreateLoggerFromOptions()` to fail. The reason is that the operation sequence
      of "FileExists -> Rename" is not atomic: FileExists() can return OK, yet the file can get
      deleted before Rename(), causing the latter to return an IOError with the PathNotFound subcode.
      
      Although concurrently modifying the contents of the directories managed by a database
      instance is not encouraged, we can still perform some simple handling to make DB::Open()
      more robust: if we detect that a racing thread has deleted the original LOG file, we allow
      this thread to continue and create a new LOG file.
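      
      The handling described above can be sketched roughly as follows. This is an illustration
      against the public `Env`/`Status` API, not the actual patch; `RotateInfoLog` and its
      parameters are hypothetical names.
      
      ```cpp
      // Hedged sketch: tolerate a racing deletion of the old LOG file between
      // FileExists() and Rename(), as described in the summary above.
      #include <string>
      
      #include "rocksdb/env.h"
      #include "rocksdb/status.h"
      
      rocksdb::Status RotateInfoLog(rocksdb::Env* env, const std::string& old_fname,
                                    const std::string& archived_fname) {
        rocksdb::Status s = env->FileExists(old_fname);
        if (s.ok()) {
          s = env->RenameFile(old_fname, archived_fname);
          if (s.IsPathNotFound()) {
            // A racing thread removed/renamed the file after our FileExists() check.
            // The old LOG is gone either way, so proceed to create a new one.
            s = rocksdb::Status::OK();
          }
        } else if (s.IsNotFound()) {
          s = rocksdb::Status::OK();  // nothing to rotate
        }
        return s;  // the caller then creates the new LOG file
      }
      ```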
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10069
      
      Test Plan: ~/gtest-parallel/gtest-parallel -r 100 ./backup_engine_test --gtest_filter=BackupEngineTest.Concurrency
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36736913
      
      Pulled By: riversand963
      
      fbshipit-source-id: 3cbe92d77ca175e55e586bdb1a32ac8107217ae6
      5ab5537d
  2. 31 May 2022 (4 commits)
  3. 28 May 2022 (1 commit)
    • Fix compile error in Clang 13 (#10033) · 4eb7b35f
      Jaepil Jeong committed
      Summary:
      This PR fixes the following compilation error in Clang 13, which was tested on macOS 12.4.
      
      ```
      ❯ ninja clean && ninja
      [1/1] Cleaning all built files...
      Cleaning... 0 files.
      [198/315] Building CXX object CMakeFiles/rocksdb.dir/util/cleanable.cc.o
      FAILED: CMakeFiles/rocksdb.dir/util/cleanable.cc.o
      ccache /opt/homebrew/opt/llvm/bin/clang++ -DGFLAGS=1 -DGFLAGS_IS_A_DLL=0 -DHAVE_FULLFSYNC -DJEMALLOC_NO_DEMANGLE -DLZ4 -DOS_MACOSX -DROCKSDB_JEMALLOC -DROCKSDB_LIB_IO_POSIX -DROCKSDB_NO_DYNAMIC_EXTENSION -DROCKSDB_PLATFORM_POSIX -DSNAPPY -DTBB -DZLIB -DZSTD -I/Users/jaepil/work/deepsearch/deps/cpp/rocksdb -I/Users/jaepil/work/deepsearch/deps/cpp/rocksdb/include -I/Users/jaepil/app/include -I/opt/homebrew/include -I/opt/homebrew/opt/llvm/include -W -Wextra -Wall -pthread -Wsign-compare -Wshadow -Wno-unused-parameter -Wno-unused-variable -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -Wno-strict-aliasing -Wno-invalid-offsetof -fno-omit-frame-pointer -momit-leaf-frame-pointer -march=armv8-a+crc+crypto -Wno-unused-function -Werror -O3 -DNDEBUG -DROCKSDB_USE_RTTI -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk -std=gnu++17 -MD -MT CMakeFiles/rocksdb.dir/util/cleanable.cc.o -MF CMakeFiles/rocksdb.dir/util/cleanable.cc.o.d -o CMakeFiles/rocksdb.dir/util/cleanable.cc.o -c /Users/jaepil/work/deepsearch/deps/cpp/rocksdb/util/cleanable.cc
      /Users/jaepil/work/deepsearch/deps/cpp/rocksdb/util/cleanable.cc:24:65: error: no member named 'move' in namespace 'std'
      Cleanable::Cleanable(Cleanable&& other) noexcept { *this = std::move(other); }
                                                                 ~~~~~^
      /Users/jaepil/work/deepsearch/deps/cpp/rocksdb/util/cleanable.cc:126:16: error: no member named 'move' in namespace 'std'
        *this = std::move(from);
                ~~~~~^
      2 errors generated.
      [209/315] Building CXX object CMakeFiles/rocksdb.dir/tools/block_cache_analyzer/block_cache_trace_analyzer.cc.o
      ninja: build stopped: subcommand failed.
      ```
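      
      Errors like this typically come from relying on a transitive include for `std::move`. The
      conventional fix, presumably what this PR applies, is to include `<utility>` explicitly in
      the offending translation unit:
      
      ```cpp
      // util/cleanable.cc (sketch): make the dependency on std::move explicit
      // instead of relying on another header to pull <utility> in transitively.
      #include <utility>  // std::move
      
      // Cleanable::Cleanable(Cleanable&& other) noexcept { *this = std::move(other); }
      ```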
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10033
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36580562
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0f6b241d186ed528ad62d259af2857d2c2b4ded1
      4eb7b35f
  4. 27 May 2022 (7 commits)
    • Fail DB::Open() if logger cannot be created (#9984) · 514f0b09
      Yanqin Jin committed
      Summary:
      For both regular and secondary DB instances, we now return an error and refuse to open the DB if Logger creation fails.
      
      The current code allows Open() to succeed in that case, but the result is really difficult to debug
      because there will be no LOG file. The same applies to the OPTIONS file, which will be explored in another PR.
      
      Furthermore, Arena::AllocateAligned(size_t bytes, size_t huge_page_size, Logger* logger) has the
      following assertion:
      
      ```cpp
      #ifdef MAP_HUGETLB
      if (huge_page_size > 0 && bytes > 0) {
        assert(logger != nullptr);
      }
      #endif
      ```
      
      It can be removed.
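      
      From the user's perspective, the new contract can be demonstrated with a small program. The
      paths below and the way the failure is forced (an info-log directory that cannot be created)
      are illustrative assumptions, not part of the patch.
      
      ```cpp
      // Minimal illustration: if the info LOG cannot be created, DB::Open() is now
      // expected to return a non-OK status instead of silently proceeding.
      #include <cassert>
      #include <iostream>
      
      #include "rocksdb/db.h"
      
      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.db_log_dir = "/nonexistent/log/dir";  // assume logger creation fails here
      
        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/logger_fail_demo", &db);
        std::cout << "Open status: " << s.ToString() << std::endl;
        assert(!s.ok());  // with this change, Open() refuses to continue without a LOG
        delete db;
        return 0;
      }
      ```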
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9984
      
      Test Plan: make check
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36347754
      
      Pulled By: riversand963
      
      fbshipit-source-id: 529798c0511d2eaa2f0fd40cf7e61c4cbc6bc57e
      514f0b09
    • Pass the size of blob files to SstFileManager during DB open (#10062) · e2285157
      Gang Liao committed
      Summary:
      RocksDB uses the (no longer aptly named) SST file manager (see https://github.com/facebook/rocksdb/wiki/Managing-Disk-Space-Utilization) to track and potentially limit the space used by SST and blob files (as well as to rate-limit the deletion of these data files). The SST file manager tracks the SST and blob file sizes in an in-memory hash map, which has to be rebuilt during DB open. File sizes can be generally obtained by querying the file system; however, there is a performance optimization possibility here since the sizes of SST and blob files are also tracked in the RocksDB MANIFEST, so we can simply pass the file sizes stored there instead of consulting the file system for each file. Currently, this optimization is only implemented for SST files; we would like to extend it to blob files as well.
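      
      For context, wiring up the SST file manager looks like the sketch below; with this change,
      the blob file sizes recorded in the MANIFEST are reused during DB::Open() instead of
      querying the file system for each blob file. The paths and option values here are
      illustrative assumptions.
      
      ```cpp
      #include <cstdint>
      #include <memory>
      
      #include "rocksdb/db.h"
      #include "rocksdb/env.h"
      #include "rocksdb/sst_file_manager.h"
      
      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.enable_blob_files = true;  // BlobDB-style value separation
        options.sst_file_manager.reset(
            rocksdb::NewSstFileManager(rocksdb::Env::Default()));
      
        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/sfm_demo", &db);
        if (s.ok()) {
          // After open, the manager's view includes SST and (with this PR) blob
          // files, without a per-file stat() during recovery.
          uint64_t tracked = options.sst_file_manager->GetTotalSize();
          (void)tracked;
        }
        delete db;
        return s.ok() ? 0 : 1;
      }
      ```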
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10062
      
      Test Plan:
      Add unit tests for the change to the test suite
      ltamasi riversand963  akankshamahajan15
      
      Reviewed By: ltamasi
      
      Differential Revision: D36726621
      
      Pulled By: gangliao
      
      fbshipit-source-id: 4010dc46ef7306142f1c2e0d1c3bf75b196ef82a
      e2285157
    • Add timestamp support to secondary instance (#10061) · 8c4ea7b8
      Yu Zhang committed
      Summary:
      This PR adds timestamp support to the secondary DB instance.
      
      With this, the following timestamp-related APIs are supported (see the usage sketch below):
      
      - `ReadOptions.timestamp`: a read returns the latest data visible up to the specified timestamp.
      - `Iterator::timestamp()`: returns the timestamp associated with the current key/value.
      - `DB::Get(..., std::string* timestamp)`: returns the timestamp associated with the key/value in `timestamp`.
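      
      A minimal usage sketch of these APIs on a secondary instance. The paths, the comparator
      choice (`BytewiseComparatorWithU64Ts()`), and the 8-byte little-endian timestamp encoding
      are assumptions for illustration; the comparator must match the one used by the primary.
      
      ```cpp
      #include <cstdint>
      #include <cstring>
      #include <string>
      
      #include "rocksdb/comparator.h"
      #include "rocksdb/db.h"
      #include "rocksdb/slice.h"
      
      int main() {
        rocksdb::Options options;
        options.comparator = rocksdb::BytewiseComparatorWithU64Ts();  // 64-bit timestamps
      
        rocksdb::DB* secondary = nullptr;
        rocksdb::Status s = rocksdb::DB::OpenAsSecondary(
            options, "/tmp/primary_db", "/tmp/secondary_db", &secondary);
        if (!s.ok()) return 1;
      
        // Read the latest data visible up to timestamp 100.
        uint64_t ts_val = 100;
        std::string ts_buf(sizeof(ts_val), '\0');
        std::memcpy(&ts_buf[0], &ts_val, sizeof(ts_val));
        rocksdb::Slice read_ts(ts_buf);
      
        rocksdb::ReadOptions read_opts;
        read_opts.timestamp = &read_ts;
      
        rocksdb::PinnableSlice value;
        std::string key_ts;  // receives the timestamp associated with the value
        s = secondary->Get(read_opts, secondary->DefaultColumnFamily(), "some_key",
                           &value, &key_ts);
      
        delete secondary;
        return 0;
      }
      ```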
      
      Test plan (on devserver):
      ```
      $COMPILE_WITH_ASAN=1 make -j24 all
      $./db_secondary_test --gtest_filter=DBSecondaryTestWithTimestamp*
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10061
      
      Reviewed By: riversand963
      
      Differential Revision: D36722915
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: 644ada39e4e51164a759593478c38285e0c1a666
      8c4ea7b8
    • Disable file ingestion in crash test for CF consistency (#10067) · f6e45382
      Andrew Kryczka committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10067
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36727948
      
      Pulled By: ajkr
      
      fbshipit-source-id: a3502730412c01ba63d822a5d4bf56f8bae8fcb2
      f6e45382
    • Remove code that only compiles for Visual Studio versions older than 2015 (#10065) · 6c500826
      tagliavini committed
      Summary:
      There are currently some preprocessor checks that assume support for Visual Studio versions older than 2015 (i.e., 0 < _MSC_VER < 1900), although we don't support them any more.
      
      We removed all code that only compiles on those older versions, except third-party/ files.
      
      The ROCKSDB_NOEXCEPT symbol is now obsolete, since it always expands to noexcept; we removed it.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10065
      
      Reviewed By: pdillinger
      
      Differential Revision: D36721901
      
      Pulled By: guidotag
      
      fbshipit-source-id: a2892d365ef53cce44a0a7d90dd6b72ee9b5e5f2
      6c500826
    • Enable IngestExternalFile() in crash test (#9357) · 91ba7837
      Andrew Kryczka committed
      Summary:
      Thanks to https://github.com/facebook/rocksdb/issues/9919 and https://github.com/facebook/rocksdb/issues/10051, the known bugs in file ingestion (besides mmap read + file checksum) are fixed. Now we can try again to enable file ingestion in the crash test.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9357
      
      Test Plan: stress file ingestion heavily for an hour: `$ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --max_key=1000000 --ingest_external_file_one_in=100 --duration=3600 --interval=20 --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152`
      
      Reviewed By: riversand963
      
      Differential Revision: D33410746
      
      Pulled By: ajkr
      
      fbshipit-source-id: d276431390995a67f68390d61c06a40945fdd280
      91ba7837
    • Add C API for User Defined Timestamp (#9914) · c9c58a32
      Muthu Krishnan committed
      Summary:
      Fixes https://github.com/facebook/rocksdb/issues/9889
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9914
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D36599983
      
      Pulled By: riversand963
      
      fbshipit-source-id: 39000fb473f850d88359e90b287035257854af0d
      c9c58a32
  5. 26 May 2022 (7 commits)
    • Expose DisableManualCompaction and EnableManualCompaction to C api (#10052) · 4cf2f672
      Jie Liang Ang committed
      Summary:
      Add `rocksdb_disable_manual_compaction` and `rocksdb_enable_manual_compaction`.
      
      Note that `rocksdb_enable_manual_compaction` should be used with care and must not be called more times than `rocksdb_disable_manual_compaction` has been called.
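      
      A hedged usage sketch of the new entry points, written in C++ here for consistency with the
      other examples; the exact signatures are assumed to take only the `rocksdb_t*` handle.
      
      ```cpp
      #include "rocksdb/c.h"
      
      // Pause manual compactions (cancelling any in-flight CompactRange()), do some
      // work, then resume. Per the note above, enable must not be called more times
      // than disable has been called.
      void PauseManualCompactions(rocksdb_t* db) {
        rocksdb_disable_manual_compaction(db);
      
        // ... work that requires manual compaction to be paused ...
      
        rocksdb_enable_manual_compaction(db);
      }
      ```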
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10052
      
      Reviewed By: ajkr
      
      Differential Revision: D36665496
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: a4ae6e34694066feb21302ca1a5c365fb9de0ec7
      4cf2f672
    • Provide support for IOTracing for ReadAsync API (#9833) · 28ea1fb4
      Akanksha Mahajan committed
      Summary:
      Same as title
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9833
      
      Test Plan:
      Add unit test and manually check the output of tracing logs
      For fixed readahead_size it logs as:
      ```
      Access Time : 193352113447923     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 15075     , IO Status: OK, Length: 12288, Offset: 659456
      Access Time : 193352113465232     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 14425     , IO Status: OK, Length: 12288, Offset: 671744
      Access Time : 193352113481539     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13062     , IO Status: OK, Length: 12288, Offset: 684032
      Access Time : 193352113497692     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13649     , IO Status: OK, Length: 12288, Offset: 696320
      Access Time : 193352113520043     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 19384     , IO Status: OK, Length: 12288, Offset: 708608
      Access Time : 193352113538401     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 15406     , IO Status: OK, Length: 12288, Offset: 720896
      Access Time : 193352113554855     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13670     , IO Status: OK, Length: 12288, Offset: 733184
      Access Time : 193352113571624     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13855     , IO Status: OK, Length: 12288, Offset: 745472
      Access Time : 193352113587924     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13953     , IO Status: OK, Length: 12288, Offset: 757760
      Access Time : 193352113603285     , File Name: 000026.sst          , File Operation: Prefetch          , Latency: 59        , IO Status: Not implemented: Prefetch not supported, Length: 8868, Offset: 898349
      ```
      
      For implicit readahead:
      ```
      Access Time : 193351865156587     , File Name: 000026.sst          , File Operation: Prefetch          , Latency: 48        , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 391174
      Access Time : 193351865160354     , File Name: 000026.sst          , File Operation: Prefetch          , Latency: 51        , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 395248
      Access Time : 193351865164253     , File Name: 000026.sst          , File Operation: Prefetch          , Latency: 49        , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 399322
      Access Time : 193351865165461     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 222871    , IO Status: OK, Length: 135168, Offset: 401408
      ```
      
      Reviewed By: anand1976
      
      Differential Revision: D35601634
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 5a4f32a850af878efa0767bd5706380152a1f26e
      28ea1fb4
    • Fix flaky db_basic_bench caused by unreleased iterator (#10058) · 5490da20
      Jay Zhuang committed
      Summary:
      The iterator is not freed after the test is done (after the main for
      loop), which could cause the DB close to fail.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10058
      
      Test Plan:
      Able to reproduce consistently with a higher thread number (e.g. 100);
      verified that it passes after the fix.
      
      Reviewed By: ajkr
      
      Differential Revision: D36685823
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 4c98b8758d106bfe40cae670e689c3d284765bcf
      5490da20
    • Abort RocksDB performance regression test on failure in test setup (#10053) · bd170dda
      Peter Dillinger committed
      Summary:
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10053
      
      Need to exit if the ldb command fails, to avoid running db_bench on an
      empty/bad DB and considering the results valid.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36673200
      
      fbshipit-source-id: e0d78a0d397e0e335d82d9349bfd612d38ffb552
      bd170dda
    • FindObsoleteFiles() to directly check whether candidate files are live (#10040) · 356f8c5d
      sdong committed
      Summary:
      Right now, in FindObsoleteFiles() we build a list of all live SST files from all existing Versions. This is all done under the DB mutex, and is O(m*n) where m is the number of versions and n is the number of files. In some extreme cases, it can take very long. The list is used to see whether a candidate file still shows up in a version. With this commit, every candidate file is directly checked against all the versions for file existence. This operation is O(m*k) where k is the number of candidate files. Since k is usually small (except perhaps for a full compaction in universal compaction), this is usually much faster than the previous solution.
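      
      The complexity change can be illustrated with a simplified, self-contained sketch
      (illustrative types, not the actual VersionSet code): rather than materializing every live
      file across all versions (O(m*n)), each candidate is probed directly against each version
      (O(m*k), with k the number of candidates).
      
      ```cpp
      #include <cstdint>
      #include <unordered_set>
      #include <vector>
      
      struct VersionSketch {
        std::unordered_set<uint64_t> files;  // file numbers referenced by this version
        bool ContainsFile(uint64_t file_number) const {
          return files.count(file_number) > 0;
        }
      };
      
      std::vector<uint64_t> FindObsolete(const std::vector<VersionSketch>& versions,
                                         const std::vector<uint64_t>& candidates) {
        std::vector<uint64_t> obsolete;
        for (uint64_t f : candidates) {      // k candidate files
          bool live = false;
          for (const auto& v : versions) {   // m versions
            if (v.ContainsFile(f)) {
              live = true;
              break;
            }
          }
          if (!live) {
            obsolete.push_back(f);           // safe to delete
          }
        }
        return obsolete;
      }
      ```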
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10040
      
      Test Plan: TBD
      
      Reviewed By: riversand963
      
      Differential Revision: D36613391
      
      fbshipit-source-id: 3f13b090f755d9b3ae417faec62cd6e798bac1eb
      356f8c5d
    • Update VersionSet last seqno after LogAndApply (#10051) · b0e19060
      Changyu Bi committed
      Summary:
      This PR fixes the issue of unstable snapshots during external SST file ingestion. Credit to ajkr for the following walkthrough: consider these relevant steps of IngestExternalFile():
      
      (1) increase seqno while holding mutex -- https://github.com/facebook/rocksdb/blob/677d2b4a8f8fd19d0c39a9ee8f648742e610688d/db/db_impl/db_impl.cc#L4768
      (2) LogAndApply() -- https://github.com/facebook/rocksdb/blob/677d2b4a8f8fd19d0c39a9ee8f648742e610688d/db/db_impl/db_impl.cc#L4797-L4798
        (a) write to MANIFEST with mutex released https://github.com/facebook/rocksdb/blob/a96a4a2f7ba7633ab2cc51defd1e923e20d239a6/db/version_set.cc#L4407
        (b) apply to in-memory state with mutex held
      
      A snapshot taken during (2a) will be unstable. In particular, queries against that snapshot will not include data from the ingested file before (2b), and will include data from the ingested file after (2b).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10051
      
      Test Plan:
      Added a new unit test: `ExternalSSTFileBasicTest.WriteAfterReopenStableSnapshotWhileLoggingToManifest`.
      ```
      make external_sst_file_basic_test
      ./external_sst_file_basic_test
      ```
      
      Reviewed By: ajkr
      
      Differential Revision: D36654033
      
      Pulled By: cbi42
      
      fbshipit-source-id: bf720cca313e0cf211585960f3aff04853a31b96
      b0e19060
    • Improve transaction C-API (#9252) · b71466e9
      Yiyuan Liu committed
      Summary:
      This PR improves transaction support in the C API:
      * Support two-phase commit.
      * Support `get_pinned` and `multi_get` in transactions.
      * Add `rocksdb_transactiondb_flush`.
      * Support getting a WriteBatch from a transaction and rebuilding a transaction from a WriteBatch.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9252
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36459007
      
      Pulled By: riversand963
      
      fbshipit-source-id: 47371d527be821c496353a7fe2fd18d628069a98
      b71466e9
  6. 25 May 2022 (8 commits)
    • Enable checkpoint and backup in db_stress when timestamp is enabled (#10047) · 9901e7f6
      Yanqin Jin committed
      Summary:
      After https://github.com/facebook/rocksdb/issues/10030 and https://github.com/facebook/rocksdb/issues/10004, we can enable checkpoint and backup in stress tests when
      user-defined timestamp is enabled.
      
      This PR has no production risk.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10047
      
      Test Plan:
      ```
      TEST_TMPDIR=/dev/shm make crash_test_with_ts
      ```
      
      Reviewed By: jowlyzhang
      
      Differential Revision: D36641565
      
      Pulled By: riversand963
      
      fbshipit-source-id: d86c9d87efcc34c32d1aa176af691d32b897644a
      9901e7f6
    • Fix potential ambiguities in/around port/sys_time.h (#10045) · af7ae912
      Levi Tamasi committed
      Summary:
      There are some time-related POSIX APIs that are not available on Windows
      (e.g. `localtime_r`), which we have worked around by providing our own
      implementations in `port/sys_time.h`. This workaround actually relies on
      some ambiguity: on Windows, a call to `localtime_r` calls
      `ROCKSDB_NAMESPACE::port::localtime_r` (which is pulled into
      `ROCKSDB_NAMESPACE` by a using-declaration), while on other platforms
      it calls the global `localtime_r`. This works fine as long as there is only one
      candidate function; however, it breaks down when there is more than one
      `localtime_r` visible in a scope.
      
      The patch fixes this by introducing `ROCKSDB_NAMESPACE::port::{TimeVal, GetTimeOfDay, LocalTimeR}`
      to eliminate any ambiguity.
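      
      The failure mode and the fix can be illustrated with a self-contained snippet (not the
      actual RocksDB code): a shim that reuses the POSIX name plus a using-declaration becomes
      ambiguous as soon as a second candidate is visible, whereas a distinctly named wrapper such
      as `port::LocalTimeR` cannot collide.
      
      ```cpp
      #include <ctime>
      
      namespace port {
      // Windows-style shim that reuses the POSIX name.
      inline struct tm* localtime_r(const time_t* t, struct tm* result) {
        // ... platform-specific implementation ...
        return result;
      }
      
      // The fix: a distinctly named wrapper that can never become ambiguous.
      inline struct tm* LocalTimeR(const time_t* t, struct tm* result) {
        return localtime_r(t, result);
      }
      }  // namespace port
      
      using port::localtime_r;  // pulls the shim into this scope
      
      void Caller(const time_t* t, struct tm* out) {
        // On a platform where a global ::localtime_r is also visible here, an
        // unqualified call would be ambiguous and fail to compile:
        // localtime_r(t, out);
        port::LocalTimeR(t, out);  // always unambiguous
      }
      ```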
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10045
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D36639372
      
      Pulled By: ltamasi
      
      fbshipit-source-id: fc13dbfa421b7c8918111a6d9e24ce77e91a7c50
      af7ae912
    • Fix ApproximateOffsetOfCompressed test (#10048) · a96a4a2f
      Jay Zhuang committed
      Summary:
      https://github.com/facebook/rocksdb/issues/9857 introduced a new option `use_zstd_dict_trainer`, which
      is stored in the SST file as text, e.g.:
      ```
      ...  zstd_max_train_bytes=0; enabled=0;...
      ```
      It increases the SST size a little bit and causes the
      `ApproximateOffsetOfCompressed` test to fail:
      ```
      Value 7053 is not in range [4000, 7050]
      table/table_test.cc:4019: Failure
      Value of: Between(c.ApproximateOffsetOf("xyz"), 4000, 7050)
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10048
      
      Test Plan: verified the test passes after the change
      
      Reviewed By: cbi42
      
      Differential Revision: D36643688
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: bf12d211f6ae71937259ef21b1226bd06e8da717
      a96a4a2f
    • Skip ZSTD dict tests if the version doesn't support it (#10046) · 23f34c7a
      Jay Zhuang committed
      Summary:
      For example, the default ZSTD version on Ubuntu 20.04 is 1.4.4, which fails
      the test `PresetCompressionDict`:
      
      ```
      db/db_test_util.cc:607: Failure
      Invalid argument: zstd finalizeDictionary cannot be used because ZSTD 1.4.5+ is not linked with the binary.
      terminate called after throwing an instance of 'testing::internal::GoogleTestFailureException'
      ```
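      
      A hedged sketch of the kind of guard this requires: only exercise the
      `finalizeDictionary`-based path when the linked ZSTD is at least 1.4.5
      (`ZSTD_VERSION_NUMBER` encodes major*10000 + minor*100 + patch, so 1.4.5 is 10405;
      the `ZSTD` macro is the build flag visible in the compile command above).
      
      ```cpp
      #ifdef ZSTD
      #include <zstd.h>
      #endif
      
      // Returns true only if the dictionary finalization API is available.
      bool ZstdFinalizeDictionarySupported() {
      #if defined(ZSTD) && defined(ZSTD_VERSION_NUMBER) && ZSTD_VERSION_NUMBER >= 10405
        return true;   // 1.4.5+: ZDICT_finalizeDictionary exists
      #else
        return false;  // older ZSTD (e.g. Ubuntu 20.04's 1.4.4): skip these tests
      #endif
      }
      ```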
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10046
      
      Test Plan: test pass with old zstd
      
      Reviewed By: cbi42
      
      Differential Revision: D36640067
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: b1c49fb7295f57f4515ce4eb3a52ae7d7e45da86
      23f34c7a
    • Avoid malloc_usable_size() call inside LRU Cache mutex (#10026) · c78a87cd
      sdong committed
      Summary:
      Inside the LRU cache mutex, we sometimes call malloc_usable_size() to calculate the memory used by the metadata object. We avoid this by saving the total charge (charge + metadata size), rather than just the charge, inside the metadata itself. Within the mutex, usually only the total charge is needed, so we don't need to repeat the calculation.
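      
      A simplified, self-contained sketch of the idea (not the actual LRUHandle layout): compute
      the metadata size once, outside the mutex, and store the combined charge so the code under
      the mutex never has to call malloc_usable_size() again.
      
      ```cpp
      #include <cstddef>
      
      struct HandleSketch {
        size_t total_charge = 0;  // user-provided charge + estimated metadata size
      
        // Called while constructing the handle, before the cache mutex is taken.
        void SetTotalCharge(size_t charge, size_t metadata_size) {
          total_charge = charge + metadata_size;
        }
      
        // Under the mutex only this stored value is needed; no malloc_usable_size()
        // call on the hot path.
        size_t GetTotalCharge() const { return total_charge; }
      };
      ```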
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10026
      
      Test Plan: Run existing tests.
      
      Reviewed By: pdillinger
      
      Differential Revision: D36556253
      
      fbshipit-source-id: f60c96d13cde3af77732e5548e4eac4182fa9801
      c78a87cd
    • Add timestamp support to CompactedDBImpl (#10030) · d4081bf0
      Yu Zhang committed
      Summary:
      This PR is the second and last part of adding user-defined timestamp support to read-only DBs. Specifically, the change in this PR includes:
      
      - `options.timestamp` is respected by `CompactedDBImpl::Get` and `CompactedDBImpl::MultiGet` to return results visible up till that timestamp.
      - `CompactedDBImpl::Get(..., std::string* timestamp)` and `CompactedDBImpl::MultiGet(std::vector<std::string>* timestamps)` return the timestamp(s) associated with the key(s).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10030
      
      Test Plan:
      ```
      $COMPILE_WITH_ASAN=1 make -j24 all
      $./db_readonly_with_timestamp_test --gtest_filter="DBReadOnlyTestWithTimestamp.CompactedDB*"
      $./db_basic_test --gtest_filter="DBBasicTest.CompactedDB*"
      $make all check
      ```
      
      Reviewed By: riversand963
      
      Differential Revision: D36613926
      
      Pulled By: jowlyzhang
      
      fbshipit-source-id: 5b7ed7fef822708c12e2caf7a8d2deb6a696f0f0
      d4081bf0
    • Support read rate-limiting in SequentialFileReader (#9973) · 8515bd50
      Changyu Bi committed
      Summary:
      Added rate limiter and read rate-limiting support to SequentialFileReader. I've updated call sites to SequentialFileReader::Read with appropriate IO priority (or left a TODO and specified IO_TOTAL for now).
      
      The PR is separated into four commits: the first one added the rate-limiting support, with some fixes in the unit test since the number of bytes requested from the rate limiter in SequentialFileReader is not accurate (there is overcharging at EOF). The second commit fixed this by allowing SequentialFileReader to check the file size and determine how many bytes are left in the file to read. The third commit added benchmark-related code. The fourth commit moved the logic of using the file size to avoid overcharging the rate limiter into the backup engine (the main user of SequentialFileReader).
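      
      A simplified sketch of the charging pattern (illustrative; the real SequentialFileReader
      plumbs this through its Read() path): each read is capped at the limiter's burst size and
      charged with the read IO priority before the actual read is issued.
      
      ```cpp
      #include <algorithm>
      #include <cstddef>
      #include <cstdint>
      
      #include "rocksdb/env.h"
      #include "rocksdb/rate_limiter.h"
      
      // Returns how many bytes the caller may read right now; blocks inside
      // Request() until the tokens are available.
      size_t RateLimitedReadSize(rocksdb::RateLimiter* rate_limiter, size_t want,
                                 rocksdb::Env::IOPriority pri) {
        if (rate_limiter == nullptr || pri == rocksdb::Env::IO_TOTAL) {
          return want;  // rate limiting bypassed
        }
        size_t allowed = std::min(
            want, static_cast<size_t>(rate_limiter->GetSingleBurstBytes()));
        rate_limiter->Request(static_cast<int64_t>(allowed), pri, nullptr /* stats */,
                              rocksdb::RateLimiter::OpType::kRead);
        return allowed;
      }
      ```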
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9973
      
      Test Plan:
      - `make check`, backup_engine_test covers usage of SequentialFileReader with rate limiter.
      - Run db_bench to check if rate limiting is throttling as expected: Verified that reads and writes are together throttled at 2MB/s, and at 0.2MB chunks that are 100ms apart.
        - Set up: `./db_bench --benchmarks=fillrandom -db=/dev/shm/test_rocksdb`
        - Benchmark:
      ```
      strace -ttfe read,write ./db_bench --benchmarks=backup -db=/dev/shm/test_rocksdb --backup_rate_limit=2097152 --use_existing_db
      strace -ttfe read,write ./db_bench --benchmarks=restore -db=/dev/shm/test_rocksdb --restore_rate_limit=2097152 --use_existing_db
      ```
      - db bench on backup and restore to ensure no performance regression.
        - backup (avg over 50 runs): pre-change: 1.90443e+06 micros/op; post-change: 1.8993e+06 micros/op (improve by 0.2%)
        - restore (avg over 50 runs): pre-change: 1.79105e+06 micros/op; post-change: 1.78192e+06 micros/op (improve by 0.5%)
      
      ```
      # Set up
      ./db_bench --benchmarks=fillrandom -db=/tmp/test_rocksdb -num=10000000
      
      # benchmark
      TEST_TMPDIR=/tmp/test_rocksdb
      NUM_RUN=50
      for ((j=0;j<$NUM_RUN;j++))
      do
         ./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=backup -use_existing_db | egrep 'backup'
        # Restore
        #./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=restore -use_existing_db
      done > rate_limit.txt && awk -v NUM_RUN=$NUM_RUN '{sum+=$3;sum_sqrt+=$3^2}END{print sum/NUM_RUN, sqrt(sum_sqrt/NUM_RUN-(sum/NUM_RUN)^2)}' rate_limit.txt >> rate_limit_2.txt
      ```
      
      Reviewed By: hx235
      
      Differential Revision: D36327418
      
      Pulled By: cbi42
      
      fbshipit-source-id: e75d4307cff815945482df5ba630c1e88d064691
      8515bd50
    • Fix failed VerifySstUniqueIds unittests (#10043) · fd24e447
      Jay Zhuang committed
      Summary:
      The tests should use UniqueId64x2 instead of std::string.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10043
      
      Test Plan: unittest
      
      Reviewed By: pdillinger
      
      Differential Revision: D36620366
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: cf937a1da362018472fa4396848225e48893848b
      fd24e447
  7. 24 May 2022 (7 commits)
    • Expose unix time in rocksdb::Snapshot (#9923) · 700d597b
      Akanksha Mahajan committed
      Summary:
      A RocksDB snapshot already has a member unix_time_ that is set when the
      snapshot is taken. It is now exposed through the GetSnapshotTime() API.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9923
      
      Test Plan: Update unit tests
      
      Reviewed By: riversand963
      
      Differential Revision: D36048275
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 825210ec287deb0bc3aaa9b8e1f079f07ad686fa
      700d597b
    • Fix fbcode internal build failure (#10041) · 8e9d9156
      anand76 committed
      Summary:
      The build failed because coroutines live in different namespaces (std::experimental vs. std) depending on the compiler version.
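      
      A generic illustration of the incompatibility (names here are illustrative; RocksDB's
      coroutine usage actually goes through folly): depending on the toolchain, the coroutine
      types live either in std::experimental or in std, and a small alias layer keeps the rest
      of the code uniform.
      
      ```cpp
      #if __has_include(<coroutine>)
      #include <coroutine>
      namespace coro_ns = std;
      #elif __has_include(<experimental/coroutine>)
      #include <experimental/coroutine>
      namespace coro_ns = std::experimental;
      #else
      #error "No coroutine header available"
      #endif
      
      // The rest of the code refers only to coro_ns::coroutine_handle, etc.
      using CoroutineHandle = coro_ns::coroutine_handle<>;
      ```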
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10041
      
      Reviewed By: ltamasi
      
      Differential Revision: D36617212
      
      Pulled By: anand1976
      
      fbshipit-source-id: dfb25320788d32969317d5651173059e2cbd8bd5
      8e9d9156
    • Update version on main to 7.4 and add 7.3 to the format compatibility checks (#10038) · 253ae017
      Levi Tamasi committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10038
      
      Reviewed By: riversand963
      
      Differential Revision: D36604533
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 54ccd0a4b32a320b5640a658ea6846ee897065d1
      253ae017
    • Fix stress test failure "Corruption: checksum mismatch" or "Iterator Diverged" with async_io enabled (#10032) · a479c2c2
      Akanksha Mahajan committed
      
      Summary:
      In the case of non-sequential reads with `async_io`, `FilePrefetchBuffer::TryReadFromCacheAsync` can be called for previous blocks with `offset < bufs_[curr_].offset_`, which wasn't handled correctly, resulting in wrong data being returned from the buffer.
      
      Since `FilePrefetchBuffer::PrefetchAsync` can be called for any data block, it sets `prev_len_` to 0, indicating that `FilePrefetchBuffer::TryReadFromCacheAsync` should go ahead with prefetching even though `offset < bufs_[curr_].offset_`. This is because async prefetching is always done in the second buffer (to avoid a mutex) even when `curr_` is empty, leading to `offset < bufs_[curr_].offset_` in some cases.
      If `prev_len_` is non-zero, then `TryReadFromCacheAsync` returns false when `offset < bufs_[curr_].offset_`, indicating the reads are not sequential and the previous call wasn't PrefetchAsync.
      
      - This PR also simplifies `FilePrefetchBuffer::TryReadFromCacheAsync`, as it was getting complicated covering different scenarios based on whether `async_io` is enabled or disabled. If `for_compaction` is set to true, it now calls `FilePrefetchBuffer::TryReadFromCache`, following the synchronous flow as before. This is decided in BlockFetcher.cc.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10032
      
      Test Plan:
      1.  export CRASH_TEST_EXT_ARGS=" --async_io=1"
           make crash_test -j completed successfully locally
      2. make crash_test -j completed successfully locally
      3. Reran CircleCi mini crashtest job 4 - 5 times.
      4. Updated prefetch_test for more coverage.
      
      Reviewed By: anand1976
      
      Differential Revision: D36579858
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 0c428d62b45e12e082a83acf533a5e37a584bedf
      a479c2c2
    • Move three info logging within DB Mutex to use log buffer (#10029) · bea5831b
      sdong committed
      Summary:
      Info logging while holding the DB mutex could potentially invoke I/O and cause performance issues. Move three of these cases to use the log buffer instead.
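      
      A simplified, self-contained sketch of the pattern (not the actual ROCKS_LOG_BUFFER
      machinery): while the mutex is held, messages are only appended to an in-memory buffer,
      and the potentially slow I/O happens after the mutex is released.
      
      ```cpp
      #include <mutex>
      #include <string>
      #include <utility>
      #include <vector>
      
      class LogBufferSketch {
       public:
        void Add(std::string msg) { msgs_.push_back(std::move(msg)); }  // cheap, no I/O
      
        void FlushToLog() {
          for (const auto& m : msgs_) {
            // Logger::Logv()/fprintf() etc. may block on I/O, so it runs here,
            // outside any latency-sensitive critical section.
            (void)m;
          }
          msgs_.clear();
        }
      
       private:
        std::vector<std::string> msgs_;
      };
      
      void LogUnderMutexThenFlush(std::mutex& db_mutex, LogBufferSketch& buf) {
        {
          std::lock_guard<std::mutex> lock(db_mutex);
          buf.Add("compaction picked ...");  // buffered only; no I/O under the mutex
        }
        buf.FlushToLog();  // I/O after the mutex is released
      }
      ```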
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10029
      
      Test Plan: Run existing tests.
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36561694
      
      fbshipit-source-id: cabb93fea299001a6b4c2802fcba3fde27fa062c
      bea5831b
    • Java build: finish compiling before testing (etc) (#10034) · 1e4850f6
      Peter Dillinger committed
      Summary:
      Lack of ordering dependencies could lead to random
      build-linux-java failures with "Truncated class file" because tests
      started before compilation was finished. (Fix to java/Makefile)
      
      Also:
      * export SHA256_CMD to save copy-paste
      * Actually fail if Java sample build fails--which it was in CircleCI
      * Don't require Snappy for Java sample build (for more compatibility)
      * Remove check_all_python from jtest because it's running in `make
      check` builds in CircleCI
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10034
      
      Test Plan: CI, some manual
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36596541
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 230d79db4b7ae93a366871ff09d0a88e8e1c8af3
      1e4850f6
    • Add plugin header install in CMakeLists.txt (#10025) · cb858600
      Tom Blamer committed
      Summary:
      Fixes https://github.com/facebook/rocksdb/issues/9987.
      - Plugin-specific headers can be specified by setting ${PLUGIN_NAME}_HEADERS in ${PLUGIN_NAME}.mk in the plugin directory (see the example after this list).
      - This is supported by the Makefile-based build, but was missing from CMakeLists.txt.
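      
      A hedged example of the convention described above, for a hypothetical plugin named
      "myplugin" living under plugin/myplugin/; with this PR the CMake build also installs the
      listed headers.
      
      ```make
      # plugin/myplugin/myplugin.mk (illustrative)
      myplugin_SOURCES = myplugin.cc
      myplugin_HEADERS = myplugin.h
      ```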
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10025
      
      Test Plan:
      - Add a plugin with ${PLUGIN_NAME}_HEADERS set in both ${PLUGIN_NAME}.mk and CMakeLists.txt
      - Run Makefile based install and CMake based install and verify installed headers match
      
      Reviewed By: riversand963
      
      Differential Revision: D36584908
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5ea0137205ccbf0d36faacf45d712c5604065bb5
      cb858600
  8. 23 May 2022 (1 commit)
    • Minimum macOS version needed to build v7.2.2 and up is 10.13 (#9976) · 56ce3aef
      Adam Retter committed
      Summary:
      Some C++ code changes between versions 7.1.2 and 7.2.2 now seem to require at least macOS 10.13 (2017) to build successfully; previously we needed 10.12 (2016). I haven't been able to identify the exact commit.
      
      **NOTE**: This needs to be merged to both `main` and `7.2.fb` branches.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9976
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D36303226
      
      Pulled By: ajkr
      
      fbshipit-source-id: 589ce3ecf821db3402b0876e76d37b407896c945
      56ce3aef
  9. 21 May 2022 (4 commits)
    • Update HISTORY for 7.3 release (#10031) · bed40e72
      Levi Tamasi committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10031
      
      Reviewed By: riversand963
      
      Differential Revision: D36567741
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 058f8cc856d276db6e1aed07a89ac0b7118c4435
      bed40e72
    • Point libprotobuf-mutator to the latest verified commit hash (#10028) · 899db56a
      Yanqin Jin committed
      Summary:
      Recent updates to https://github.com/google/libprotobuf-mutator have caused link errors for the RocksDB
      CircleCI job 'build-fuzzers'. This PR points the CI to a specific, most recent verified commit hash.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10028
      
      Test Plan: watch for CI to finish.
      
      Reviewed By: pdillinger, jay-zhuang
      
      Differential Revision: D36562517
      
      Pulled By: riversand963
      
      fbshipit-source-id: ba5ef0f9ed6ea6a75aa5dd2768bd5f389ac14f46
      899db56a
    • Fix a bug of not setting enforce_single_del_contracts (#10027) · f648915b
      Yanqin Jin committed
      Summary:
      Before this PR, BuildDBOptions() did not set the newly added option
      enforce_single_del_contracts, causing OPTIONS files to contain incorrect
      information.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/10027
      
      Test Plan:
      make check
      Manually check OPTIONS file.
      
      Reviewed By: ltamasi
      
      Differential Revision: D36556125
      
      Pulled By: riversand963
      
      fbshipit-source-id: e1074715b22c328b68c19e9ad89aa5d67d864bb5
      f648915b
    • Seek parallelization (#9994) · 2db6a4a1
      Akanksha Mahajan committed
      Summary:
      The RocksDB iterator is a hierarchy of iterators. MergingIterator maintains a heap of LevelIterators, one for each L0 file and for each non-zero level. The Seek() operation naturally lends itself to parallelization, as it involves positioning every LevelIterator on the correct data block in the correct SST file. Each level looks up the target key to find the first key that's >= the target key. This typically involves reading one data block that is likely to contain the target key, and scanning forward to find the first valid key. The forward scan may read more data blocks. In order to find the right data block, the iterator may also read some metadata blocks (required for opening a file and searching the index).
      This flow can be parallelized.
      
      Design: Seek will be called twice under the async_io option. The first seek sends asynchronous requests to prefetch the data blocks at each level; the second seek follows the normal flow, and in FilePrefetchBuffer::TryReadFromCacheAsync it waits for Poll() to get the results and adds the iterator to the min heap.
      - Status::TryAgain is passed down from FilePrefetchBuffer::PrefetchAsync to block_iter_.Status, indicating an asynchronous request has been submitted.
      - If for some reason the asynchronous request fails to be submitted, it falls back to sequential reading of the blocks in one pass.
      - If the data already exists in prefetch_buffer, it returns the data without prefetching further, and it is treated as a single-pass seek.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9994
      
      Test Plan:
      - **Run Regressions.**
      ```
      ./db_bench -db=/tmp/prefix_scan_prefetch_main -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000000 -use_direct_io_for_flush_and_compaction=true -target_file_size_base=16777216
      ```
      i) Previous release 7.0 run for normal prefetching with async_io disabled:
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.0
      Date:       Thu Mar 17 13:11:34 2022
      CPU:        24 * Intel Core Processor (Broadwell)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  483618.390 micros/op 2 ops/sec;  338.9 MB/s (249 of 249 found)
      ```
      
      ii) Normal prefetching after changes with async_io disabled:
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
      Set seed to 1652922591315307 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.3
      Date:       Wed May 18 18:09:51 2022
      CPU:        32 * Intel Xeon Processor (Skylake)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  483080.466 micros/op 2 ops/sec 120.287 seconds 249 operations;  340.8 MB/s (249 of 249 found)
      ```
      iii) db_bench with async_io enabled completed successfully
      
      ```
      ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 -async_io=1 -adaptive_readahead=1
      Set seed to 1652924062021732 because --seed was 0
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 7.3
      Date:       Wed May 18 18:34:22 2022
      CPU:        32 * Intel Xeon Processor (Skylake)
      CPUCache:   16384 KB
      Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
      Values:     512 bytes each (256 bytes after compression)
      Entries:    5000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    2594.0 MB (estimated)
      FileSize:   1373.3 MB (estimated)
      Write rate: 0 bytes/second
      Read rate: 0 ops/second
      Compression: Snappy
      Compression sampling rate: 0
      Memtablerep: SkipListFactory
      Perf Level: 1
      ------------------------------------------------
      DB path: [/tmp/prefix_scan_prefetch_main]
      seekrandom   :  553913.576 micros/op 1 ops/sec 120.199 seconds 217 operations;  293.6 MB/s (217 of 217 found)
      ```
      
      - db_stress with async_io disabled completed successfully
      ```
       export CRASH_TEST_EXT_ARGS=" --async_io=0"
       make crash_test -j
      ```
      
      **In Progress**: db_stress with async_io is failing; debugging/fixing it is in progress.
      
      Reviewed By: anand1976
      
      Differential Revision: D36459323
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: abb1cd944abe712bae3986ae5b16704b3338917c
      2db6a4a1