1. 25 11月, 2021 2 次提交
    • P
      Fix bug affecting GetSortedWalFiles, Backups, Checkpoint (#9208) · 2a67d475
      Peter Dillinger 提交于
      Summary:
      Saw error like this:
      `Backup failed -- IO error: No such file or directory: While opening a
      file for sequentially reading:
      /dev/shm/rocksdb/rocksdb_crashtest_blackbox/004426.log: No such file or
      directory`
      
      Unfortunately, GetSortedWalFiles (used by Backups, Checkpoint, etc.)
      relies on no file deletions happening while its operating, which
      means not only disabling (more) deletions, but ensuring any pending
      deletions are completed. Two fixes related to this:
      
      * There was a gap in several places between decrementing
      pending_purge_obsolete_files_ and incrementing bg_purge_scheduled_ where
      the db mutex would be released and GetSortedWalFiles (and others) could
      get false information that no deletions are pending.
      
      * The fix to https://github.com/facebook/rocksdb/issues/8591 (disabling deletions in GetSortedWalFiles) seems
      incomplete because it doesn't prevent pending deletions from occuring
      during the operation (if deletions not already disabled, the case that
      was to be fixed by the change).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9208
      
      Test Plan:
      existing tests (it's hard to write a test for interleavings
      that are now excluded - this is what stress test is for)
      
      Reviewed By: ajkr
      
      Differential Revision: D32630675
      
      Pulled By: pdillinger
      
      fbshipit-source-id: a121e3da648de130cd24d44c524232f4eb22f178
      2a67d475
    • J
      Fix compile warnings (#9199) · 5384f0af
      jsteemann 提交于
      Summary:
      * added missing override specifiers for overriden methods
        this fixes compiler warnings emitted by g++ and clang++ when compile option `-Wsuggest-override` is turned on.
      * fix compile warning with -Wmaybe-uninitialized
        g++-11 warns about a _potentially_ uninitialized variable when using `-Wmaybe_uninitialized`:
        ```
            env/env.cc: In member function ‘virtual rocksdb::Status rocksdb::Env::GetHostNameString(std::string*)’:
            env/env.cc:738:66: error: ‘hostname_buf’ may be used uninitialized [-Werror=maybe-uninitialized]
              738 |   Status s = GetHostName(hostname_buf.data(), hostname_buf.size());
                  |                                                                  ^
            In file included from /usr/include/c++/11/tuple:39,
                             from /usr/include/c++/11/functional:54,
                             from ./include/rocksdb/env.h:22,
                             from env/env.cc:10:
            /usr/include/c++/11/array:176:7: note: by argument 1 of type ‘const std::array<char, 256>*’ to ‘constexpr std::array<_Tp, _Nm>::size_type std::array<_Tp, _Nm>::size() const [with _Tp = char; long unsigned int _Nm = 256]’ declared here
              176 |       size() const noexcept { return _Nm; }
                  |       ^~~~
            env/env.cc:737:37: note: ‘hostname_buf’ declared here
              737 |   std::array<char, kMaxHostNameLen> hostname_buf;
        ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9199
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D32630703
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 9ea3010b1105a582548e3c3c0db4475b201e4a10
      5384f0af
  2. 24 11月, 2021 1 次提交
  3. 23 11月, 2021 2 次提交
    • Y
      Fix internal build error (#9195) · dc0ee3e5
      Yanqin Jin 提交于
      Summary:
      Internal build reported:
      ```
      rocksdb/listener.h:470:3: error: extra ';' inside a struct [-Werror,-Wextra-semi]
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9195
      
      Test Plan: import to fbcode and compile.
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D32590138
      
      Pulled By: riversand963
      
      fbshipit-source-id: ca4ed9cca210a1a9a12d3de17c789ef9151c57e8
      dc0ee3e5
    • Y
      Print file checksum in hex (#9196) · 12e98add
      Yanqin Jin 提交于
      Summary:
      Printing file checksum (usually an integer) in non-hex format is barely useful. To make the matter
      worse, it can mess with the output format. If you use `less` to redirect the output of `ldb manifest_dump`,
      non-hex file checksum can cause `less` not to function as expected.
      
      Also output some additional fields to json output.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9196
      
      Test Plan: manually test `ldb manifest_dump`.
      
      Reviewed By: ajkr
      
      Differential Revision: D32590253
      
      Pulled By: riversand963
      
      fbshipit-source-id: de434b7e60dd05b0b7cb76eff2240b21f9ae4b32
      12e98add
  4. 21 11月, 2021 1 次提交
    • P
      Fix some CI output (#9193) · d561432d
      Peter Dillinger 提交于
      Summary:
      Address some issues with https://github.com/facebook/rocksdb/issues/9188
      * Internal CI doesn't render \r as anything, so use \n for "not
      connected to terminal" case
      * CircleCI apparently uses a pseudo-tty for output and although
      rerdirect stdout (because of EAGAIN bug) we don't redirect stderr, so it
      is detected as a terminal-connected case. Fix by redirecting stderr
      also.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9193
      
      Test Plan: manual, CI
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D32581128
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 5ae7c3209128d8dbd4153c5b9fdb2b810e6deb2e
      d561432d
  5. 20 11月, 2021 12 次提交
    • Z
      Disable the QPS verification in test temporally (#9190) · 4340e1ff
      Zhichao Cao 提交于
      Summary:
      Disable the QPS verification in test temporally, which causes the test failure due to different system delays.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9190
      
      Test Plan: make check
      
      Reviewed By: siying
      
      Differential Revision: D32576289
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 1df972e77dd82eed5af3462e5db5e141aadf8fae
      4340e1ff
    • Y
      Update HISTORY and version.h for 6.27 release (#9192) · 81016436
      Yanqin Jin 提交于
      Summary:
      As title.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9192
      
      Reviewed By: ltamasi
      
      Differential Revision: D32578141
      
      Pulled By: riversand963
      
      fbshipit-source-id: 16216451c87e383ca8fd309acf15106e46172aaa
      81016436
    • L
      Update HISTORY for PR 9187 (#9191) · 3a9f5574
      Levi Tamasi 提交于
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9191
      
      Reviewed By: riversand963
      
      Differential Revision: D32577939
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 3c52067a0c3e9219c1aafdb711718dfcce5dedf5
      3a9f5574
    • Y
      Fix an assertion failure when ManifestTailer switches to new Manifest in multi-cf mode (#9143) · 43ac7a27
      Yanqin Jin 提交于
      Summary:
      Original unit test fail to test the case of multi-cf mode switching to new manifest. The assertion
      failure will trigger when the primary instance reopens and secondary continues to tail the
      newly-created MANIFEST. Fix the assertion failure and update existing unit tests.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9143
      
      Test Plan: make check
      
      Reviewed By: ltamasi
      
      Differential Revision: D32574233
      
      Pulled By: riversand963
      
      fbshipit-source-id: 857ddbe994019091276458abebcf8e2b65340468
      43ac7a27
    • L
      Support readahead during compaction for blob files (#9187) · dc5de45a
      Levi Tamasi 提交于
      Summary:
      The patch adds a new BlobDB configuration option `blob_compaction_readahead_size`
      that can be used to enable prefetching data from blob files during compaction.
      This is important when using storage with higher latencies like HDDs or remote filesystems.
      If enabled, prefetching is used for all cases when blobs are read during compaction,
      namely garbage collection, compaction filters (when the existing value has to be read from
      a blob file), and `Merge` (when the value of the base `Put` is stored in a blob file).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9187
      
      Test Plan: Ran `make check` and the stress/crash test.
      
      Reviewed By: riversand963
      
      Differential Revision: D32565512
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 87be9cebc3aa01cc227bec6b5f64d827b8164f5d
      dc5de45a
    • P
      Fix backward compatibility with 2.5 through 2.7 (#9189) · cd4ea675
      Peter Dillinger 提交于
      Summary:
      A bug in https://github.com/facebook/rocksdb/issues/9163 can cause checksum verification to fail if
      parsing a properties block fails.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9189
      
      Test Plan:
      check_format_compatible.sh (never quite works locally but
      this particular case seems fixed using variants of SHORT_TEST=1).
      And added new unit test case.
      
      Reviewed By: ajkr
      
      Differential Revision: D32574626
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 6fa5c8595737b71a3c3d011a52daf6d6c08715d7
      cd4ea675
    • J
      Deprecating `iter_start_seqnum` and `preserve_deletes` (#9091) · 6cde8d21
      Jay Zhuang 提交于
      Summary:
      `ReadOptions::iter_start_seqnum` and `DBOptions::preserve_deletes` are
      deprecated, please try using user defined timestamp feature instead.
      The feature is used to support differential snapshots, but not well
      maintained (https://github.com/facebook/rocksdb/issues/6837, https://github.com/facebook/rocksdb/issues/8472) and the interface is not user friendly which
      returns an internal key from the iterator. The user defined timestamp
      feature is a more flexible feature to support similar usecase, please
      switch to that if you have such usecase.
      The deprecated feature will be removed in a future release.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9091
      
      Test Plan:
      check LOG
      
      Fix https://github.com/facebook/rocksdb/issues/9090
      
      Reviewed By: ajkr
      
      Differential Revision: D32071750
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: b882c4668dd1bf26ce03c4c192f1bba584bf6104
      6cde8d21
    • P
      Print failures in parallel `make check` (#9188) · 3ce4d4f5
      Peter Dillinger 提交于
      Summary:
      Generating megabytes of successful test output has caused
      issues / inconveniences for CI and internal sandcastle runs. This
      changes their configuration to only print output from failed tests.
      (Successful test output is still available in files under t/.)
      
      This likewise changes default behavior of parallel `make check` as
      a quick team poll showed interest in that.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9188
      
      Test Plan:
      Seed some test failures and observe
      * `make -j24 check` (new behavior)
      * `PRINT_PARALLEL_OUTPUTS=1 make -j24 check` (old CI behavior)
      * `QUIET_PARALLEL_TESTS=1 make -j24 check` (old manual run behavior)
      
      Reviewed By: siying
      
      Differential Revision: D32567392
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 8d8fb64aebd16bca103b11e3bd1f13c488a69611
      3ce4d4f5
    • A
      Some small changes to RocksJava build (#9186) · ad40b0be
      Adam Retter 提交于
      Summary:
      1. Added a target for building a bundle jar for Sonatype Nexus - sometimes if the OSS Maven Central is misbehaving, it is quicker to upload a bundle to be processed for release.
      
      2. Simplify the publish code by using a for-loop.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9186
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D32564469
      
      Pulled By: ajkr
      
      fbshipit-source-id: aaceac27e9143fb65b61dad2a46df346586672cd
      ad40b0be
    • S
      Track each SST's timestamp information as user properties (#9093) · e12753eb
      slk 提交于
      Summary:
      Track each SST's timestamp information as user properties https://github.com/facebook/rocksdb/issues/8959
      
      Rockdb has supported user-defined timestamp feature. Application can specify a timestamp
      when writing each k-v pair. When data flush from memory to disk file called SST files.
      Each SST files consist of multiple data blocks and several metadata blocks. Among the metadata
      blocks, there is one called Properties block that tracks some pre-defined properties of this SST file.
      
      This PR is for collecting the properties of min and max timestamps of all keys in the file. With those
      properties the SST file is more convenient to tell whether the keys in the SST have timestamps or not.
      
      The changes involved are as follows:
      
      1) Add a class TimestampTablePropertiesCollector to collect min/max timestamp when add keys to table,
         The way TimestampTablePropertiesCollector use to compare timestamp of key should defined by
         user by implementing the Comparator::CompareTimestamp function in the user defined comparator.
      2) Add corresponding unit tests.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9093
      
      Reviewed By: ltamasi
      
      Differential Revision: D32406927
      
      Pulled By: riversand963
      
      fbshipit-source-id: 25922971b7e67bacf4d53a1fb67c4c5ddaa61573
      e12753eb
    • S
      Fix flaky DBTest2.RateLimitedCompactionReads (#9185) · 12117b26
      sdong 提交于
      Summary:
      DBTest2.RateLimitedCompactionReads sometime shows following failure:
      
        what():  db/db_test2.cc:3976: Failure
      Expected equality of these values:
        i + 1
          Which is: 4
        NumTableFilesAtLevel(0)
          Which is: 0
      
      The assertion itself doesn't appear to be correct. Fix it.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9185
      
      Test Plan: Removing an assertion shouldn't break anything.
      
      Reviewed By: ajkr
      
      Differential Revision: D32549530
      
      fbshipit-source-id: 9993372d8af89161f903337a13f3e316e690a6b8
      12117b26
    • Y
      Fix a bug in FlushJob picking more memtables beyond synced WALs (#9142) · 1e8322c0
      Yanqin Jin 提交于
      Summary:
      After RocksDB 6.19 and before this PR, RocksDB FlushJob may pick more memtables to flush beyond synced WALs.
      This can be problematic if there are multiple column families, since it can prematurely advance the flushed column
      family's log_number. Should subsequent attempts fail to sync the latest WALs and the database goes
      through a recovery, it may detect corrupted WAL number below the flushed column family's log number
      and complain about column family inconsistency.
      To fix, we record the maximum memtable ID of the column family being flushed. Then we call SyncClosedLogs()
      so that all closed WALs at the time when memtable ID is recorded will be synced.
      I also disabled a unit test temporarily due to reasons described in https://github.com/facebook/rocksdb/issues/9151
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9142
      
      Test Plan: make check
      
      Reviewed By: ajkr
      
      Differential Revision: D32299956
      
      Pulled By: riversand963
      
      fbshipit-source-id: 0da75888177d91905cf8c9d00605b73afb5970a7
      1e8322c0
  6. 19 11月, 2021 6 次提交
    • A
      Adhere to per-DB concurrency limit when bottom-pri compactions exist (#9179) · 8cf4294e
      Andrew Kryczka 提交于
      Summary:
      - Fixed bug where bottom-pri manual compactions were counting towards `bg_compaction_scheduled_` instead of `bg_bottom_compaction_scheduled_`. It seems to have no negative effect.
      - Fixed bug where automatic compaction scheduling did not consider `bg_bottom_compaction_scheduled_`. Now automatic compactions cannot be scheduled that exceed the per-DB compaction concurrency limit (`max_compactions`) when some existing compactions are bottommost.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9179
      
      Test Plan: new unit test for manual/automatic. Also verified the existing automatic/automatic test ("ConcurrentBottomPriLowPriCompactions") hanged until changing it to explicitly enable concurrency.
      
      Reviewed By: riversand963
      
      Differential Revision: D32488048
      
      Pulled By: ajkr
      
      fbshipit-source-id: 20c4c0693678e81e43f85ed3cc3402fcf26e3310
      8cf4294e
    • A
      Add listener API that notifies on IOError (#9177) · 4a7c1dc3
      Akanksha Mahajan 提交于
      Summary:
      Add a new API in listener.h that notifies about IOErrors on
      Read/Write/Append/Flush etc. The API reports about IOStatus, filename, Operation
      name, offset and length.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9177
      
      Test Plan: Added new unit tests
      
      Reviewed By: anand1976
      
      Differential Revision: D32470627
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 189a717033590ae227b3beae8b1e7e185e4cdc12
      4a7c1dc3
    • A
      Check that newIteratorWithBase regardless of WBWI Overwrite Mode (#8134) · d9493232
      Adam Retter 提交于
      Summary:
      The behaviour of WBWI has changed when calling newIteratorWithBase when overwrite is set to true or false. This PR simply adds tests to assert the new correct behaviour.
      
      Closes https://github.com/facebook/rocksdb/issues/7370
      Closes https://github.com/facebook/rocksdb/pull/8134
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9107
      
      Reviewed By: pdillinger
      
      Differential Revision: D32099475
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 245f483f73db866cc8a51219a2bff2e09e59faa0
      d9493232
    • P
      Improve / clean up meta block code & integrity (#9163) · 230660be
      Peter Dillinger 提交于
      Summary:
      * Checksums are now checked on meta blocks unless specifically
      suppressed or not applicable (e.g. plain table). (Was other way around.)
      This means a number of cases that were not checking checksums now are,
      including direct read TableProperties in Version::GetTableProperties
      (fixed in meta_blocks ReadTableProperties), reading any block from
      PersistentCache (fixed in BlockFetcher), read TableProperties in
      SstFileDumper (ldb/sst_dump/BackupEngine) before table reader open,
      maybe more.
      * For that to work, I moved the global_seqno+TableProperties checksum
      logic to the shared table/ code, because that is used by many utilies
      such as SstFileDumper.
      * Also for that to work, we have to know when we're dealing with a block
      that has a checksum (trailer), so added that capability to Footer based
      on magic number, and from there BlockFetcher.
      * Knowledge of trailer presence has also fixed a problem where other
      table formats were reading blocks including bytes for a non-existant
      trailer--and awkwardly kind-of not using them, e.g. no shared code
      checking checksums. (BlockFetcher compression type was populated
      incorrectly.) Now we only read what is needed.
      * Minimized code duplication and differing/incompatible/awkward
      abstractions in meta_blocks.{cc,h} (e.g. SeekTo in metaindex block
      without parsing block handle)
      * Moved some meta block handling code from table_properties*.*
      * Moved some code specific to block-based table from shared table/ code
      to BlockBasedTable class. The checksum stuff means we can't completely
      separate it, but things that don't need to be in shared table/ code
      should not be.
      * Use unique_ptr rather than raw ptr in more places. (Note: you can
      std::move from unique_ptr to shared_ptr.)
      
      Without enhancements to GetPropertiesOfAllTablesTest (see below),
      net reduction of roughly 100 lines of code.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9163
      
      Test Plan:
      existing tests and
      * Enhanced DBTablePropertiesTest.GetPropertiesOfAllTablesTest to verify that
      checksums are now checked on direct read of table properties by TableCache
      (new test would fail before this change)
      * Also enhanced DBTablePropertiesTest.GetPropertiesOfAllTablesTest to test
      putting table properties under old meta name
      * Also generally enhanced that same test to actually test what it was
      supposed to be testing already, by kicking things out of table cache when
      we don't want them there.
      
      Reviewed By: ajkr, mrambacher
      
      Differential Revision: D32514757
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 507964b9311d186ae8d1131182290cbd97a99fa9
      230660be
    • Z
      Fix the analyzer test failure caused by inaccurate timing wait (#9181) · f4294669
      Zhichao Cao 提交于
      Summary:
      Fix the analyzer test failure caused by inaccurate timing wait. The wait time at different system might be different or cause the delay, now we do not accurately count the lines. Only in a very rare extreme case, test will ignore the part exceed the timing of 1 second.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9181
      
      Test Plan: make check
      
      Reviewed By: pdillinger
      
      Differential Revision: D32511319
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: e694c8cb465c750cfa5a43dab3eff6707b9a11c8
      f4294669
    • H
      Account Bloom/Ribbon filter construction memory in global memory limit (#9073) · 74544d58
      Hui Xiao 提交于
      Summary:
      Note: This PR is the 4th part of a bigger PR stack (https://github.com/facebook/rocksdb/pull/9073) and will rebase/merge only after the first three PRs (https://github.com/facebook/rocksdb/pull/9070, https://github.com/facebook/rocksdb/pull/9071, https://github.com/facebook/rocksdb/pull/9130) merge.
      
      **Context:**
      Similar to https://github.com/facebook/rocksdb/pull/8428, this PR is to track memory usage during (new) Bloom Filter (i.e,FastLocalBloom) and Ribbon Filter (i.e, Ribbon128) construction, moving toward the goal of [single global memory limit using block cache capacity](https://github.com/facebook/rocksdb/wiki/Projects-Being-Developed#improving-memory-efficiency). It also constrains the size of the banding portion of Ribbon Filter during construction by falling back to Bloom Filter if that banding is, at some point, larger than the available space in the cache under `LRUCacheOptions::strict_capacity_limit=true`.
      
      The option to turn on this feature is `BlockBasedTableOptions::reserve_table_builder_memory = true` which by default is set to `false`. We [decided](https://github.com/facebook/rocksdb/pull/9073#discussion_r741548409) not to have separate option for separate memory user in table building therefore their memory accounting are all bundled under one general option.
      
      **Summary:**
      - Reserved/released cache for creation/destruction of three main memory users with the passed-in `FilterBuildingContext::cache_res_mgr` during filter construction:
         - hash entries (i.e`hash_entries`.size(), we bucket-charge hash entries during insertion for performance),
         - banding (Ribbon Filter only, `bytes_coeff_rows` +`bytes_result_rows` + `bytes_backtrack`),
         - final filter (i.e, `mutable_buf`'s size).
            - Implementation details: in order to use `CacheReservationManager::CacheReservationHandle` to account final filter's memory, we have to store the `CacheReservationManager` object and `CacheReservationHandle` for final filter in `XXPH3BitsFilterBuilder` as well as  explicitly delete the filter bits builder when done with the final filter in block based table.
      - Added option fo run `filter_bench` with this memory reservation feature
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9073
      
      Test Plan:
      - Added new tests in `db_bloom_filter_test` to verify filter construction peak cache reservation under combination of  `BlockBasedTable::Rep::FilterType` (e.g, `kFullFilter`, `kPartitionedFilter`), `BloomFilterPolicy::Mode`(e.g, `kFastLocalBloom`, `kStandard128Ribbon`, `kDeprecatedBlock`) and `BlockBasedTableOptions::reserve_table_builder_memory`
        - To address the concern for slow test: tests with memory reservation under `kFullFilter` + `kStandard128Ribbon` and `kPartitionedFilter` take around **3000 - 6000 ms** and others take around **1500 - 2000 ms**, in total adding **20000 - 25000 ms** to the test suit running locally
      - Added new test in `bloom_test` to verify Ribbon Filter fallback on large banding in FullFilter
      - Added test in `filter_bench` to verify that this feature does not significantly slow down Bloom/Ribbon Filter construction speed. Local result averaged over **20** run as below:
         - FastLocalBloom
            - baseline `./filter_bench -impl=2 -quick -runs 20 | grep 'Build avg'`:
               - **Build avg ns/key: 29.56295** (DEBUG_LEVEL=1), **29.98153** (DEBUG_LEVEL=0)
            - new feature (expected to be similar as above)`./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`:
               - **Build avg ns/key: 30.99046** (DEBUG_LEVEL=1), **30.48867** (DEBUG_LEVEL=0)
            - new feature of RibbonFilter with fallback  (expected to be similar as above) `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` :
               - **Build avg ns/key: 31.146975** (DEBUG_LEVEL=1), **30.08165** (DEBUG_LEVEL=0)
      
          - Ribbon128
             - baseline `./filter_bench -impl=3 -quick -runs 20 | grep 'Build avg'`:
                 - **Build avg ns/key: 129.17585** (DEBUG_LEVEL=1), **130.5225** (DEBUG_LEVEL=0)
             - new feature  (expected to be similar as above) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg' `:
                 - **Build avg ns/key: 131.61645** (DEBUG_LEVEL=1), **132.98075** (DEBUG_LEVEL=0)
             - new feature of RibbonFilter with fallback (expected to be a lot faster than above due to fallback) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` :
                - **Build avg ns/key: 52.032965** (DEBUG_LEVEL=1), **52.597825** (DEBUG_LEVEL=0)
                - And the warning message of `"Cache reservation for Ribbon filter banding failed due to cache full"` is indeed logged to console.
      
      Reviewed By: pdillinger
      
      Differential Revision: D31991348
      
      Pulled By: hx235
      
      fbshipit-source-id: 9336b2c60f44d530063da518ceaf56dac5f9df8e
      74544d58
  7. 18 11月, 2021 2 次提交
    • P
      Don't allow parallel crash_test in Makefile (#9180) · 4f678b52
      Peter Dillinger 提交于
      Summary:
      Using deps for running blackbox and whitebox allows them to be
      parallelized, which doesn't seem to be working well.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9180
      
      Test Plan: make -j24 crash_test
      
      Reviewed By: siying
      
      Differential Revision: D32500851
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 364288c8d023b93e7ca2724ea40edae2f4eb0407
      4f678b52
    • D
      Fix integer overflow in TraceOptions (#9157) · c9539ede
      Davide Angelocola 提交于
      Summary:
      Hello from a happy user of rocksdb java :-)
      
      Default constructor of TraceOptions is supposed to initialize size to 64GB but the expression contains an integer overflow.
      
      Simple test case with JShell:
      ```
      jshell> 64 * 1024 * 1024 * 1024
      $1 ==> 0
      
      jshell> 64L * 1024 * 1024 * 1024
      $2 ==> 68719476736
      ```
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9157
      
      Reviewed By: pdillinger, zhichao-cao
      
      Differential Revision: D32369273
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 6a0c95fff7a91f27ff15d65b662c6b101756b450
      c9539ede
  8. 17 11月, 2021 8 次提交
  9. 16 11月, 2021 1 次提交
    • Y
      Update TransactionUtil::CheckKeyForConflict to also use timestamps (#9162) · 20357988
      Yanqin Jin 提交于
      Summary:
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9162
      
      Existing TransactionUtil::CheckKeyForConflict() performs only seq-based
      conflict checking. If user-defined timestamp is enabled, it should perform
      conflict checking based on timestamps too.
      
      Update TransactionUtil::CheckKey-related methods to verify the timestamp of the
      latest version of a key is smaller than the read timestamp. Note that
      CheckKeysForConflict() is not updated since it's used only by optimistic
      transaction, and we do not plan to update it in this upcoming batch of diffs.
      
      Existing GetLatestSequenceForKey() returns the sequence of the latest
      version of a specific user key. Since we support user-defined timestamp, we
      need to update this method to also return the timestamp (if enabled) of the
      latest version of the key. This will be needed for snapshot validation.
      
      Reviewed By: ltamasi
      
      Differential Revision: D31567960
      
      fbshipit-source-id: 2e4a14aed267435a9aa91bc632d2411c01946d44
      20357988
  10. 13 11月, 2021 2 次提交
    • A
      Use system-wide thread ID in info log lines (#9164) · 9bb13c56
      Andrew Kryczka 提交于
      Summary:
      This makes it easier to debug with tools like `ps`. The change only
      applies to builds with glibc 2.30+ and _GNU_SOURCE extensions enabled.
      We could adopt it in more cases by using the syscall but this is enough
      for our build.
      
      Replaces https://github.com/facebook/rocksdb/issues/2973.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9164
      
      Test Plan:
      - ran some benchmarks and correlated logged thread IDs with those shown by `ps -L`.
      - verified no noticeable regression in throughput for log heavy (more than 700k log lines and over 5k / second) scenario.
      
      Benchmark command:
      
      ```
      $ TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom -compression_type=none -max_bytes_for_level_multiplier=2 -write_buffer_size=262144 -num_levels=7 -max_bytes_for_level_base=2097152 -target_file_size_base=524288 -level_compaction_dynamic_level_bytes=true -max_background_jobs=12 -num=20000000
      ```
      
      Results before: 15.9MB/s, 15.8MB/s, 16.0MB/s
      Results after: 16.3MB/s, 16.3MB/s, 15.8MB/s
      
      - Rely on CI to test the fallback behavior
      
      Reviewed By: riversand963
      
      Differential Revision: D32399660
      
      Pulled By: ajkr
      
      fbshipit-source-id: c24d44fdf7782faa616ef0a0964eaca3539d9c24
      9bb13c56
    • A
      Clarify max_write_buffer_size_to_maintain (#9154) · 3295e9f6
      Andrew Kryczka 提交于
      Summary:
      I was unable to figure out the behavior by reading the old doc so attempted to
      write it differently.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9154
      
      Reviewed By: riversand963
      
      Differential Revision: D32338843
      
      Pulled By: ajkr
      
      fbshipit-source-id: e1e67720cd92572b195583e5ea2c592180d4fefd
      3295e9f6
  11. 12 11月, 2021 1 次提交
  12. 11 11月, 2021 2 次提交
    • A
      Reuse internal auto readhead_size at each Level (expect L0) for Iterations (#9056) · 17ce1ca4
      Akanksha Mahajan 提交于
      Summary:
      RocksDB does auto-readahead for iterators on noticing more than two sequential reads for a table file if user doesn't provide readahead_size. The readahead starts at 8KB and doubles on every additional read up to max_auto_readahead_size. However at each level, if iterator moves over next file, readahead_size starts again from 8KB.
      
      This PR introduces a new ReadOption "adaptive_readahead" which when set true will maintain readahead_size  at each level. So when iterator moves from one file to another, new file's readahead_size will continue from previous file's readahead_size instead of scratch. However if reads are not sequential it will fall back to 8KB (default) with no prefetching for that block.
      
      1. If block is found in cache but it was eligible for prefetch (block wasn't in Rocksdb's prefetch buffer),  readahead_size will decrease by 8KB.
      2. It maintains readahead_size for L1 - Ln levels.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9056
      
      Test Plan:
      Added new unit tests
      Ran db_bench for "readseq, seekrandom, seekrandomwhilewriting, readrandom" with --adaptive_readahead=true and there was no regression if new feature is enabled.
      
      Reviewed By: anand1976
      
      Differential Revision: D31773640
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: 7332d16258b846ae5cea773009195a5af58f8f98
      17ce1ca4
    • I
      Fix typo in env_win.h (#9138) · afcd3253
      Ikko Ashimine 提交于
      Summary:
      overide -> override
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9138
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D32245235
      
      Pulled By: mrambacher
      
      fbshipit-source-id: bed62b843925bed806c06ca3485d33bb45a56dc7
      afcd3253