1. Oct 23, 2021 (1 commit)
  2. Oct 21, 2021 (6 commits)
    • Fix atomic flush waiting forever for MANIFEST write (#9034) · f72fd585
      Committed by Yanqin Jin
      Summary:
      In atomic flush, concurrent background flush threads will commit to the MANIFEST
      one by one, in the order of the IDs of their picked memtables for all included column
      families. Each time, a background flush thread decides whether to wait based on two
      criteria:
      - Is db stopped? If so, don't wait.
      - Am I the one to commit the currently earliest memtable? If so, don't wait; I am ready to go.

      When atomic flush was implemented, an error writing to or syncing the MANIFEST would
      cause the db to be stopped. Therefore, a background thread did not have to check
      for the background error while waiting: if there had been such an error, `DBStopped()`
      would have been true, and the thread would **not** wait forever.
      
      After we improved error handling, RocksDB may map an IOError while writing to the MANIFEST
      to a soft error if there is no WAL. This requires the background threads to check for the
      background error while waiting; otherwise, a background flush thread may wait forever.
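
      A minimal standalone sketch of the corrected wait predicate (not the actual DBImpl code; `earliest_memtable_id`, `db_stopped`, and `bg_error_set` are hypothetical stand-ins for the real DBImpl state):

      <pre>
      #include <condition_variable>
      #include <cstdint>
      #include <mutex>

      struct AtomicFlushWaiter {
        std::mutex mu;
        std::condition_variable cv;
        uint64_t earliest_memtable_id = 0;
        bool db_stopped = false;
        bool bg_error_set = false;  // e.g. a soft error from a failed MANIFEST write

        // Returns true if the caller may proceed to commit to the MANIFEST.
        bool WaitForMyTurn(uint64_t my_memtable_id) {
          std::unique_lock<std::mutex> lock(mu);
          // The fix: also wake up when a (soft) background error has been set,
          // not only when the DB is stopped or it is this thread's turn.
          cv.wait(lock, [&] {
            return db_stopped || bg_error_set ||
                   earliest_memtable_id == my_memtable_id;
          });
          return !db_stopped && !bg_error_set;
        }
      };

      int main() {
        AtomicFlushWaiter w;
        w.earliest_memtable_id = 1;
        return w.WaitForMyTurn(1) ? 0 : 1;  // returns immediately: it is our turn
      }
      </pre>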
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9034
      
      Test Plan: make check
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31639225
      
      Pulled By: riversand963
      
      fbshipit-source-id: e9ab07c4d8f2eade238adeefe3e42dd9a5a3ebbd
    • Update Release Version to 6.26 (#9059) · 633f069c
      Committed by sdong
      Summary:
      Before cutting release branch 6.26, update version.h and release notes
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9059
      
      Reviewed By: ajkr
      
      Differential Revision: D31805126
      
      fbshipit-source-id: ae85ccf06ec756fa21163161f53fd0b728e6e32e
    • remove unused local objects and simplify complex code (#9052) · 0a73ada7
      Committed by leipeng
      Summary:
      This PR does not change code semantics; it just makes the following cleanups:

      1. local objects `nonmem_w` and `lfile` are unused
      2. the null check before `delete ptr` is unnecessary
      3. use `unique_ptr::reset` instead of `release` + `delete` (see the sketch below)
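
      A tiny standalone illustration of items 2 and 3 (the `Widget` type is a placeholder, not code from this PR):

      <pre>
      #include <memory>

      struct Widget {};

      int main() {
        std::unique_ptr<Widget> w(new Widget());

        // Pattern replaced by this PR:
        //   delete w.release();
        // Preferred: reset() destroys the managed object (if any) and clears the pointer.
        w.reset();

        // Item 2: `delete` on a null pointer is already a no-op, so guarding it
        // with `if (ptr != nullptr)` adds nothing.
        Widget* ptr = nullptr;
        delete ptr;
        return 0;
      }
      </pre>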
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9052
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31801661
      
      Pulled By: anand1976
      
      fbshipit-source-id: 16a77d45da8c8833bf5bf3bce546bb3711b335df
    • db_impl_write.cc: use stats_ instead of immutable_db_options_.stats (#9053) · 0c53b418
      Committed by leipeng
      Summary:
      This PR has no semantic changes; it just makes the code shorter.

      `stats_` has the same value as `immutable_db_options_.stats`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9053
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31801603
      
      Pulled By: anand1976
      
      fbshipit-source-id: cbd8fe478d3e90ae078ace49b4f2eb9bb028ccf6
    • Support `GetMapProperty()` with "rocksdb.dbstats" (#9057) · 4217d1bc
      Committed by Andrew Kryczka
      Summary:
      This PR supports querying `GetMapProperty()` with "rocksdb.dbstats" to get the DB-level stats in a map format. It only reports cumulative stats over the DB lifetime and, as such, does not update the baseline for interval stats. Like other map properties, the string keys are not (yet) exposed in the public API.
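
      A small usage sketch of the new capability (the DB path and error handling are illustrative only):

      <pre>
      #include <cstdio>
      #include <map>
      #include <string>
      #include "rocksdb/db.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/dbstats_example", &db);
        if (!s.ok()) return 1;

        db->Put(rocksdb::WriteOptions(), "key", "value");

        // Cumulative DB-level stats as a map; the string keys are not yet part of
        // the public API, so treat them as opaque.
        std::map<std::string, std::string> stats;
        if (db->GetMapProperty(rocksdb::DB::Properties::kDBStats, &stats)) {
          for (const auto& kv : stats) {
            std::printf("%s = %s\n", kv.first.c_str(), kv.second.c_str());
          }
        }

        db->Close();
        delete db;
        return 0;
      }
      </pre>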
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9057
      
      Test Plan: new unit test
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31781495
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6f77d3aee8b4b1a015061b8c260a123859ceaf9b
    • Incremental Space Amp Compactions in Universal Style (#8655) · c66b4429
      Committed by sdong
      Summary:
      This commit introduces incremental compaction in universal style for space amplification. This follows the first improvement mentioned in https://rocksdb.org/blog/2021/04/12/universal-improvements.html . The implementation simply picks files totaling roughly max_compaction_bytes to compact and executes the compaction if the penalty is not too big. More optimizations can be done in the future, e.g. prioritizing between this compaction and other types. But for now, the feature is supposed to be functional and can often reduce the frequency of full compactions, although it can introduce a penalty.

      In order to cut files more efficiently so that more files from upper levels can be included, the SST file cutting threshold (for the current file + overlapping parent-level files) is set to 1.5X the target file size. A 2MB target file size will generate files like this: https://gist.github.com/siying/29d2676fba417404f3c95e6c013c7de8 The number of files indeed increases, but it is not out of control.
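
      A hedged configuration sketch; the flag name used to opt in below (`compaction_options_universal.incremental`) is an assumption based on this PR, so check the options header of your build before relying on it:

      <pre>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.compaction_style = rocksdb::kCompactionStyleUniversal;

        // Assumed flag from this PR: opt into incremental space-amp compactions.
        options.compaction_options_universal.incremental = true;

        // Each incremental round picks roughly this many bytes of files to compact.
        options.max_compaction_bytes = 64ull << 20;  // illustrative value
        options.target_file_size_base = 2 << 20;     // 2MB, as in the gist above

        rocksdb::DB* db = nullptr;
        rocksdb::Status s =
            rocksdb::DB::Open(options, "/tmp/universal_incremental_example", &db);
        if (s.ok()) {
          db->Close();
          delete db;
        }
        return s.ok() ? 0 : 1;
      }
      </pre>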
      
      Two sets of write benchmarks were run:
      1. For the rate-limited ingestion scenario, full compactions are mostly eliminated: https://gist.github.com/siying/959bc1186066906831cf4c808d6e0a19 . The write amp increased from 7.7 to 9.4, as expected. After applying file cutting, the number improved to 8.9. In another benchmark, the write amp is even better with the incremental approach: https://gist.github.com/siying/d1c16c286d7c59c4d7bba718ca198163
      2. For the unlimited ingestion rate scenario, incremental compaction turns out to be too expensive most of the time and is not executed, as expected.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8655
      
      Test Plan: Add unit tests to the functionality.
      
      Reviewed By: ajkr
      
      Differential Revision: D31787034
      
      fbshipit-source-id: ce813e63b15a61d5a56e97bf8902a1b28e011beb
  3. Oct 20, 2021 (7 commits)
    • Add lowest_used_cache_tier to ImmutableDBOptions to enable or disable Secondary Cache (#9050) · 6d93b875
      Committed by Zhichao Cao
      Summary:
      Currently, if a Secondary Cache is provided to the LRU cache, it is used by default. We add CacheTier to advanced_options.h to describe the cache tier we use. Add a `lowest_used_cache_tier` option to `DBOptions` (immutable) and pass it to BlockBasedTableReader to decide if the secondary cache will be used or not. By default it is `CacheTier::kNonVolatileTier`, which means we always use both the block cache (kVolatileTier) and the secondary cache (kNonVolatileTier). By setting it to `CacheTier::kVolatileTier`, the DB will not use the secondary cache.
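
      A hedged configuration sketch based on the description above (the field and enum names follow this PR's description; a secondary cache itself would be attached via `LRUCacheOptions::secondary_cache`, which is omitted here):

      <pre>
      #include "rocksdb/advanced_options.h"
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;  // Options derives from DBOptions
        options.create_if_missing = true;

        // Default: CacheTier::kNonVolatileTier -> use both the block cache and any
        // configured secondary cache. kVolatileTier keeps reads in the block cache only.
        options.lowest_used_cache_tier = rocksdb::CacheTier::kVolatileTier;

        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/cache_tier_example", &db);
        if (s.ok()) {
          db->Close();
          delete db;
        }
        return s.ok() ? 0 : 1;
      }
      </pre>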
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9050
      
      Test Plan: added new tests
      
      Reviewed By: anand1976
      
      Differential Revision: D31744769
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: a0575ebd23e1c6dfcfc2b4c8578764e73b15bce6
    • Add "Java API Changes" section in HISTORY (#9055) · f20b07ce
      Committed by Jay Zhuang
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9055
      
      Reviewed By: ajkr
      
      Differential Revision: D31765398
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 77ed67d69415c9fbbfc1132b15310b293e3939c6
    • Ignore non-overlapping levels when determining grandparent files (#9051) · f053851a
      Committed by sdong
      Summary:
      Right now, when picking a compaction, grandparent files are taken from output_level + 1. This usually works, but if that level doesn't have any overlapping file, it is more efficient to go further down. This is because the files are likely to be trivially moved further down and might create a violation of max_compaction_bytes. This situation can naturally happen and might happen even more with TTL compactions. There is no harm in fixing it.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9051
      
      Test Plan: Run existing tests and see it passes. Also briefly run crash test.
      
      Reviewed By: ajkr
      
      Differential Revision: D31748829
      
      fbshipit-source-id: 52b99ab4284dc816d22f34406d528a3c98ff6719
    • Improve data block construction performance (#9040) · b234a3f5
      Committed by Peter Dillinger
      Summary:
      ... by bypassing tracking of last_key in BlockBuilder when
      last_key is already known (for BlockBasedTableBuilder::data_block).
      
      I tried extracting a base class of BlockBuilder without the last_key
      tracking at all, but that became complicated by NewFlushBlockPolicy() in
      the public API referencing BlockBuilder, which would need to be the base
      class, and I don't want to replace nearly all the internal references to
      BlockBuilder.
      
      Possible follow-up:
      * Investigate / consider using AddWithLastKey in more places
      
      This improvement should stack with https://github.com/facebook/rocksdb/issues/9039
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9040
      
      Test Plan:
      TEST_TMPDIR=/dev/shm/rocksdb1 ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=50000000
      Compiled with DEBUG_LEVEL=0
      Test and control runs were run simultaneously for better accuracy; units = ops/sec
      
      Run 1: 278929 vs. 267799 (+4.2%)
      Run 2: 281836 vs. 267432 (+5.4%)
      Run 3: 278279 vs. 270454 (+2.9%)
      
      (This benchmark is chosen to have detectable signal-to-noise, not to
      represent expected improvement percent on real workloads.)
      
      Reviewed By: mrambacher
      
      Differential Revision: D31706033
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 8a50fe6fefdd67b6d7665ffa687bbdcf5ad0d5ec
    • Fix stress/crash test handling of SST unique IDs (#9054) · 0534393f
      Committed by Peter Dillinger
      Summary:
      The stress/crash test was not handling the case of OnTableFileCreated being invoked for a
      table file that was NOT created.
      
      Also improved error reporting and caught a missing status check.
      
      Also strengthened the db_stress listener to require file_size > 0 when
      status.ok(). We would be violating the API contract if status is OK and
      we didn't create a valid SST file.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9054
      
      Test Plan: make blackbox_crash_test for a while
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31765200
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 7c527f5531bc239a5efd7a7b018545d480f926e2
    • Allow unregistered options to be ignored in DBOptions from files (#9045) · 8fb3fe8d
      Committed by mrambacher
      Summary:
      Adds changes to DBOptions (comparable to ColumnFamilyOptions) to allow some option values to be ignored on rehydration from the Options file.  This is necessary for some customizable classes that were not registered with the ObjectRegistry but are saved/restored from the Options file.
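
      A hedged sketch of how a user might load such an Options file while tolerating unregistered entries, using the existing `ConfigOptions` / `LoadLatestOptions()` API (the DB path is illustrative):

      <pre>
      #include <vector>
      #include "rocksdb/convenience.h"
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"
      #include "rocksdb/utilities/options_util.h"

      int main() {
        rocksdb::ConfigOptions config;
        // Skip option values that cannot be mapped to a registered/known type
        // instead of failing the load.
        config.ignore_unknown_options = true;
        config.ignore_unsupported_options = true;

        rocksdb::DBOptions db_options;
        std::vector<rocksdb::ColumnFamilyDescriptor> cf_descs;
        rocksdb::Status s = rocksdb::LoadLatestOptions(
            config, "/tmp/some_db", &db_options, &cf_descs);
        return s.ok() ? 0 : 1;
      }
      </pre>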
      
      All tests pass.  Will run check_format_compatible.sh shortly.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9045
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31761664
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 300c2251639cce2b223481c3bb2a63877b1f3766
    • New-style blob option bindings, Java option getter and improve/fix option parsing (#8999) · 8d615a2b
      Committed by Alan Paxton
      Summary:
      Implementation of https://github.com/facebook/rocksdb/issues/8221, plus/including extension of Java options API to allow the get() of options from RocksDB. The extension allows more comprehensive testing of options at the Java side, by validating that the options are set at the C++ side.
      
      Variations on methods:
      MutableColumnFamilyOptions.MutableColumnFamilyOptionsBuilder getOptions()
      MutableDBOptions.MutableDBOptionsBuilder getDBOptions()
      
      retrieve the options via RocksDB C++ interfaces, and parse the resulting string into one of the Java-style option objects.
      
      This necessitated generalising the parsing of option strings in Java, which now parses the full range of option strings returned by the C++ interface, rather than just a useful subset. As a result, the list separator changed from , (comma) to : (colon).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8999
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D31655487
      
      Pulled By: ltamasi
      
      fbshipit-source-id: c38e98145c81c61dc38238b0df580db176ce4efd
  4. Oct 19, 2021 (8 commits)
    • Experimental support for SST unique IDs (#8990) · ad5325a7
      Committed by Peter Dillinger
      Summary:
      * New public header unique_id.h and function GetUniqueIdFromTableProperties
      which computes a universally unique identifier based on table properties
      of table files from recent RocksDB versions.
      * Generation of DB session IDs is refactored so that they are
      guaranteed unique in the lifetime of a process running RocksDB.
      (SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
      this enables SST unique IDs to be guaranteed unique among SSTs generated
      in a single process, and "better than random" between processes.
      See https://github.com/pdillinger/unique_id
      * In addition to public API producing 'external' unique IDs, there is a function
      for producing 'internal' unique IDs, with functions for converting between the
      two. In short, the external ID is "safe" for things people might do with it, and
      the internal ID enables more "power user" features for the future. Specifically,
      the external ID goes through a hashing layer so that any subset of bits in the
      external ID can be used as a hash of the full ID, while also preserving
      uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
      and on full 192 bits).
      
      Intended follow-up:
      * Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
      the third 64-bit value of the unique ID.)
      * Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990
      
      Test Plan:
      Unit tests added, and checking of unique ids in stress test.
      NOTE in stress test we do not generate nearly enough files to thoroughly
      stress uniqueness, but the test trims off pieces of the ID to check for
      uniqueness so that we can infer (with some assumptions) stronger
      properties in the aggregate.
      
      Reviewed By: zhichao-cao, mrambacher
      
      Differential Revision: D31582865
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
    • Add property_bag to FileOptions (#9030) · aa218968
      Committed by anand76
      Summary:
      Add a property_bag option in FileOptions for direct FileSystem users to pass custom properties to the provider in APIs such as NewRandomAccessFile, NewWritableFile etc. This field will be ignored/not populated by RocksDB.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9030
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31630643
      
      Pulled By: anand1976
      
      fbshipit-source-id: 1e1ddc5e2933ecada99a94eada5f309b674a03e8
    • Fix out-of-bounds access in MultiDBParallelOpenTest (#9046) · f0841d4f
      Committed by Giuseppe Ottaviano
      Summary:
      `dbs` should not be cleared, as it is reused later when reopening the DBs, so we have an out-of-bounds access with `dbnames[dbnum]`. The values left in the vector don't need to be reset, as the db pointer is an out parameter for `DB::Open`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9046
      
      Reviewed By: pdillinger
      
      Differential Revision: D31738263
      
      Pulled By: ot
      
      fbshipit-source-id: c619e947b8d3dbc3d896f29971f093d3e3c794d3
    • Make `DB::Close()` thread-safe (#8970) · 314de7e7
      Committed by Jay Zhuang
      Summary:
      If `DB::Close()` is called from multiple threads, resources
      could be double-released, which causes exceptions or assertion failures.
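
      A standalone sketch of the scenario this change targets: several threads racing to close the same DB at shutdown (the DB path is illustrative):

      <pre>
      #include <thread>
      #include <vector>
      #include "rocksdb/db.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        rocksdb::DB* db = nullptr;
        if (!rocksdb::DB::Open(options, "/tmp/close_race_example", &db).ok()) return 1;

        // Each thread tries to close the DB; with this fix the resources are
        // released exactly once instead of being double-released.
        std::vector<std::thread> threads;
        for (int i = 0; i < 4; ++i) {
          threads.emplace_back([db] {
            rocksdb::Status s = db->Close();
            (void)s;
          });
        }
        for (auto& t : threads) {
          t.join();
        }
        delete db;
        return 0;
      }
      </pre>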
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8970
      
      Test Plan:
      Tested with a multi-threaded benchmark, with each thread trying to
      close the DB at the end.
      
      Reviewed By: pdillinger
      
      Differential Revision: D31242042
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: a61276b1b61e07732e375554106946aea86a23eb
    • keyMayExist() supports ByteBuffer (#9013) · 86cf7266
      Committed by Alan Paxton
      Summary:
      closes https://github.com/facebook/rocksdb/issues/7917
      
      Implemented ByteBuffer API variants of Java keyMayExist() uniformly with and without column families, read options and return data values. Implemented 2 supporting C++ JNI methods.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9013
      
      Reviewed By: mrambacher
      
      Differential Revision: D31665989
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 8adc1730217dba38d6fa7b31d788650a33e28af1
    • Deflake ObsoleteFilesTest (#9049) · 53a0ab2b
      Committed by Jay Zhuang
      Summary:
      WaitForFlushMemTable() may only wait for the memtable flush, not for the background
      flush job to finish. Thus the obsolete file may not be purged yet.
      https://github.com/facebook/rocksdb/blob/fcaa7ff6381fe6052b37a1d013b14960ea23ac17/db/db_impl/db_impl_compaction_flush.cc#L2200-L2203
      
      Use WaitForCompact() instead to wait for the background flush job.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9049
      
      Test Plan: `gtest-parallel ./obsolete_files_test --gtest_filter=ObsoleteFilesTest.DeleteObsoleteOptionsFile -r 1000`
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31737343
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 82276ebeae7c7c75a733d3e1fd1c130d45e4761f
    • Fix gcc-11 compile error (#9043) · b4326b52
      Committed by Jay Zhuang
      Summary:
      gcc 11 added a new static check.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9043
      
      Test Plan: Added CI for gcc11 build
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31716005
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 9f53be6f2f9e58e39b83359f6bbf66f945d57429
    • Fix COMMIT_ID in regression_test.sh (#9047) · 908a999a
      Committed by Peter Dillinger
      Summary:
      https://github.com/facebook/rocksdb/issues/8031 broke internal tests. This should fix them while preserving
      the intended capability of getting the git commit id when hg is not used.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9047
      
      Test Plan: already broken ¯\_(ツ)_/¯
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31732198
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 7dba8531ddca55a6de5e04978a1a1601aae4cee9
  5. Oct 18, 2021 (1 commit)
    • Two performance improvements in BlockBuilder (#9039) · 9d66d6d1
      Committed by Peter Dillinger
      Summary:
      Primarily, this change reserves space in the std::string for building
      the next block once a block is finished, using `block_size` as
      reservation size. Note: I also tried reusing the same std::string in the
      common "unbuffered" path, but that showed no benefit or regression.
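
      A standalone illustration of the primary change (not the BlockBuilder code itself; `block_size` is a stand-in for the configured block size):

      <pre>
      #include <cstddef>
      #include <string>

      int main() {
        const std::size_t block_size = 4096;

        std::string buffer;
        // Reserve the expected block size up front, analogous to what this change
        // does for the next block once a block is finished.
        buffer.reserve(block_size);

        // Appends now stay within the reserved capacity, avoiding reallocations
        // while the block is being built.
        while (buffer.size() + 16 <= block_size) {
          buffer.append("key|value|entry\n");
        }
        return 0;
      }
      </pre>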
      
      Secondarily, this slightly reduces the work in resetting `restarts_`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9039
      
      Test Plan:
      TEST_TMPDIR=/dev/shm/rocksdb1 ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=50000000
      Compiled with DEBUG_LEVEL=0
      Test and control runs were run simultaneously for better accuracy; units = ops/sec
      
      Run 1, Primary change only: 292697 vs. 280267 (+4.4%)
      Run 2, Primary change only: 288763 vs. 279621 (+3.3%)
      Run 1, Secondary change only: 260065 vs. 254232 (+2.3%)
      Run 2, Secondary change only: 275925 vs. 272248 (+1.4%)
      Run 1, Both changes: 284890 vs. 270372 (+5.3%)
      Run 2, Both changes: 263511 vs. 258188 (+2.0%)
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31701253
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 7e40810afbb98e6b6446955e77bda59e69b19ffd
  6. Oct 17, 2021 (1 commit)
    • Add (Live)FileStorageInfo API (#8968) · 3ffb3baa
      Committed by Peter Dillinger
      Summary:
      New classes FileStorageInfo and LiveFileStorageInfo and
      'experimental' function DB::GetLiveFilesStorageInfo, which is intended
      to largely replace several fragmented DB functions needed to create
      checkpoints and backups.
      
      This function is now used to create checkpoints and backups, because
      it fixes many (probably not all) of the prior complexities of checkpoint
      not having atomic access to DB metadata. This also ensures strong
      functional test coverage of the new API. Specifically, much of the old
      CheckpointImpl::CreateCustomCheckpoint has been migrated to and
      updated in DBImpl::GetLiveFilesStorageInfo, with the former now
      calling the latter.
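
      A hedged usage sketch of the new 'experimental' API; the option and field names below follow this commit's description and may differ slightly across releases:

      <pre>
      #include <cstdio>
      #include <vector>
      #include "rocksdb/db.h"
      #include "rocksdb/metadata.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        rocksdb::DB* db = nullptr;
        if (!rocksdb::DB::Open(options, "/tmp/live_files_example", &db).ok()) return 1;

        db->Put(rocksdb::WriteOptions(), "key", "value");
        db->Flush(rocksdb::FlushOptions());

        // A consistent list of the files needed to back up or checkpoint the DB.
        rocksdb::LiveFilesStorageInfoOptions opts;
        std::vector<rocksdb::LiveFileStorageInfo> files;
        rocksdb::Status s = db->GetLiveFilesStorageInfo(opts, &files);
        if (s.ok()) {
          for (const auto& f : files) {
            std::printf("%s/%s (%llu bytes)\n", f.directory.c_str(),
                        f.relative_filename.c_str(),
                        static_cast<unsigned long long>(f.size));
          }
        }

        db->Close();
        delete db;
        return 0;
      }
      </pre>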
      
      Also, the class FileStorageInfo in metadata.h compatibly replaces
      BackupFileInfo and serves as a new base class for SstFileMetaData.
      Some old fields of SstFileMetaData are still provided (for now) but
      deprecated.
      
      Although FileStorageInfo::directory is accurate when using db_paths
      and/or cf_paths, these have never been supported by Checkpoint
      nor BackupEngine and still are not. This change does now detect
      these cases and return NotSupported when appropriate. (More work
      needed for support.)
      
      Somehow this change broke ProgressCallbackDuringBackup, but
      the progress_callback logic was dubious to begin with because it
      would call the callback based on copy buffer size, not size actually
      copied. Logic and test updated to track size actually copied
      per-thread.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8968
      
      Test Plan:
      tests updated.
      DB::GetLiveFilesStorageInfo mostly tested by use in CheckpointImpl.
      DBTest.SnapshotFiles updated to also test GetLiveFilesStorageInfo,
      including reading the data after DB close.
      Added CheckpointTest.CheckpointWithDbPath (NotSupported).
      
      Reviewed By: siying
      
      Differential Revision: D31242045
      
      Pulled By: pdillinger
      
      fbshipit-source-id: b183d1ce9799e220daaefd6b3b5365d98de676c0
  7. Oct 16, 2021 (1 commit)
    • SyncPoint::Process thrashes heap ... fix it (#9023) · 678ba5e4
      Committed by matthewvon
      Summary:
      The first parameter of SyncPoint::Process is "const std::string&".  The majority, maybe all, of the actual calls to this function use a "const char *".  The conversion before entering the function requires constructing a std::string object on the heap.  This std::string object is then typically not needed, because the first use of the string is as a rocksdb::Slice, which has a less costly conversion from char * to Slice.
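
      A standalone illustration of the cost difference (the sync point name and the `SliceLike` type below are purely illustrative, not RocksDB code):

      <pre>
      #include <cstddef>
      #include <cstring>
      #include <string>

      // A Slice-style parameter wraps an existing char* without allocating.
      struct SliceLike {
        const char* data;
        std::size_t size;
        SliceLike(const char* s) : data(s), size(std::strlen(s)) {}
      };

      // Passing a string literal here materializes a temporary std::string per call
      // (a heap allocation for names longer than the small-string buffer).
      void ProcessByString(const std::string& point) { (void)point; }

      // Passing the same literal here only copies a pointer and a length.
      void ProcessBySlice(SliceLike point) { (void)point.data; }

      int main() {
        for (int i = 0; i < 1000; ++i) {
          ProcessByString("SomeClass::SomeMethod:SomeLongSyncPointName");  // allocates
          ProcessBySlice("SomeClass::SomeMethod:SomeLongSyncPointName");   // does not
        }
        return 0;
      }
      </pre>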
      
      Example:
      
      We have a load and iterate test.  The test loads 10m keys and iterates most of them via 10 rocksdb::Iterator objects.  We used TCMALLOC to gather information about allocation and space usage during iteration.
      
      - Before this PR:  test took 32 min 17 sec
      - After this PR:  test took 1 min 14 sec
      
      The TCMALLOC top object list before this PR:
      
      <pre>
      Total: 5105999 objects
       5003717  98.0%  98.0%  5009471  98.1% rocksdb::DBIter::MergeValuesNewToOld (inline)
         20260   0.4%  98.4%    20260   0.4% std::__cxx11::basic_string::_M_mutate
         15214   0.3%  98.7%    15214   0.3% rocksdb::UncompressBlockContentsForCompressionType (inline)
         13408   0.3%  99.0%    13408   0.3% std::_Rb_tree::_M_emplace_hint_unique [clone .constprop.416] (inline)
         12957   0.3%  99.2%    12957   0.3% std::_Rb_tree::_M_emplace_hint_unique [clone .constprop.405] (inline)
          9327   0.2%  99.4%     9327   0.2% std::_Rb_tree::_M_copy (inline)
          7691   0.2%  99.5%     9919   0.2% JVM_FindSignal
          2859   0.1%  99.6%     2859   0.1% rocksdb::Cleanable::RegisterCleanup
          2844   0.1%  99.7%     2844   0.1% std::map::operator[] (inline)
      </pre>
      
      The "MergeValuesNewToOld (inline)" objects are the #define wrappers to SyncPoint::Process.  We discovered this in a 5.18 rocksdb release.  There, TCMALLOC was more specific in showing that std::basic_string objects were being constructed.  I believe that was before SyncPoint::Process was declared inline in subsequent releases.
      
      The TCMALLOC top object list after this PR:
      
      <pre>
      Total: 104911 objects
         45090  43.0%  43.0%    45090  43.0% rocksdb::Cleanable::RegisterCleanup
         29995  28.6%  71.6%    29995  28.6% rocksdb::LRUCacheShard::Insert
         15229  14.5%  86.1%    15229  14.5% rocksdb::UncompressBlockContentsForCompressionType (inline)
          4373   4.2%  90.3%     4551   4.3% JVM_FindSignal
          2881   2.7%  93.0%     2881   2.7% rocksdb::::ReadBlockFromFile (inline)
          1162   1.1%  94.1%     1176   1.1% rocksdb::BlockFetcher::ReadBlockContents (inline)
          1036   1.0%  95.1%     1036   1.0% std::__cxx11::basic_string::_M_mutate
           869   0.8%  95.9%      869   0.8% std::vector::_M_realloc_insert (inline)
           806   0.8%  96.7%      806   0.8% SnmpAgent::GetVariables (inline)
      </pre>
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9023
      
      Reviewed By: pdillinger
      
      Differential Revision: D31610907
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 574ff51b639dd46ad253a8e664a575f06b7cc85d
  8. Oct 15, 2021 (9 commits)
  9. Oct 14, 2021 (1 commit)
  10. Oct 13, 2021 (3 commits)
    • Fix sequence number bump logic in multi-CF SST ingestion (#9005) · 4bfd415e
      Committed by Giuseppe Ottaviano
      Summary:
      The code in `IngestExternalFiles()` that bumps the DB's sequence number
      depending on what seqnos were assigned to the files has 3 bugs:
      
      1) There is an assertion that the sequence number is increased in all the
      affected column families, but this is unnecessary; it is fine if some files
      stick to a lower sequence number. It is very easy to hit the assertion: it is
      sufficient to insert 2 files in 2 CFs, one which overlaps the CF and one that
      doesn't (for example the CF is empty). The line added in the
      `IngestFilesIntoMultipleColumnFamilies_Success` test makes the assertion fail.
      
      2) SetLastSequence() is called with the sum of all the bumps across CFs, but we
      should take the maximum instead, as all CFs start from the current seqno and bump
      it independently (see the sketch after this list).
      
      3) The code above is accidentally under an `#ifndef NDEBUG`, so it doesn't run in
      optimized builds, and some files may be assigned seqnos from the future.
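
      A standalone sketch of the corrected bookkeeping for bug 2 (the numbers are made up): every CF starts from the same last sequence number and bumps it independently, so the new last sequence is the maximum across CFs, not the old value plus the sum of the bumps:

      <pre>
      #include <algorithm>
      #include <cstdint>
      #include <vector>

      int main() {
        const uint64_t last_seqno = 100;
        const std::vector<uint64_t> per_cf_bump = {0, 3, 1};  // hypothetical per-CF bumps

        uint64_t new_last_seqno = last_seqno;
        for (uint64_t bump : per_cf_bump) {
          new_last_seqno = std::max(new_last_seqno, last_seqno + bump);
        }

        // new_last_seqno == 103; summing the bumps would wrongly give 104.
        return new_last_seqno == 103 ? 0 : 1;
      }
      </pre>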
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9005
      
      Test Plan:
      Added line in `IngestFilesIntoMultipleColumnFamilies_Success` that
      triggers the assertion, verified that the test (and all the others) pass after
      the fix.
      
      Reviewed By: ajkr
      
      Differential Revision: D31597892
      
      Pulled By: ot
      
      fbshipit-source-id: c2d3237f90290df1178736ace8653a9623f5a770
    • Add a benchmarking wrapper script for BlobDB (#9015) · b4e59a48
      Committed by Levi Tamasi
      Summary:
      The patch adds a new BlobDB benchmarking script called `run_blob_bench.sh`.
      It is a thin wrapper around `benchmark.sh` (similarly to `run_flash_bench.sh`):
      it actually calls `benchmark.sh` a number of times, cycling through six workloads,
      two write-only ones (bulk load and overwrite), two read/write ones (point lookups
      while writing, range scans while writing), and two read-only ones (point lookups
      and range scans).
      
      Note: this is a simpler/cleaned up/reworked version of the script used to produce the
      benchmark results in http://rocksdb.org/blog/2021/05/26/integrated-blob-db.html .
      The new version takes advantage of several recent `benchmark.sh` improvements
      like the ability to pass in arbitrary `db_bench` options or the possibility of using a
      job ID.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9015
      
      Test Plan: Ran the script manually with different parameter combinations.
      
      Reviewed By: riversand963
      
      Differential Revision: D31555277
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 0e151b2f7b2cf6f66ed7f95455571492ad7ea87f
    • Update HISTORY for PR 8994 (#9017) · 7cc52cd8
      Committed by Levi Tamasi
      Summary:
      Also, expand on/clarify a comment in `VersionStorageInfoTest`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9017
      
      Reviewed By: riversand963
      
      Differential Revision: D31566130
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 1d30c7af084c4de7b2030bc6c768838d65746010
  11. Oct 12, 2021 (2 commits)
    • Fix race in WriteBufferManager (#9009) · 22d4dc50
      Committed by Giuseppe Ottaviano
      Summary:
      EndWriteStall has a data race: `queue_.empty()` is checked outside of the
      mutex, so once we enter the critical section another thread may already have
      cleared the list, and accessing the `front()` is undefined behavior (and causes
      interesting crashes under high concurrency).
      
      This PR fixes the bug, and also rewrites the logic to make it easier to reason
      about it. It also fixes another subtle bug: if some writers are stalled and
      `SetBufferSize(0)` is called, which disables the WBM, the writers are not
      unblocked because of an early `enabled()` check in `EndWriteStall()`.
      
      It doesn't significantly change the locking behavior: as before, writers won't
      lock unless entering a stall condition, and `FreeMem` almost always locks if
      stalling is allowed, but that is inevitable with the current design. Liveness is
      guaranteed by the fact that if some writes are blocked, eventually all writes
      will be blocked due to `stall_active_`, and eventually all memory is freed.
      
      While at it, do a couple of optimizations:
      
      - In `WBMStallInterface::Signal()`, signal the CV only after releasing the
        lock. Signaling under the lock is a common pitfall, as it causes the woken-up
        thread to immediately block on the mutex, which is still held by the awaker
        (see the sketch after this list).
      
      - Move all allocations and deallocations outside of the lock.
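
      A standalone illustration of the first optimization above (generic C++, not the WriteBufferManager code): set the state under the mutex, release it, then notify, so the woken thread can acquire the mutex without blocking on it again:

      <pre>
      #include <condition_variable>
      #include <mutex>
      #include <thread>

      int main() {
        std::mutex mu;
        std::condition_variable cv;
        bool ready = false;

        std::thread waiter([&] {
          std::unique_lock<std::mutex> lk(mu);
          cv.wait(lk, [&] { return ready; });
        });

        {
          std::lock_guard<std::mutex> lk(mu);
          ready = true;
        }                 // release the mutex first...
        cv.notify_one();  // ...then signal, so the waiter does not wake into a held lock

        waiter.join();
        return 0;
      }
      </pre>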
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9009
      
      Test Plan:
      ```
      USE_CLANG=1 make -j64 all check
      ```
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D31550668
      
      Pulled By: ot
      
      fbshipit-source-id: 5125387c3dc7ecaaa2b8bbc736e58c4156698580
    • Inline an empty destructor (#9004) · e1139167
      Committed by Yanqin Jin
      Summary:
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9004
      
      Inline an empty destructor
      
      Reviewed By: ltamasi
      
      Differential Revision: D31525561
      
      fbshipit-source-id: 3b9e37f06b0c70529a5d2d660de21ea335c73611