1. Oct 23, 2021 (1 commit)
  2. Oct 21, 2021 (6 commits)
    • Fix atomic flush waiting forever for MANIFEST write (#9034) · f72fd585
      Committed by Yanqin Jin
      Summary:
      In atomic flush, concurrent background flush threads will commit to the MANIFEST
      one by one, in the order of the IDs of their picked memtables for all included column
      families. Each time, a background flush thread decides whether to wait based on two
      criteria:
      - Is db stopped? If so, don't wait.
      - Am I the one to commit the currently earliest memtable? If so, don't wait; I am ready to go.

      When atomic flush was implemented, an error writing to or syncing the MANIFEST would
      cause the db to be stopped. Therefore, a background thread did not have to check
      for the background error while waiting: if there had been such an error, `DBStopped()`
      would have been true, and the thread would **not** wait forever.
      
      After we improved error handling, RocksDB may map an IOError while writing to the MANIFEST
      to a soft error if there is no WAL. This requires the background threads to check for the
      background error while waiting; otherwise, a background flush thread may wait forever.
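
      A minimal standalone sketch of the corrected wait predicate (not the actual DBImpl code; `earliest_memtable_id`, `db_stopped`, and `bg_error_set` are hypothetical stand-ins for the real DBImpl state):

      <pre>
      #include <condition_variable>
      #include <cstdint>
      #include <mutex>

      struct AtomicFlushWaiter {
        std::mutex mu;
        std::condition_variable cv;
        uint64_t earliest_memtable_id = 0;
        bool db_stopped = false;
        bool bg_error_set = false;  // e.g. a soft error from a failed MANIFEST write

        // Returns true if the caller may proceed to commit to the MANIFEST.
        bool WaitForMyTurn(uint64_t my_memtable_id) {
          std::unique_lock<std::mutex> lock(mu);
          // The fix: also wake up when a (soft) background error has been set,
          // not only when the DB is stopped or it is this thread's turn.
          cv.wait(lock, [&] {
            return db_stopped || bg_error_set ||
                   earliest_memtable_id == my_memtable_id;
          });
          return !db_stopped && !bg_error_set;
        }
      };

      int main() {
        AtomicFlushWaiter w;
        w.earliest_memtable_id = 1;
        return w.WaitForMyTurn(1) ? 0 : 1;  // returns immediately: it is our turn
      }
      </pre>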
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9034
      
      Test Plan: make check
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31639225
      
      Pulled By: riversand963
      
      fbshipit-source-id: e9ab07c4d8f2eade238adeefe3e42dd9a5a3ebbd
    • Update Release Version to 6.26 (#9059) · 633f069c
      Committed by sdong
      Summary:
      Before cutting release branch 6.26, update version.h and release notes
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9059
      
      Reviewed By: ajkr
      
      Differential Revision: D31805126
      
      fbshipit-source-id: ae85ccf06ec756fa21163161f53fd0b728e6e32e
    • remove unused local objects and simplify complex code (#9052) · 0a73ada7
      Committed by leipeng
      Summary:
      This PR does not change code semantics; it just makes the following cleanups:

      1. local objects `nonmem_w` and `lfile` are unused
      2. the null check before `delete ptr` is unnecessary
      3. use `unique_ptr::reset` instead of `release` + `delete` (see the sketch below)
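
      A tiny standalone illustration of items 2 and 3 (the `Widget` type is a placeholder, not code from this PR):

      <pre>
      #include <memory>

      struct Widget {};

      int main() {
        std::unique_ptr<Widget> w(new Widget());

        // Pattern replaced by this PR:
        //   delete w.release();
        // Preferred: reset() destroys the managed object (if any) and clears the pointer.
        w.reset();

        // Item 2: `delete` on a null pointer is already a no-op, so guarding it
        // with `if (ptr != nullptr)` adds nothing.
        Widget* ptr = nullptr;
        delete ptr;
        return 0;
      }
      </pre>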
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9052
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31801661
      
      Pulled By: anand1976
      
      fbshipit-source-id: 16a77d45da8c8833bf5bf3bce546bb3711b335df
    • db_impl_write.cc: use stats_ instead of immutable_db_options_.stats (#9053) · 0c53b418
      Committed by leipeng
      Summary:
      This PR has no semantic changes; it just makes the code shorter.

      `stats_` has the same value as `immutable_db_options_.stats`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9053
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31801603
      
      Pulled By: anand1976
      
      fbshipit-source-id: cbd8fe478d3e90ae078ace49b4f2eb9bb028ccf6
    • Support `GetMapProperty()` with "rocksdb.dbstats" (#9057) · 4217d1bc
      Committed by Andrew Kryczka
      Summary:
      This PR supports querying `GetMapProperty()` with "rocksdb.dbstats" to get the DB-level stats in a map format. It only reports cumulative stats over the DB lifetime and, as such, does not update the baseline for interval stats. Like other map properties, the string keys are not (yet) exposed in the public API.
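
      A small usage sketch of the new capability (the DB path and error handling are illustrative only):

      <pre>
      #include <cstdio>
      #include <map>
      #include <string>
      #include "rocksdb/db.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/dbstats_example", &db);
        if (!s.ok()) return 1;

        db->Put(rocksdb::WriteOptions(), "key", "value");

        // Cumulative DB-level stats as a map; the string keys are not yet part of
        // the public API, so treat them as opaque.
        std::map<std::string, std::string> stats;
        if (db->GetMapProperty(rocksdb::DB::Properties::kDBStats, &stats)) {
          for (const auto& kv : stats) {
            std::printf("%s = %s\n", kv.first.c_str(), kv.second.c_str());
          }
        }

        db->Close();
        delete db;
        return 0;
      }
      </pre>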
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9057
      
      Test Plan: new unit test
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31781495
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6f77d3aee8b4b1a015061b8c260a123859ceaf9b
    • Incremental Space Amp Compactions in Universal Style (#8655) · c66b4429
      Committed by sdong
      Summary:
      This commit introduces incremental compaction in universal style for space amplification. This follows the first improvement mentioned in https://rocksdb.org/blog/2021/04/12/universal-improvements.html . The implementation simply picks files totaling roughly max_compaction_bytes to compact and executes the compaction if the penalty is not too big. More optimizations can be done in the future, e.g. prioritizing between this compaction and other types. But for now, the feature is supposed to be functional and can often reduce the frequency of full compactions, although it can introduce a penalty.

      In order to cut files more efficiently so that more files from upper levels can be included, the SST file cutting threshold (for the current file + overlapping parent-level files) is set to 1.5X the target file size. A 2MB target file size will generate files like this: https://gist.github.com/siying/29d2676fba417404f3c95e6c013c7de8 The number of files indeed increases, but it is not out of control.
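
      A hedged configuration sketch; the flag name used to opt in below (`compaction_options_universal.incremental`) is an assumption based on this PR, so check the options header of your build before relying on it:

      <pre>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.compaction_style = rocksdb::kCompactionStyleUniversal;

        // Assumed flag from this PR: opt into incremental space-amp compactions.
        options.compaction_options_universal.incremental = true;

        // Each incremental round picks roughly this many bytes of files to compact.
        options.max_compaction_bytes = 64ull << 20;  // illustrative value
        options.target_file_size_base = 2 << 20;     // 2MB, as in the gist above

        rocksdb::DB* db = nullptr;
        rocksdb::Status s =
            rocksdb::DB::Open(options, "/tmp/universal_incremental_example", &db);
        if (s.ok()) {
          db->Close();
          delete db;
        }
        return s.ok() ? 0 : 1;
      }
      </pre>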
      
      Two sets of write benchmarks were run:
      1. For the rate-limited ingestion scenario, full compactions are mostly eliminated: https://gist.github.com/siying/959bc1186066906831cf4c808d6e0a19 . The write amp increased from 7.7 to 9.4, as expected. After applying file cutting, the number improved to 8.9. In another benchmark, the write amp is even better with the incremental approach: https://gist.github.com/siying/d1c16c286d7c59c4d7bba718ca198163
      2. For the unlimited ingestion rate scenario, incremental compaction turns out to be too expensive most of the time and is not executed, as expected.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8655
      
      Test Plan: Add unit tests to the functionality.
      
      Reviewed By: ajkr
      
      Differential Revision: D31787034
      
      fbshipit-source-id: ce813e63b15a61d5a56e97bf8902a1b28e011beb
  3. Oct 20, 2021 (7 commits)
    • Add lowest_used_cache_tier to ImmutableDBOptions to enable or disable Secondary Cache (#9050) · 6d93b875
      Committed by Zhichao Cao
      Summary:
      Currently, if a Secondary Cache is provided to the LRU cache, it is used by default. We add CacheTier to advanced_options.h to describe the cache tier we use. Add a `lowest_used_cache_tier` option to `DBOptions` (immutable) and pass it to BlockBasedTableReader to decide if the secondary cache will be used or not. By default it is `CacheTier::kNonVolatileTier`, which means we always use both the block cache (kVolatileTier) and the secondary cache (kNonVolatileTier). By setting it to `CacheTier::kVolatileTier`, the DB will not use the secondary cache.
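
      A hedged configuration sketch based on the description above (the field and enum names follow this PR's description; a secondary cache itself would be attached via `LRUCacheOptions::secondary_cache`, which is omitted here):

      <pre>
      #include "rocksdb/advanced_options.h"
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;  // Options derives from DBOptions
        options.create_if_missing = true;

        // Default: CacheTier::kNonVolatileTier -> use both the block cache and any
        // configured secondary cache. kVolatileTier keeps reads in the block cache only.
        options.lowest_used_cache_tier = rocksdb::CacheTier::kVolatileTier;

        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/cache_tier_example", &db);
        if (s.ok()) {
          db->Close();
          delete db;
        }
        return s.ok() ? 0 : 1;
      }
      </pre>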
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9050
      
      Test Plan: added new tests
      
      Reviewed By: anand1976
      
      Differential Revision: D31744769
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: a0575ebd23e1c6dfcfc2b4c8578764e73b15bce6
    • Add "Java API Changes" section in HISTORY (#9055) · f20b07ce
      Committed by Jay Zhuang
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9055
      
      Reviewed By: ajkr
      
      Differential Revision: D31765398
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 77ed67d69415c9fbbfc1132b15310b293e3939c6
    • Ignore non-overlapping levels when determining grandparent files (#9051) · f053851a
      Committed by sdong
      Summary:
      Right now, when picking a compaction, grandparent files are taken from output_level + 1. This usually works, but if that level doesn't have any overlapping file, it is more efficient to go further down. This is because the files are likely to be trivially moved further down and might create a violation of max_compaction_bytes. This situation can naturally happen and might happen even more with TTL compactions. There is no harm in fixing it.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9051
      
      Test Plan: Run existing tests and see it passes. Also briefly run crash test.
      
      Reviewed By: ajkr
      
      Differential Revision: D31748829
      
      fbshipit-source-id: 52b99ab4284dc816d22f34406d528a3c98ff6719
    • Improve data block construction performance (#9040) · b234a3f5
      Committed by Peter Dillinger
      Summary:
      ... by bypassing tracking of last_key in BlockBuilder when
      last_key is already known (for BlockBasedTableBuilder::data_block).
      
      I tried extracting a base class of BlockBuilder without the last_key
      tracking at all, but that became complicated by NewFlushBlockPolicy() in
      the public API referencing BlockBuilder, which would need to be the base
      class, and I don't want to replace nearly all the internal references to
      BlockBuilder.
      
      Possible follow-up:
      * Investigate / consider using AddWithLastKey in more places
      
      This improvement should stack with https://github.com/facebook/rocksdb/issues/9039
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9040
      
      Test Plan:
      TEST_TMPDIR=/dev/shm/rocksdb1 ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=50000000
      Compiled with DEBUG_LEVEL=0
      Test and control runs were run simultaneously for better accuracy; units = ops/sec
      
      Run 1: 278929 vs. 267799 (+4.2%)
      Run 2: 281836 vs. 267432 (+5.4%)
      Run 3: 278279 vs. 270454 (+2.9%)
      
      (This benchmark is chosen to have detectable signal-to-noise, not to
      represent expected improvement percent on real workloads.)
      
      Reviewed By: mrambacher
      
      Differential Revision: D31706033
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 8a50fe6fefdd67b6d7665ffa687bbdcf5ad0d5ec
    • Fix stress/crash test handling of SST unique IDs (#9054) · 0534393f
      Committed by Peter Dillinger
      Summary:
      The stress/crash test was not handling the case of OnTableFileCreated being invoked for a
      table file that was NOT created.
      
      Also improved error reporting and caught a missing status check.
      
      Also strengthened the db_stress listener to require file_size > 0 when
      status.ok(). We would be violating the API contract if status is OK and
      we didn't create a valid SST file.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9054
      
      Test Plan: make blackbox_crash_test for a while
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31765200
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 7c527f5531bc239a5efd7a7b018545d480f926e2
    • Allow unregistered options to be ignored in DBOptions from files (#9045) · 8fb3fe8d
      Committed by mrambacher
      Summary:
      Adds changes to DBOptions (comparable to ColumnFamilyOptions) to allow some option values to be ignored on rehydration from the Options file.  This is necessary for some customizable classes that were not registered with the ObjectRegistry but are saved/restored from the Options file.
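
      A hedged sketch of how a user might load such an Options file while tolerating unregistered entries, using the existing `ConfigOptions` / `LoadLatestOptions()` API (the DB path is illustrative):

      <pre>
      #include <vector>
      #include "rocksdb/convenience.h"
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"
      #include "rocksdb/utilities/options_util.h"

      int main() {
        rocksdb::ConfigOptions config;
        // Skip option values that cannot be mapped to a registered/known type
        // instead of failing the load.
        config.ignore_unknown_options = true;
        config.ignore_unsupported_options = true;

        rocksdb::DBOptions db_options;
        std::vector<rocksdb::ColumnFamilyDescriptor> cf_descs;
        rocksdb::Status s = rocksdb::LoadLatestOptions(
            config, "/tmp/some_db", &db_options, &cf_descs);
        return s.ok() ? 0 : 1;
      }
      </pre>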
      
      All tests pass.  Will run check_format_compatible.sh shortly.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9045
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31761664
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 300c2251639cce2b223481c3bb2a63877b1f3766
    • New-style blob option bindings, Java option getter and improve/fix option parsing (#8999) · 8d615a2b
      Committed by Alan Paxton
      Summary:
      Implementation of https://github.com/facebook/rocksdb/issues/8221, plus/including extension of Java options API to allow the get() of options from RocksDB. The extension allows more comprehensive testing of options at the Java side, by validating that the options are set at the C++ side.
      
      Variations on methods:
      MutableColumnFamilyOptions.MutableColumnFamilyOptionsBuilder getOptions()
      MutableDBOptions.MutableDBOptionsBuilder getDBOptions()
      
      retrieve the options via RocksDB C++ interfaces, and parse the resulting string into one of the Java-style option objects.
      
      This necessitated generalising the parsing of option strings in Java, which now parses the full range of option strings returned by the C++ interface, rather than just a useful subset. As a result, the list separator changed from , (comma) to : (colon).
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8999
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D31655487
      
      Pulled By: ltamasi
      
      fbshipit-source-id: c38e98145c81c61dc38238b0df580db176ce4efd
  4. Oct 19, 2021 (8 commits)
    • Experimental support for SST unique IDs (#8990) · ad5325a7
      Committed by Peter Dillinger
      Summary:
      * New public header unique_id.h and function GetUniqueIdFromTableProperties
      which computes a universally unique identifier based on table properties
      of table files from recent RocksDB versions.
      * Generation of DB session IDs is refactored so that they are
      guaranteed unique in the lifetime of a process running RocksDB.
      (SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
      this enables SST unique IDs to be guaranteed unique among SSTs generated
      in a single process, and "better than random" between processes.
      See https://github.com/pdillinger/unique_id
      * In addition to public API producing 'external' unique IDs, there is a function
      for producing 'internal' unique IDs, with functions for converting between the
      two. In short, the external ID is "safe" for things people might do with it, and
      the internal ID enables more "power user" features for the future. Specifically,
      the external ID goes through a hashing layer so that any subset of bits in the
      external ID can be used as a hash of the full ID, while also preserving
      uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
      and on full 192 bits).
      
      Intended follow-up:
      * Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
      the third 64-bit value of the unique ID.)
      * Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990
      
      Test Plan:
      Unit tests added, and checking of unique ids in stress test.
      NOTE in stress test we do not generate nearly enough files to thoroughly
      stress uniqueness, but the test trims off pieces of the ID to check for
      uniqueness so that we can infer (with some assumptions) stronger
      properties in the aggregate.
      
      Reviewed By: zhichao-cao, mrambacher
      
      Differential Revision: D31582865
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
    • Add property_bag to FileOptions (#9030) · aa218968
      Committed by anand76
      Summary:
      Add a property_bag option in FileOptions for direct FileSystem users to pass custom properties to the provider in APIs such as NewRandomAccessFile, NewWritableFile etc. This field will be ignored/not populated by RocksDB.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9030
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31630643
      
      Pulled By: anand1976
      
      fbshipit-source-id: 1e1ddc5e2933ecada99a94eada5f309b674a03e8
    • Fix out-of-bounds access in MultiDBParallelOpenTest (#9046) · f0841d4f
      Committed by Giuseppe Ottaviano
      Summary:
      `dbs` should not be cleared, as it is reused later when reopening the DBs, so we have an out-of-bounds access with `dbnames[dbnum]`. The values left in the vector don't need to be reset, as the db pointer is an out parameter for `DB::Open`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9046
      
      Reviewed By: pdillinger
      
      Differential Revision: D31738263
      
      Pulled By: ot
      
      fbshipit-source-id: c619e947b8d3dbc3d896f29971f093d3e3c794d3
    • Make `DB::Close()` thread-safe (#8970) · 314de7e7
      Committed by Jay Zhuang
      Summary:
      If `DB::Close()` is called from multiple threads, resources
      could be double-released, which causes exceptions or assertion failures.
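
      A standalone sketch of the scenario this change targets: several threads racing to close the same DB at shutdown (the DB path is illustrative):

      <pre>
      #include <thread>
      #include <vector>
      #include "rocksdb/db.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        rocksdb::DB* db = nullptr;
        if (!rocksdb::DB::Open(options, "/tmp/close_race_example", &db).ok()) return 1;

        // Each thread tries to close the DB; with this fix the resources are
        // released exactly once instead of being double-released.
        std::vector<std::thread> threads;
        for (int i = 0; i < 4; ++i) {
          threads.emplace_back([db] {
            rocksdb::Status s = db->Close();
            (void)s;
          });
        }
        for (auto& t : threads) {
          t.join();
        }
        delete db;
        return 0;
      }
      </pre>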
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8970
      
      Test Plan:
      Tested with a multi-threaded benchmark, with each thread trying to
      close the DB at the end.
      
      Reviewed By: pdillinger
      
      Differential Revision: D31242042
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: a61276b1b61e07732e375554106946aea86a23eb
    • keyMayExist() supports ByteBuffer (#9013) · 86cf7266
      Committed by Alan Paxton
      Summary:
      closes https://github.com/facebook/rocksdb/issues/7917
      
      Implemented ByteBuffer API variants of Java keyMayExist() uniformly with and without column families, read options and return data values. Implemented 2 supporting C++ JNI methods.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9013
      
      Reviewed By: mrambacher
      
      Differential Revision: D31665989
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 8adc1730217dba38d6fa7b31d788650a33e28af1
    • Deflake ObsoleteFilesTest (#9049) · 53a0ab2b
      Committed by Jay Zhuang
      Summary:
      WaitForFlushMemTable() may only wait for the memtable flush, not for the background
      flush job to finish. Thus the obsolete file may not be purged yet.
      https://github.com/facebook/rocksdb/blob/fcaa7ff6381fe6052b37a1d013b14960ea23ac17/db/db_impl/db_impl_compaction_flush.cc#L2200-L2203
      
      Use WaitForCompact() instead to wait for the background flush job.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9049
      
      Test Plan: `gtest-parallel ./obsolete_files_test --gtest_filter=ObsoleteFilesTest.DeleteObsoleteOptionsFile -r 1000`
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31737343
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 82276ebeae7c7c75a733d3e1fd1c130d45e4761f
    • Fix gcc-11 compile error (#9043) · b4326b52
      Committed by Jay Zhuang
      Summary:
      gcc 11 added a new static check.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9043
      
      Test Plan: Added CI for gcc11 build
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31716005
      
      Pulled By: jay-zhuang
      
      fbshipit-source-id: 9f53be6f2f9e58e39b83359f6bbf66f945d57429
    • Fix COMMIT_ID in regression_test.sh (#9047) · 908a999a
      Committed by Peter Dillinger
      Summary:
      https://github.com/facebook/rocksdb/issues/8031 broke internal tests. This should fix them while preserving
      the intended capability of getting the git commit id when hg is not used.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9047
      
      Test Plan: already broken ¯\_(ツ)_/¯
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31732198
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 7dba8531ddca55a6de5e04978a1a1601aae4cee9
  5. Oct 18, 2021 (1 commit)
    • Two performance improvements in BlockBuilder (#9039) · 9d66d6d1
      Committed by Peter Dillinger
      Summary:
      Primarily, this change reserves space in the std::string for building
      the next block once a block is finished, using `block_size` as
      reservation size. Note: I also tried reusing the same std::string in the
      common "unbuffered" path, but that showed no benefit or regression.
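
      A standalone illustration of the primary change (not the BlockBuilder code itself; `block_size` is a stand-in for the configured block size):

      <pre>
      #include <cstddef>
      #include <string>

      int main() {
        const std::size_t block_size = 4096;

        std::string buffer;
        // Reserve the expected block size up front, analogous to what this change
        // does for the next block once a block is finished.
        buffer.reserve(block_size);

        // Appends now stay within the reserved capacity, avoiding reallocations
        // while the block is being built.
        while (buffer.size() + 16 <= block_size) {
          buffer.append("key|value|entry\n");
        }
        return 0;
      }
      </pre>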
      
      Secondarily, this slightly reduces the work in resetting `restarts_`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9039
      
      Test Plan:
      TEST_TMPDIR=/dev/shm/rocksdb1 ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=50000000
      Compiled with DEBUG_LEVEL=0
      Test and control runs were run simultaneously for better accuracy; units = ops/sec
      
      Run 1, Primary change only: 292697 vs. 280267 (+4.4%)
      Run 2, Primary change only: 288763 vs. 279621 (+3.3%)
      Run 1, Secondary change only: 260065 vs. 254232 (+2.3%)
      Run 2, Secondary change only: 275925 vs. 272248 (+1.4%)
      Run 1, Both changes: 284890 vs. 270372 (+5.3%)
      Run 2, Both changes: 263511 vs. 258188 (+2.0%)
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D31701253
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 7e40810afbb98e6b6446955e77bda59e69b19ffd
  6. Oct 17, 2021 (1 commit)
    • Add (Live)FileStorageInfo API (#8968) · 3ffb3baa
      Committed by Peter Dillinger
      Summary:
      New classes FileStorageInfo and LiveFileStorageInfo and
      'experimental' function DB::GetLiveFilesStorageInfo, which is intended
      to largely replace several fragmented DB functions needed to create
      checkpoints and backups.
      
      This function is now used to create checkpoints and backups, because
      it fixes many (probably not all) of the prior complexities of checkpoint
      not having atomic access to DB metadata. This also ensures strong
      functional test coverage of the new API. Specifically, much of the old
      CheckpointImpl::CreateCustomCheckpoint has been migrated to and
      updated in DBImpl::GetLiveFilesStorageInfo, with the former now
      calling the latter.
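
      A hedged usage sketch of the new 'experimental' API; the option and field names below follow this commit's description and may differ slightly across releases:

      <pre>
      #include <cstdio>
      #include <vector>
      #include "rocksdb/db.h"
      #include "rocksdb/metadata.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        rocksdb::DB* db = nullptr;
        if (!rocksdb::DB::Open(options, "/tmp/live_files_example", &db).ok()) return 1;

        db->Put(rocksdb::WriteOptions(), "key", "value");
        db->Flush(rocksdb::FlushOptions());

        // A consistent list of the files needed to back up or checkpoint the DB.
        rocksdb::LiveFilesStorageInfoOptions opts;
        std::vector<rocksdb::LiveFileStorageInfo> files;
        rocksdb::Status s = db->GetLiveFilesStorageInfo(opts, &files);
        if (s.ok()) {
          for (const auto& f : files) {
            std::printf("%s/%s (%llu bytes)\n", f.directory.c_str(),
                        f.relative_filename.c_str(),
                        static_cast<unsigned long long>(f.size));
          }
        }

        db->Close();
        delete db;
        return 0;
      }
      </pre>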
      
      Also, the class FileStorageInfo in metadata.h compatibly replaces
      BackupFileInfo and serves as a new base class for SstFileMetaData.
      Some old fields of SstFileMetaData are still provided (for now) but
      deprecated.
      
      Although FileStorageInfo::directory is accurate when using db_paths
      and/or cf_paths, these have never been supported by Checkpoint
      nor BackupEngine and still are not. This change does now detect
      these cases and return NotSupported when appropriate. (More work
      needed for support.)
      
      Somehow this change broke ProgressCallbackDuringBackup, but
      the progress_callback logic was dubious to begin with because it
      would call the callback based on copy buffer size, not size actually
      copied. Logic and test updated to track size actually copied
      per-thread.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8968
      
      Test Plan:
      tests updated.
      DB::GetLiveFilesStorageInfo mostly tested by use in CheckpointImpl.
      DBTest.SnapshotFiles updated to also test GetLiveFilesStorageInfo,
      including reading the data after DB close.
      Added CheckpointTest.CheckpointWithDbPath (NotSupported).
      
      Reviewed By: siying
      
      Differential Revision: D31242045
      
      Pulled By: pdillinger
      
      fbshipit-source-id: b183d1ce9799e220daaefd6b3b5365d98de676c0
  7. Oct 16, 2021 (1 commit)
    • SyncPoint::Process thrashes heap ... fix it (#9023) · 678ba5e4
      Committed by matthewvon
      Summary:
      The first parameter of SyncPoint::Process is "const std::string&".  The majority, maybe all, of the actual calls to this function use a "const char *".  The conversion before entering the function requires constructing a std::string object on the heap.  This std::string object is then typically not needed, because the first use of the string is as a rocksdb::Slice, which has a less costly conversion from char * to Slice.
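
      A standalone illustration of the cost difference (the sync point name and the `SliceLike` type below are purely illustrative, not RocksDB code):

      <pre>
      #include <cstddef>
      #include <cstring>
      #include <string>

      // A Slice-style parameter wraps an existing char* without allocating.
      struct SliceLike {
        const char* data;
        std::size_t size;
        SliceLike(const char* s) : data(s), size(std::strlen(s)) {}
      };

      // Passing a string literal here materializes a temporary std::string per call
      // (a heap allocation for names longer than the small-string buffer).
      void ProcessByString(const std::string& point) { (void)point; }

      // Passing the same literal here only copies a pointer and a length.
      void ProcessBySlice(SliceLike point) { (void)point.data; }

      int main() {
        for (int i = 0; i < 1000; ++i) {
          ProcessByString("SomeClass::SomeMethod:SomeLongSyncPointName");  // allocates
          ProcessBySlice("SomeClass::SomeMethod:SomeLongSyncPointName");   // does not
        }
        return 0;
      }
      </pre>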
      
      Example:
      
      We have a load and iterate test.  The test loads 10m keys and iterates most of them via 10 rocksdb::Iterator objects.  We used TCMALLOC to gather information about allocation and space usage during iteration.
      
      - Before this PR:  test took 32 min 17 sec
      - After this PR:  test took 1 min 14 sec
      
      The TCMALLOC top object list before this PR:
      
      <pre>
      Total: 5105999 objects
       5003717  98.0%  98.0%  5009471  98.1% rocksdb::DBIter::MergeValuesNewToOld (inline)
         20260   0.4%  98.4%    20260   0.4% std::__cxx11::basic_string::_M_mutate
         15214   0.3%  98.7%    15214   0.3% rocksdb::UncompressBlockContentsForCompressionType (inline)
         13408   0.3%  99.0%    13408   0.3% std::_Rb_tree::_M_emplace_hint_unique [clone .constprop.416] (inline)
         12957   0.3%  99.2%    12957   0.3% std::_Rb_tree::_M_emplace_hint_unique [clone .constprop.405] (inline)
          9327   0.2%  99.4%     9327   0.2% std::_Rb_tree::_M_copy (inline)
          7691   0.2%  99.5%     9919   0.2% JVM_FindSignal
          2859   0.1%  99.6%     2859   0.1% rocksdb::Cleanable::RegisterCleanup
          2844   0.1%  99.7%     2844   0.1% std::map::operator[] (inline)
      </pre>
      
      The "MergeValuesNewToOld (inline)" objects are the #define wrappers to SyncPoint::Process.  We discovered this in a 5.18 rocksdb release.  There, TCMALLOC was more specific in showing that std::basic_string objects were being constructed.  I believe that was before SyncPoint::Process was declared inline in subsequent releases.
      
      The TCMALLOC top object list after this PR:
      
      <pre>
      Total: 104911 objects
         45090  43.0%  43.0%    45090  43.0% rocksdb::Cleanable::RegisterCleanup
         29995  28.6%  71.6%    29995  28.6% rocksdb::LRUCacheShard::Insert
         15229  14.5%  86.1%    15229  14.5% rocksdb::UncompressBlockContentsForCompressionType (inline)
          4373   4.2%  90.3%     4551   4.3% JVM_FindSignal
          2881   2.7%  93.0%     2881   2.7% rocksdb::::ReadBlockFromFile (inline)
          1162   1.1%  94.1%     1176   1.1% rocksdb::BlockFetcher::ReadBlockContents (inline)
          1036   1.0%  95.1%     1036   1.0% std::__cxx11::basic_string::_M_mutate
           869   0.8%  95.9%      869   0.8% std::vector::_M_realloc_insert (inline)
           806   0.8%  96.7%      806   0.8% SnmpAgent::GetVariables (inline)
      </pre>
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9023
      
      Reviewed By: pdillinger
      
      Differential Revision: D31610907
      
      Pulled By: mrambacher
      
      fbshipit-source-id: 574ff51b639dd46ad253a8e664a575f06b7cc85d
  8. Oct 15, 2021 (9 commits)
  9. Oct 14, 2021 (1 commit)
  10. Oct 13, 2021 (3 commits)
    • Fix sequence number bump logic in multi-CF SST ingestion (#9005) · 4bfd415e
      Committed by Giuseppe Ottaviano
      Summary:
      The code in `IngestExternalFiles()` that bumps the DB's sequence number
      depending on what seqnos were assigned to the files has 3 bugs:
      
      1) There is an assertion that the sequence number is increased in all the
      affected column families, but this is unnecessary; it is fine if some files
      stick to a lower sequence number. It is very easy to hit the assertion: it is
      sufficient to insert 2 files in 2 CFs, one which overlaps the CF and one that
      doesn't (for example the CF is empty). The line added in the
      `IngestFilesIntoMultipleColumnFamilies_Success` test makes the assertion fail.
      
      2) SetLastSequence() is called with the sum of all the bumps across CFs, but we
      should take the maximum instead, as all CFs start from the current seqno and bump
      it independently (see the sketch after this list).
      
      3) The code above is accidentally under an `#ifndef NDEBUG`, so it doesn't run in
      optimized builds, and some files may be assigned seqnos from the future.
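
      A standalone sketch of the corrected bookkeeping for bug 2 (the numbers are made up): every CF starts from the same last sequence number and bumps it independently, so the new last sequence is the maximum across CFs, not the old value plus the sum of the bumps:

      <pre>
      #include <algorithm>
      #include <cstdint>
      #include <vector>

      int main() {
        const uint64_t last_seqno = 100;
        const std::vector<uint64_t> per_cf_bump = {0, 3, 1};  // hypothetical per-CF bumps

        uint64_t new_last_seqno = last_seqno;
        for (uint64_t bump : per_cf_bump) {
          new_last_seqno = std::max(new_last_seqno, last_seqno + bump);
        }

        // new_last_seqno == 103; summing the bumps would wrongly give 104.
        return new_last_seqno == 103 ? 0 : 1;
      }
      </pre>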
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9005
      
      Test Plan:
      Added line in `IngestFilesIntoMultipleColumnFamilies_Success` that
      triggers the assertion, verified that the test (and all the others) pass after
      the fix.
      
      Reviewed By: ajkr
      
      Differential Revision: D31597892
      
      Pulled By: ot
      
      fbshipit-source-id: c2d3237f90290df1178736ace8653a9623f5a770
    • Add a benchmarking wrapper script for BlobDB (#9015) · b4e59a48
      Committed by Levi Tamasi
      Summary:
      The patch adds a new BlobDB benchmarking script called `run_blob_bench.sh`.
      It is a thin wrapper around `benchmark.sh` (similarly to `run_flash_bench.sh`):
      it actually calls `benchmark.sh` a number of times, cycling through six workloads,
      two write-only ones (bulk load and overwrite), two read/write ones (point lookups
      while writing, range scans while writing), and two read-only ones (point lookups
      and range scans).
      
      Note: this is a simpler/cleaned up/reworked version of the script used to produce the
      benchmark results in http://rocksdb.org/blog/2021/05/26/integrated-blob-db.html .
      The new version takes advantage of several recent `benchmark.sh` improvements
      like the ability to pass in arbitrary `db_bench` options or the possibility of using a
      job ID.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9015
      
      Test Plan: Ran the script manually with different parameter combinations.
      
      Reviewed By: riversand963
      
      Differential Revision: D31555277
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 0e151b2f7b2cf6f66ed7f95455571492ad7ea87f
    • Update HISTORY for PR 8994 (#9017) · 7cc52cd8
      Committed by Levi Tamasi
      Summary:
      Also, expand on/clarify a comment in `VersionStorageInfoTest`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9017
      
      Reviewed By: riversand963
      
      Differential Revision: D31566130
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 1d30c7af084c4de7b2030bc6c768838d65746010
  11. Oct 12, 2021 (2 commits)
    • Fix race in WriteBufferManager (#9009) · 22d4dc50
      Committed by Giuseppe Ottaviano
      Summary:
      EndWriteStall has a data race: `queue_.empty()` is checked outside of the
      mutex, so once we enter the critical section another thread may already have
      cleared the list, and accessing the `front()` is undefined behavior (and causes
      interesting crashes under high concurrency).
      
      This PR fixes the bug, and also rewrites the logic to make it easier to reason
      about it. It also fixes another subtle bug: if some writers are stalled and
      `SetBufferSize(0)` is called, which disables the WBM, the writers are not
      unblocked because of an early `enabled()` check in `EndWriteStall()`.
      
      It doesn't significantly change the locking behavior: as before, writers won't
      lock unless entering a stall condition, and `FreeMem` almost always locks if
      stalling is allowed, but that is inevitable with the current design. Liveness is
      guaranteed by the fact that if some writes are blocked, eventually all writes
      will be blocked due to `stall_active_`, and eventually all memory is freed.
      
      While at it, do a couple of optimizations:
      
      - In `WBMStallInterface::Signal()`, signal the CV only after releasing the
        lock. Signaling under the lock is a common pitfall, as it causes the woken-up
        thread to immediately block on the mutex, which is still held by the awaker
        (see the sketch after this list).
      
      - Move all allocations and deallocations outside of the lock.
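
      A standalone illustration of the first optimization above (generic C++, not the WriteBufferManager code): set the state under the mutex, release it, then notify, so the woken thread can acquire the mutex without blocking on it again:

      <pre>
      #include <condition_variable>
      #include <mutex>
      #include <thread>

      int main() {
        std::mutex mu;
        std::condition_variable cv;
        bool ready = false;

        std::thread waiter([&] {
          std::unique_lock<std::mutex> lk(mu);
          cv.wait(lk, [&] { return ready; });
        });

        {
          std::lock_guard<std::mutex> lk(mu);
          ready = true;
        }                 // release the mutex first...
        cv.notify_one();  // ...then signal, so the waiter does not wake into a held lock

        waiter.join();
        return 0;
      }
      </pre>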
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9009
      
      Test Plan:
      ```
      USE_CLANG=1 make -j64 all check
      ```
      
      Reviewed By: akankshamahajan15
      
      Differential Revision: D31550668
      
      Pulled By: ot
      
      fbshipit-source-id: 5125387c3dc7ecaaa2b8bbc736e58c4156698580
    • Inline an empty destructor (#9004) · e1139167
      Committed by Yanqin Jin
      Summary:
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/9004
      
      Inline an empty destructor
      
      Reviewed By: ltamasi
      
      Differential Revision: D31525561
      
      fbshipit-source-id: 3b9e37f06b0c70529a5d2d660de21ea335c73611