提交 · a4919d6b62fd7156564d11ea56440173d6c1bec4 · kvdb / rocksdb

08 5月, 2021 1 次提交

Cap automatic arena block size to 1 MB (#7907) · a4919d6b

由 sdong 提交于 5月 07, 2021

Summary:
Larger arena block size does provide the benefit of reducing allocation overhead, however it may cause other troubles. For example, allocator is more likely not to allocate them to physical memory and trigger page fault. Weighing the risk, we cap the arena block size to 1MB. Users can always use a larger value if they want.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7907

Test Plan: Run all existing tests

Reviewed By: pdillinger

Differential Revision: D26135269

fbshipit-source-id: b7f55afd03e6ee1d8715f90fa11b6c33944e9ea8

a4919d6b

07 5月, 2021 1 次提交

Revert accidental enabling broken ClockCache in stress test (#8277) · ecd63b92

由 Peter Dillinger 提交于 5月 06, 2021

Summary:
From https://github.com/facebook/rocksdb/issues/8261

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8277

Test Plan: briefly make blackbox_crash_test

Reviewed By: zhichao-cao

Differential Revision: D28270648

Pulled By: pdillinger

fbshipit-source-id: 9bfd46c5a1a449165f6597bddb17af910331773f

ecd63b92

06 5月, 2021 5 次提交

Permit stdout "fail"/"error" in whitebox crash test (#8272) · b71b4597

由 Andrew Kryczka 提交于 5月 05, 2021

Summary:
In https://github.com/facebook/rocksdb/issues/8268, the `db_stress` stdout began containing both the strings
"fail" and "error" (case-insensitive). The whitebox crash test
failed upon seeing either of those strings.

I checked that all other occurrences of "fail" and "error"
(case-insensitive) that `db_stress` produces are printed to `stderr`. So
this PR separates the handling of `db_stress`'s stdout and stderr, and
only fails when one those bad strings are found in stderr.

The downside of this PR is `db_stress`'s original interleaving of stdout/stderr is not preserved in `db_crashtest.py`'s output.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8272

Test Plan:
run it; see it succeeds for several runs until encountering a real error

```
$ python3 tools/db_crashtest.py whitebox --simple --random_kill_odd=8887 --max_key=1000000 --value_size_mult=33
...
db_stress: cache/clock_cache.cc:483: bool rocksdb::{anonymous}::ClockCacheShard::Unref(rocksdb::{anonymous}::CacheHandle*, bool, rocksdb::{anonymous}::CleanupContext*): Assertion `CountRefs(flags) > 0' failed.

TEST FAILED. Output has 'fail'!!!
```

Reviewed By: zhichao-cao

Differential Revision: D28239233

Pulled By: ajkr

fbshipit-source-id: 3b8602a0d570466a7e2c81bb9c49468f7716091e

b71b4597

db_stress: wait for compaction to finish after open with failure injection (#8270) · 7f3a0f5b

由 sdong 提交于 5月 05, 2021

Summary:
When injecting in DB open, error can happen in background threads, causing DB open succeed, but DB is soon made read-only and subsequence writes will fail, which is not expected. To prevent it from happening, wait for compaction to finish before serving the traffic. If there is a failure, reopen.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8270

Test Plan: Run the test.

Reviewed By: ajkr

Differential Revision: D28230537

fbshipit-source-id: e2e97888904f9b9bb50c35ccf95b88c2319ef5c3

7f3a0f5b

Refactor kill point (#8241) · e19908cb

由 sdong 提交于 5月 05, 2021

Summary:
Refactor kill point to one single class, rather than several extern variables. The intention was to drop unflushed data before killing to simulate some job, and I tried to a pointer to fault ingestion fs to the killing class, but it ended up with harder than I thought. Perhaps we'll need to do this in another way. But I thought the refactoring itself is good so I send it out.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8241

Test Plan: make release and run crash test for a while.

Reviewed By: anand1976

Differential Revision: D28078486

fbshipit-source-id: f9182c1455f52e6851c13f88a21bade63bcec45f

e19908cb

Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) · 8948dc85

由 mrambacher 提交于 5月 05, 2021

Summary:
The ImmutableCFOptions contained a bunch of fields that belonged to the ImmutableDBOptions. This change cleans that up by introducing an ImmutableOptions struct. Following the pattern of Options struct, this class inherits from the DB and CFOption structs (of the Immutable form).

Only one structural change (the ImmutableCFOptions::fs was changed to a shared_ptr from a raw one) is in this PR. All of the other changes involve moving the member variables from the ImmutableCFOptions into the ImmutableOptions and changing member variables or function parameters as required for compilation purposes.

Follow-on PRs may do a further clean-up of the code, such as renaming variables (such as "ImmutableOptions cf_options") and potentially eliminating un-needed function parameters (there is no longer a need to pass both an ImmutableDBOptions and an ImmutableOptions to a function).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8262

Reviewed By: pdillinger

Differential Revision: D28226540

Pulled By: mrambacher

fbshipit-source-id: 18ae71eadc879dedbe38b1eb8e6f9ff5c7147dbf

8948dc85

Fix `GetLiveFiles()` returning OPTIONS-000000 (#8268) · 0f42e50f

由 Andrew Kryczka 提交于 5月 05, 2021

Summary:
See release note in HISTORY.md.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8268

Test Plan: unit test repro

Reviewed By: siying

Differential Revision: D28227901

Pulled By: ajkr

fbshipit-source-id: faf61d13b9e43a761e3d5dcf8203923126b51339

0f42e50f

05 5月, 2021 4 次提交

Fix use-after-free threading bug in ClockCache (#8261) · 3b981eaa

由 Peter Dillinger 提交于 5月 04, 2021

Summary:
In testing for https://github.com/facebook/rocksdb/issues/8225 I found cache_bench would crash with
-use_clock_cache, as well as db_bench -use_clock_cache, but not
single-threaded. Smaller cache size hits failure much faster. ASAN
reported the failuer as calling malloc_usable_size on the `key` pointer
of a ClockCache handle after it was reportedly freed. On detailed
inspection I found this bad sequence of operations for a cache entry:

state=InCache=1,refs=1
[thread 1] Start ClockCacheShard::Unref (from Release, no mutex)
[thread 1] Decrement ref count
state=InCache=1,refs=0
[thread 1] Suspend before CalcTotalCharge (no mutex)

[thread 2] Start UnsetInCache (from Insert, mutex held)
[thread 2] clear InCache bit
state=InCache=0,refs=0
[thread 2] Calls RecycleHandle (based on pre-updated state)
[thread 2] Returns to Insert which calls Cleanup which deletes `key`

[thread 1] Resume ClockCacheShard::Unref
[thread 1] Read `key` in CalcTotalCharge

To fix this, I've added a field to the handle to store the metadata
charge so that we can efficiently remember everything we need from
the handle in Unref. We must not read from the handle again if we
decrement the count to zero with InCache=1, which means we don't own
the entry and someone else could eject/overwrite it immediately.

Note before this change, on amd64 sizeof(Handle) == 56 even though there
are only 48 bytes of data. Grouping together the uint32_t fields would
cut it down to 48, but I've added another uint32_t, which takes it
back up to 56. Not a big deal.

Also fixed DisownData to cooperate with ASAN as in LRUCache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8261

Test Plan:
Manual + adding use_clock_cache to db_crashtest.py

Base performance
./cache_bench -use_clock_cache
Complete in 17.060 s; QPS = 2458513
New performance
./cache_bench -use_clock_cache
Complete in 17.052 s; QPS = 2459695

Any difference is easily buried in small noise.

Crash test shows still more bug(s) in ClockCache, so I'm expecting to
disable ClockCache from production code in a follow-up PR (if we
can't find and fix the bug(s))

Reviewed By: mrambacher

Differential Revision: D28207358

Pulled By: pdillinger

fbshipit-source-id: aa7a9322afc6f18f30e462c75dbbe4a1206eb294

3b981eaa

Fix ConcurrentTaskLimiter token release for shutdown (#8253) · c70bae1b

由 Andrew Kryczka 提交于 5月 04, 2021

Summary:
Previously the shutdown process did not properly wait for all
`compaction_thread_limiter` tokens to be released before proceeding to
delete the DB's C++ objects. When this happened, we saw tests like
"DBCompactionTest.CompactionLimiter" flake with the following error:

```
virtual
rocksdb::ConcurrentTaskLimiterImpl::~ConcurrentTaskLimiterImpl():
Assertion `outstanding_tasks_ == 0' failed.
```

There is a case where a token can still be alive even after the shutdown
process has waited for BG work to complete. In particular, this happens
because the shutdown process only waits for flush/compaction scheduled/unscheduled counters to all
reach zero. These counters are decremented in `BackgroundCallCompaction()`
functions. However, tokens are released in `BGWork*Compaction()` functions, which
actually wrap the `BackgroundCallCompaction()` function.

A simple sleep could repro the race condition:

```
$ diff --git a/db/db_impl/db_impl_compaction_flush.cc
b/db/db_impl/db_impl_compaction_flush.cc
index 806bc548a..ba59efa89 100644
 --- a/db/db_impl/db_impl_compaction_flush.cc
+++ b/db/db_impl/db_impl_compaction_flush.cc
@@ -2442,6 +2442,7 @@ void DBImpl::BGWorkCompaction(void* arg) {
       static_cast<PrepickedCompaction*>(ca.prepicked_compaction);
   static_cast_with_check<DBImpl>(ca.db)->BackgroundCallCompaction(
       prepicked_compaction, Env::Priority::LOW);
+  sleep(1);
   delete prepicked_compaction;
 }

$ ./db_compaction_test --gtest_filter=DBCompactionTest.CompactionLimiter
db_compaction_test: util/concurrent_task_limiter_impl.cc:24: virtual rocksdb::ConcurrentTaskLimiterImpl::~ConcurrentTaskLimiterImpl(): Assertion `outstanding_tasks_ == 0' failed.
Received signal 6 (Aborted)
#0   /usr/local/fbcode/platform007/lib/libc.so.6(gsignal+0xcf) [0x7f02673c30ff] ??      ??:0
https://github.com/facebook/rocksdb/issues/1   /usr/local/fbcode/platform007/lib/libc.so.6(abort+0x134) [0x7f02673ac934] ??       ??:0
...
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8253

Test Plan: sleeps to expose race conditions

Reviewed By: akankshamahajan15

Differential Revision: D28168064

Pulled By: ajkr

fbshipit-source-id: 9e5167c74398d323e7975980c5cc00f450631160

c70bae1b

Deflake DBTest.L0L1L2AndUpHitCounter (#8259) · c2a3424d

由 Andrew Kryczka 提交于 5月 04, 2021

Summary:
Previously we saw flakes on platforms like arm on CircleCI, such as the following:

```
Note: Google Test filter = DBTest.L0L1L2AndUpHitCounter
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from DBTest
[ RUN      ] DBTest.L0L1L2AndUpHitCounter
db/db_test.cc:5345: Failure
Expected: (TestGetTickerCount(options, GET_HIT_L0)) > (100), actual: 30 vs 100
[  FAILED  ] DBTest.L0L1L2AndUpHitCounter (150 ms)
[----------] 1 test from DBTest (150 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (150 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] DBTest.L0L1L2AndUpHitCounter
```

The test was totally non-deterministic, e.g., flush/compaction timing would affect how many files on each level. Furthermore, it depended heavily on platform-specific details, e.g., by having a 32KB memtable, it could become full with a very different number of entries depending on the platform.

This PR rewrites the test to build a deterministic LSM with one file per level.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8259

Reviewed By: mrambacher

Differential Revision: D28178100

Pulled By: ajkr

fbshipit-source-id: 0a03b26e8d23c29d8297c1bccb1b115dce33bdcd

c2a3424d

Update CircleCI MacOS Xcode version to 11.3.0 (#8256) · 8a92564a

由 Jay Zhuang 提交于 5月 04, 2021

Summary:
To fix CircleCI pyenv installation failure.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8256

Reviewed By: ajkr

Differential Revision: D28191772

Pulled By: jay-zhuang

fbshipit-source-id: 2bbb1d5ded473e510c11c8ed27884c4ad073973f

8a92564a

04 5月, 2021 1 次提交

Hint temperature of bottommost level files to FileSystem (#8222) · c3ff14e2

由 sdong 提交于 5月 03, 2021

Summary:
As the first part of the effort of having placing different files on different storage types, this change introduces several things:
(1) An experimental interface in FileSystem that specify temperature to a new file created.
(2) A test FileSystemWrapper,  SimulatedHybridFileSystem, that simulates HDD for a file of "warm" temperature.
(3) A simple experimental feature ColumnFamilyOptions.bottommost_temperature. RocksDB would pass this value to FileSystem when creating any bottommost file.
(4) A db_bench parameter that applies the (2) and (3) to db_bench.

The motivation of the change is to introduce minimal changes that allow us to evolve tiered storage development.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8222

Test Plan:
./db_bench --benchmarks=fillrandom --write_buffer_size=2000000 -max_bytes_for_level_base=20000000  -level_compaction_dynamic_level_bytes --reads=100 -compaction_readahead_size=20000000 --reads=100000 -num=10000000

followed by

./db_bench --benchmarks=readrandom,stats --write_buffer_size=2000000 -max_bytes_for_level_base=20000000 -simulate_hybrid_fs_file=/tmp/warm_file_list -level_compaction_dynamic_level_bytes -compaction_readahead_size=20000000 --reads=500 --threads=16 -use_existing_db --num=10000000

and see results as expected.

Reviewed By: ajkr

Differential Revision: D28003028

fbshipit-source-id: 4724896d5205730227ba2f17c3fecb11261744ce

c3ff14e2

01 5月, 2021 1 次提交

Add more LSM info to FilterBuildingContext (#8246) · d2ca04e3

由 Peter Dillinger 提交于 4月 30, 2021

Summary:
Add `num_levels`, `is_bottommost`, and table file creation
`reason` to `FilterBuildingContext`, in anticipation of more powerful
Bloom-like filter support.

To support this, added `is_bottommost` and `reason` to
`TableBuilderOptions`, which allowed removing `reason` parameter from
`rocksdb::BuildTable`.

I attempted to remove `skip_filters` from `TableBuilderOptions`, because
filter construction decisions should arise from options, not one-off
parameters. I could not completely remove it because the public API for
SstFileWriter takes a `skip_filters` parameter, and translating this
into an option change would mean awkwardly replacing the table_factory
if it is BlockBasedTableFactory with new filter_policy=nullptr option.
I marked this public skip_filters option as deprecated because of this
oddity. (skip_filters on the read side probably makes sense.)

At least `skip_filters` is now largely hidden for users of
`TableBuilderOptions` and is no longer used for implementing the
optimize_filters_for_hits option. Bringing the logic for that option
closer to handling of FilterBuildingContext makes it more obvious that
hese two are using the same notion of "bottommost." (Planned:
configuration options for Bloom-like filters that generalize
`optimize_filters_for_hits`)

Recommended follow-up: Try to get away from "bottommost level" naming of
things, which is inaccurate (see
VersionStorageInfo::RangeMightExistAfterSortedRun), and move to
"bottommost run" or just "bottommost."

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8246

Test Plan:
extended an existing unit test to exercise and check various
filter building contexts. Also, existing tests for
optimize_filters_for_hits validate some of the "bottommost" handling,
which is now closely connected to FilterBuildingContext::is_bottommost
through TableBuilderOptions::is_bottommost

Reviewed By: mrambacher

Differential Revision: D28099346

Pulled By: pdillinger

fbshipit-source-id: 2c1072e29c24d4ac404c761a7b7663292372600a

d2ca04e3

29 4月, 2021 5 次提交

Refactor: use TableBuilderOptions to reduce parameter lists (#8240) · 85becd94

由 Peter Dillinger 提交于 4月 29, 2021

Summary:
Greatly reduced the not-quite-copy-paste giant parameter lists
of rocksdb::NewTableBuilder, rocksdb::BuildTable,
BlockBasedTableBuilder::Rep ctor, and BlockBasedTableBuilder ctor.

Moved weird separate parameter `uint32_t column_family_id` of
TableFactory::NewTableBuilder into TableBuilderOptions.

Re-ordered parameters to TableBuilderOptions ctor, so that `uint64_t
target_file_size` is not randomly placed between uint64_t timestamps
(was easy to mix up).

Replaced a couple of fields of BlockBasedTableBuilder::Rep with a
FilterBuildingContext. The motivation for this change is making it
easier to pass along more data into new fields in FilterBuildingContext
(follow-up PR).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8240

Test Plan: ASAN make check

Reviewed By: mrambacher

Differential Revision: D28075891

Pulled By: pdillinger

fbshipit-source-id: fddb3dbb8260a0e8bdcbb51b877ebabf9a690d4f

85becd94

Improve BlockPrefetcher to prefetch only for sequential scans (#7394) · a0e0feca

由 Akanksha Mahajan 提交于 4月 28, 2021

Summary:
BlockPrefetcher is used by iterators to prefetch data if they
anticipate more data to be used in future and this is valid for forward sequential
scans. But BlockPrefetcher tracks only num_file_reads_ and not if reads
are sequential. This presents problem for MultiGet with large number of
keys when it reseeks index iterator and data block. FilePrefetchBuffer
can end up doing large readahead for reseeks as readahead size
increases exponentially once readahead is enabled. Same issue is with
BlockBasedTableIterator.

Add previous length and offset read as well in BlockPrefetcher (creates
FilePrefetchBuffer) and FilePrefetchBuffer (does prefetching of data) to
determine if reads are sequential and then  prefetch.

Update the last block read after cache hit to take reads from cache also
in account.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7394

Test Plan: Add new unit test case

Reviewed By: anand1976

Differential Revision: D23737617

Pulled By: akankshamahajan15

fbshipit-source-id: 8e6917c25ed87b285ee495d1b68dc623d71205a3

a0e0feca

Fix a memory leak in c_test (#8237) · 0db4cde6

由 anand76 提交于 4月 28, 2021

Summary:
Don't call ```rocksdb_cache_disown_data()``` as it causes the memory allocated for ```shards_``` to be leaked.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8237

Reviewed By: jay-zhuang

Differential Revision: D28039061

Pulled By: anand1976

fbshipit-source-id: c3464efe2c006b93b4be87030116a12a124598c4

0db4cde6

Change CircleCI Windows to previous known good image (#8220) · 8fe33a0a

由 anand76 提交于 4月 28, 2021

Summary:
This is to try to resolve the VS2015 install failure in CircleCI Windows builds.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8220

Reviewed By: jay-zhuang

Differential Revision: D28061834

Pulled By: anand1976

fbshipit-source-id: b2663eb60babee603669a2c2cb55f182df1cc7b1

8fe33a0a

db_stress to add --open_metadata_write_fault_one_in (#8235) · cde69a7c

由 sdong 提交于 4月 28, 2021

Summary:
DB Stress to add --open_metadata_write_fault_one_in which would randomly fail in some file metadata modification operations during DB Open, including file creation, close, renaming and directory sync. Some operations can fail before and after the operations take place.
If DB open fails, db_stress would retry without the failure ingestion, and DB is expected to open successfully.
This option is enabled in crash test in half of the time.
Some follow up changes would allow write failures in open time, and ingesting those failures in non-DB open cases.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8235

Test Plan: Run stress tests for a while and see failures got triggered. This can reproduce the bug fixed by https://github.com/facebook/rocksdb/pull/8192 and a similar one that fails when fsyncing parent directory.

Reviewed By: anand1976

Differential Revision: D28010944

fbshipit-source-id: 36a96da4dc3633e5f7680cef3ea0a900fcdb5558

cde69a7c

28 4月, 2021 3 次提交

Add WAL flush API to C client (#8226) · 3949731d

由 Duarte Nunes 提交于 4月 27, 2021

Summary:
The C client is missing the`manual_wal_flush` option and the `flush_wal` API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8226

Reviewed By: ajkr

Differential Revision: D28000869

Pulled By: jay-zhuang

fbshipit-source-id: ed44937e7e7e75bc0dfa870a14147fbeef0c38f8

3949731d

Add 6.18, 6.19 and 6.20 to check_format_compatible.sh (#8236) · 65abb0cf

由 Akanksha Mahajan 提交于 4月 27, 2021

Summary:
Add 6.18, 6.19 and 6.20 to check_format_compatible.sh

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8236

Test Plan: ./tools/check_format_compatible.sh (tested without 2.7.fb as it was failing as mentioned in the script)

Reviewed By: mrambacher

Differential Revision: D28019160

Pulled By: akankshamahajan15

fbshipit-source-id: b59a7c5c14cb4c115926e9ae7c74ea586b22c9ed

65abb0cf

New C API to expose NewCompactOnDeletionCollectorFactory (#8233) · 13c655a8

由 Sahir Hoda 提交于 4月 27, 2021

Summary:
New C API rocksdb_options_add_compact_on_deletion_collector_factory to expose NewCompactOnDeletionCollectorFactory

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8233

Reviewed By: mrambacher

Differential Revision: D28018381

Pulled By: anand1976

fbshipit-source-id: 674c9ed902c91ff0d9f09e7a60c5f37b907604c6

13c655a8

27 4月, 2021 3 次提交

Rename variables in ImmutableCFOptions to avoid conflicts with ImmutableDBOptions (#8227) · 0ca6d629

由 mrambacher 提交于 4月 26, 2021

Summary:
Renaming ImmutableCFOptions::info_log and statistics to logger and stats. This is stage 2 in creating an ImmutableOptions class. It is necessary because the names match those in ImmutableOptions and have different types.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8227

Reviewed By: jay-zhuang

Differential Revision: D28000967

Pulled By: mrambacher

fbshipit-source-id: 3bf2aa04e8f1e8724d825b7deacf41080c14420b

0ca6d629

Fix cast-function-type warning (#8230) · c2c7d5e9

由 Mr-Leshiy 提交于 4月 26, 2021

Summary:
Fixing cast-function-type which is appears during the following build:
```bash
cmake ..  -DFAIL_ON_WARNINGS=ON -DCMAKE_C_COMPILER=x86_64-w64-mingw32-gcc -DCMAKE_CXX_COMPILER=x86_64-w64-mingw32-g++ -DCMAKE_SYSTEM_NAME=Windows
make rocksdb
```
Here is the log:
```
/home/leshiy/Work/rocksdb/port/win/env_win.cc: In constructor ‘rocksdb::port::WinClock::WinClock()’:
/home/leshiy/Work/rocksdb/port/win/env_win.cc:92:9: error: cast between incompatible function types from ‘FARPROC’ {aka ‘long long int (*)()’} to ‘rocksdb::port::WinClock::FnGetSystemTimePreciseAsFileTime’ {aka ‘void (*)(_FILETIME*)’} [-Werror=cast-function-type]
   92 |         (FnGetSystemTimePreciseAsFileTime)GetProcAddress(
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   93 |             module, "GetSystemTimePreciseAsFileTime");
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make[2]: *** [CMakeFiles/rocksdb.dir/build.make:4337: CMakeFiles/rocksdb.dir/port/win/env_win.cc.obj] Error 1
make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/rocksdb.dir/all] Error 2
make: *** [Makefile:91: all] Error 2
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8230

Reviewed By: jay-zhuang

Differential Revision: D28000215

Pulled By: mrambacher

fbshipit-source-id: 874782cf48f70470e3fbd9097585bf42e810ca61

c2c7d5e9

WBWI Internal Move implementation from .h into .cpp (#8229) · 2760c2ae

由 Adam Retter 提交于 4月 26, 2021

Summary:
Moves some of the structural refactoring from https://github.com/facebook/rocksdb/pull/8135 into this PR.
This just cleans up the code by moving implementation out of the .h file and into the .cc file.

Should be considered for merge before both https://github.com/facebook/rocksdb/pull/7214 and https://github.com/facebook/rocksdb/pull/8135

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8229

Reviewed By: jay-zhuang

Differential Revision: D27999669

Pulled By: mrambacher

fbshipit-source-id: 6eccecbf1f11bb9f5a173e86d1e7bc448bc96071

2760c2ae

26 4月, 2021 2 次提交

Fix javadoc for keyMayExist (#8232) · 69c98682

由 Adam Retter 提交于 4月 26, 2021

Summary:
Closes https://github.com/facebook/rocksdb/issues/6985

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8232

Reviewed By: jay-zhuang

Differential Revision: D27999779

Pulled By: mrambacher

fbshipit-source-id: a37c88d93bde2692b8be9e46e673dda7bea701b2

69c98682

Move RegisterOptions into the Configurable API (#8223) · 6bab3a34

由 mrambacher 提交于 4月 26, 2021

Summary:
As previously coded, a Configurable extension would need access to code not in the public API.  This change moves RegisterOptions into the Configurable class and therefore available to public extensions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8223

Reviewed By: anand1976

Differential Revision: D27960188

Pulled By: mrambacher

fbshipit-source-id: ac88b19397183df633902def5b5701b9b65fbf40

6bab3a34

24 4月, 2021 1 次提交

Eliminate double-buffering of keys in block_based_table_builder (#8219) · cc1c3ee5

由 Saketh Are 提交于 4月 23, 2021

Summary:
The block_based_table_builder buffers some blocks in memory to construct a good compression dictionary. Before this commit, the keys from each block were buffered separately for convenience. However, the buffered block data implicitly contains all keys. This commit eliminates the redundant key buffers and reduces memory usage.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8219

Reviewed By: ajkr

Differential Revision: D27945851

Pulled By: saketh-are

fbshipit-source-id: caf3cac1217201e080a1e24b542bedf20973afee

cc1c3ee5

23 4月, 2021 5 次提交

Expose JemallocNodumpAllocator to C API (#8178) · d65d7d65

由 Sahir Hoda 提交于 4月 22, 2021

Summary:
Add new C APIs to create the JemallocNodumpAllocator and set it on a Cache object.

`make test` passes with and without `DISABLE_JEMALLOC=1`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8178

Reviewed By: jay-zhuang

Differential Revision: D27944631

Pulled By: ajkr

fbshipit-source-id: 2531729aa285a8985c58f22f093c4d53029c4a7b

d65d7d65

Make types of Immutable/Mutable Options fields match that of the underlying Option (#8176) · 01e460d5

由 mrambacher 提交于 4月 22, 2021

Summary:
This PR is a first step at attempting to clean up some of the Mutable/Immutable Options code. With this change, a DBOption and a ColumnFamilyOption can be reconstructed from their Mutable and Immutable equivalents, respectively.

readrandom tests do not show any performance degradation versus master (though both are slightly slower than the current 6.19 release).

There are still fields in the ImmutableCFOptions that are not CF options but DB options. Eventually, I would like to move those into an ImmutableOptions (= ImmutableDBOptions+ImmutableCFOptions). But that will be part of a future PR to minimize changes and disruptions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8176

Reviewed By: pdillinger

Differential Revision: D27954339

Pulled By: mrambacher

fbshipit-source-id: ec6b805ba9afe6e094bffdbd76246c2d99aa9fad

01e460d5

Add internal compaction API for Secondary instance (#8171) · f0fca2b1

由 Jay Zhuang 提交于 4月 22, 2021

Summary:
Add compaction API for secondary instance, which compact the files to a secondary DB path without installing to the LSM tree.
The API will be used to remote compaction.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8171

Test Plan: `make check`

Reviewed By: ajkr

Differential Revision: D27694545

Pulled By: jay-zhuang

fbshipit-source-id: 8ff3ec1bffdb2e1becee994918850c8902caf731

f0fca2b1

Add ZenFS to plugin list (#8218) · e85d8a65

由 Hans Holmberg 提交于 4月 22, 2021

Summary:
Add ZenFS, a file system for zoned block devices, to PLUGINS.md

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8218

Reviewed By: jay-zhuang

Differential Revision: D27944376

Pulled By: ajkr

fbshipit-source-id: c9ea2e9814001ccd7c56d7ef4d38e20dfeb48d1e

e85d8a65

Fix the false positive alert of CF consistency check in WAL recovery (#8207) · 09a9ec3a

由 Zhichao Cao 提交于 4月 22, 2021

Summary:
In current RocksDB, in recover the information form WAL, we do the consistency check for each column family when one WAL file is corrupted and PointInTimeRecovery is set. However, it will report a false positive alert on "SST file is ahead of WALs" when one of the CF current log number is greater than the corrupted WAL number (CF contains the data beyond the corrupted WAl) due to a new column family creation during flush. In this case, a new WAL is created (it is empty) during a flush. Also, due to some reason (e.g., storage issue or crash happens before SyncCloseLog is called), the old WAL is corrupted. The new CF has no data, therefore, it does not have the consistency issue.

Fix: when checking cfd->GetLogNumber() > corrupted_wal_number also check cfd->GetLiveSstFilesSize() > 0. So the CFs with no SST file data will skip the check here.

Note potential ignored inconsistency caused due to fix: empty CF can also be caused by write+delete. In this case, after flush, there is no SST files being generated. However, this CF still have the log in the WAL. When the WAL is corrupted, the DB might be inconsistent.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8207

Test Plan: added unit test, make crash_test

Reviewed By: riversand963

Differential Revision: D27898839

Pulled By: zhichao-cao

fbshipit-source-id: 931fc2d8b92dd00b4169bf84b94e712fd688a83e

09a9ec3a

22 4月, 2021 4 次提交

Add check to cmake to see if we need to link against -latomic (#8183) · 47b424f4

由 mrambacher 提交于 4月 22, 2021

Summary:
For some compilers/environments (e.g. Clang, riscv64), we need to link against -latomic.  Check if this is a requirement and add the library to the third-party libs if it is.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8183

Reviewed By: pdillinger

Differential Revision: D27773564

Pulled By: mrambacher

fbshipit-source-id: 68e15d823144f83fb02221c7bf5b1e43323419bf

47b424f4

Ignore comparator name mismatch in ldb manifest dump (#8216) · 31435276

由 Yanqin Jin 提交于 4月 21, 2021

Summary:
RocksDB allows user-specified custom comparators which may not be known to `ldb`,
a built-in tool for checking/mutating the database. Therefore, column family comparator
names mismatch encountered during manifest dump should not prevent the dumping from
proceeding.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8216

Test Plan:
```
make check
```

Also manually do the following
```
KEEP_DB=1 ./db_with_timestamp_basic_test
./ldb --db=<db> manifest_dump --verbose
```
The ldb should succeed and print something like:
```
...
--------------- Column family "default"  (ID 0) --------------
log number: 6
comparator: <TestComparator>, but the comparator object is not available.
...
```

Reviewed By: ltamasi

Differential Revision: D27927581

Pulled By: riversand963

fbshipit-source-id: f610b2c842187d17f575362070209ee6b74ec6d4

31435276

Add comment to DisableManualCompaction() (#8186) · 4985cea1

由 sdong 提交于 4月 21, 2021

Summary:
Add comment to DisableManualCompaction() which was missing.
Also explictly return from DBImpl::CompactRange() to avoid memtable flush when manual compaction is disabled.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8186

Test Plan: Run existing unit tests.

Reviewed By: jay-zhuang

Differential Revision: D27744517

fbshipit-source-id: 449548a48905903b888dc9612bd17480f6596a71

4985cea1

Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) · 596e9008

由 Akanksha Mahajan 提交于 4月 21, 2021

Summary:
When WriteBufferManager is shared across DBs and column families
to maintain memory usage under a limit, OOMs have been observed when flush cannot
finish but writes continuously insert to memtables.
In order to avoid OOMs, when memory usage goes beyond buffer_limit_ and DBs tries to write,
this change will stall incoming writers until flush is completed and memory_usage
drops.

Design: Stall condition: When total memory usage exceeds WriteBufferManager::buffer_size_
(memory_usage() >= buffer_size_) WriterBufferManager::ShouldStall() returns true.

DBImpl first block incoming/future writers by calling write_thread_.BeginWriteStall()
(which adds dummy stall object to the writer's queue).
Then DB is blocked on a state State::Blocked (current write doesn't go
through). WBStallInterface object maintained by every DB instance is added to the queue of
WriteBufferManager.

If multiple DBs tries to write during this stall, they will also be
blocked when check WriteBufferManager::ShouldStall() returns true.

End Stall condition: When flush is finished and memory usage goes down, stall will end only if memory
waiting to be flushed is less than buffer_size/2. This lower limit will give time for flush
to complete and avoid continous stalling if memory usage remains close to buffer_size.

WriterBufferManager::EndWriteStall() is called,
which removes all instances from its queue and signal them to continue.
Their state is changed to State::Running and they are unblocked. DBImpl
then signal all incoming writers of that DB to continue by calling
write_thread_.EndWriteStall() (which removes dummy stall object from the
queue).

DB instance creates WBMStallInterface which is an interface to block and
signal DBs during stall.
When DB needs to be blocked or signalled by WriteBufferManager,
state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7898

Test Plan: Added a new test db/db_write_buffer_manager_test.cc

Reviewed By: anand1976

Differential Revision: D26093227

Pulled By: akankshamahajan15

fbshipit-source-id: 2bbd982a3fb7033f6de6153aa92a221249861aae

596e9008

21 4月, 2021 4 次提交

Revert Ribbon starting level support from #8198 (#8212) · 95f6add7

由 Peter Dillinger 提交于 4月 20, 2021

Summary:
This partially reverts commit 10196d7e.

The problem with this change is because of important filter use cases:
FIFO compaction and SST writer. FIFO "compaction" always uses level 0 so
would only use Ribbon filters if specifically including level 0 for the
Ribbon filter policy. SST writer sets level_at_creation=-1 to indicate
unknown level, and this would be treated the same as level 0 unless
fixed.

We are keeping the part about committing to permanent schema, which is
only changes to API comments and HISTORY.md.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8212

Test Plan: CI

Reviewed By: jay-zhuang

Differential Revision: D27896468

Pulled By: pdillinger

fbshipit-source-id: 50a775f7cba5d64fb729d9b982e355864020596e

95f6add7

Cleanup include (#8208) · 2e5de5a2

由 Andrew Gallagher 提交于 4月 20, 2021

Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8208

Make include of "file_system.h" use the same include path as everywhere
else.

Reviewed By: riversand963, akankshamahajan15

Differential Revision: D27881606

fbshipit-source-id: fc1e076229fde21041a813c655ce017b5070c8b3

2e5de5a2

Fix seqno in ingested file boundary key metadata (#8209) · 905dd17b

由 Andrew Kryczka 提交于 4月 20, 2021

Summary:
Fixes https://github.com/facebook/rocksdb/issues/6245.

Adapted from https://github.com/facebook/rocksdb/issues/8201 and https://github.com/facebook/rocksdb/issues/8205.

Previously we were writing the ingested file's smallest/largest internal keys
with sequence number zero, or `kMaxSequenceNumber` in case of range
tombstone. The former (sequence number zero) is incorrect and can lead
to files being incorrectly ordered. The fix in this PR is to overwrite
boundary keys that have sequence number zero with the ingested file's assigned
sequence number.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8209

Test Plan: repro unit test

Reviewed By: riversand963

Differential Revision: D27885678

Pulled By: ajkr

fbshipit-source-id: 4a9f2c6efdfff81c3a9923e915ea88b250ee7b6a

905dd17b

Mention PR 8206 in HISTORY.md (#8210) · 1b99947e

由 Levi Tamasi 提交于 4月 20, 2021

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8210

Reviewed By: akankshamahajan15

Differential Revision: D27887612

Pulled By: ltamasi

fbshipit-source-id: 0db8d0b6047334dc47fe30a98804449043454386

1b99947e

kvdb / rocksdb 12 个月 前同步成功

kvdb / rocksdb
12 个月前同步成功