1. 25 Oct 2019 (1 commit)
    • Add test showing range tombstones can create excessively large compactions (#5956) · 25095311
      Committed by Dan Lambright
      Summary:
      For more information on the original problem see this [link](https://github.com/facebook/rocksdb/issues/3977).
      
      This change adds two new tests. They are identical except that one uses range tombstones and the other does not. Each test generates sub-files at L2 that overlap with keys at L3. The test that uses range tombstones generates a single file at L2; this single file produces a very large range overlap with L3, which in turn creates an excessively large compaction.
      
      1: T001 - T005
      2:  000 -  005
      
      In contrast, the test that uses key ranges generates 3 files at L2. As only a single file is compacted at a time, those 3 files will generate less work per compaction iteration.
      
      1:  001 - 002
      1:  003 - 004
      1:  005
      2:  000 - 005
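      The asymmetry can be sketched with a toy model (hypothetical code, not from RocksDB: `FileRange` and `OverlapCount` are made-up names) showing how a single wide file pulls every overlapping next-level file into one compaction, while narrower files each pull in only a subset:

      ```cpp
      #include <string>
      #include <vector>

      // Hypothetical model: a file covers an inclusive key range.
      struct FileRange {
        std::string smallest, largest;
      };

      // Count how many next-level files overlap a single input file; this is
      // roughly the amount of work one compaction of that file must take on.
      int OverlapCount(const FileRange& input,
                       const std::vector<FileRange>& next_level) {
        int n = 0;
        for (const auto& f : next_level) {
          // Ranges overlap unless one ends before the other begins.
          if (!(f.largest < input.smallest || input.largest < f.smallest)) {
            ++n;
          }
        }
        return n;
      }
      ```

      One wide tombstone-covering file overlaps every next-level file at once, whereas each narrow file overlaps only its neighbors.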
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5956
      
      Differential Revision: D18071631
      
      Pulled By: dlambrig
      
      fbshipit-source-id: 12abae75fb3e0b022d228c6371698aa5e53385df
  2. 24 Oct 2019 (3 commits)
    • CfConsistencyStressTest to validate key consistent across CFs in TestGet() (#5863) · 9f1e5a0b
      Committed by sdong
      Summary:
      Right now, in the CF consistency stress test's TestGet(), keys are fetched without validation. With this change, half of the time we validate that all CFs hold the same value for the same key.
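      A minimal sketch of the cross-CF validation idea (hypothetical; the real check lives in db_stress, and `KeyConsistentAcrossCFs` is a made-up helper operating on in-memory maps rather than column family handles):

      ```cpp
      #include <map>
      #include <string>
      #include <vector>

      // Given one key-value snapshot per column family, verify every CF agrees
      // on a key: either all CFs are missing it, or all hold the same value.
      bool KeyConsistentAcrossCFs(
          const std::vector<std::map<std::string, std::string>>& cfs,
          const std::string& key) {
        if (cfs.empty()) return true;
        auto first = cfs[0].find(key);
        bool present = (first != cfs[0].end());
        for (size_t i = 1; i < cfs.size(); ++i) {
          auto it = cfs[i].find(key);
          if ((it != cfs[i].end()) != present) return false;  // presence differs
          if (present && it->second != first->second) return false;  // value differs
        }
        return true;
      }
      ```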
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5863
      
      Test Plan: Ran "make crash_test_with_atomic_flush" and saw the tests pass. Hacked the code to generate an inconsistency and observed that the test fails as expected.
      
      Differential Revision: D17934206
      
      fbshipit-source-id: 00ba1a130391f28785737b677f80f366fb83cced
    • Remove unused BloomFilterPolicy::hash_func_ (#5961) · 6a32e3b5
      Committed by Peter Dillinger
      Summary:
      This is an internal, file-local "feature" that is not used and
      potentially confusing.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5961
      
      Test Plan: make check
      
      Differential Revision: D18099018
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 7870627eeed09941d12538ec55d10d2e164fc716
    • Make buckifier python3 compatible (#5922) · b4ebda7a
      Committed by Yanqin Jin
      Summary:
      Make buckifier/buckify_rocksdb.py run on both Python 3 and 2
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5922
      
      Test Plan:
      ```
      $python3 buckifier/buckify_rocksdb.py
      $python3 buckifier/buckify_rocksdb.py '{"fake": {"extra_deps": [":test_dep", "//fakes/module:mock1"], "extra_compiler_flags": ["-DROCKSDB_LITE", "-Os"]}}'
      $python2 buckifier/buckify_rocksdb.py
      $python2 buckifier/buckify_rocksdb.py '{"fake": {"extra_deps": [":test_dep", "//fakes/module:mock1"], "extra_compiler_flags": ["-DROCKSDB_LITE", "-Os"]}}'
      ```
      
      Differential Revision: D17920611
      
      Pulled By: riversand963
      
      fbshipit-source-id: cc6e2f36013a88a710d96098f6ca18cbe85e3f62
  3. 23 Oct 2019 (2 commits)
  4. 22 Oct 2019 (8 commits)
  5. 19 Oct 2019 (7 commits)
    • Store the filter bits reader alongside the filter block contents (#5936) · 29ccf207
      Committed by Levi Tamasi
      Summary:
      Amongst other things, PR https://github.com/facebook/rocksdb/issues/5504 refactored the filter block readers so that
      only the filter block contents are stored in the block cache (as opposed to the
      earlier design where the cache stored the filter block reader itself, leading to
      potentially dangling pointers and concurrency bugs). However, this change
      introduced a performance hit since with the new code, the metadata fields are
      re-parsed upon every access. This patch reunites the block contents with the
      filter bits reader to eliminate this overhead; since this is still a self-contained
      pure data object, it is safe to store it in the cache. (Note: this is similar to how
      the zstd digest is handled.)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5936
      
      Test Plan:
      make asan_check
      
      filter_bench results for the old code:
      
      ```
      $ ./filter_bench -quick
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      Building...
      Build avg ns/key: 26.7153
      Number of filters: 16669
      Total memory (MB): 200.009
      Bits/key actual: 10.0647
      ----------------------------
      Inside queries...
        Dry run (46b) ns/op: 33.4258
        Single filter ns/op: 42.5974
        Random filter ns/op: 217.861
      ----------------------------
      Outside queries...
        Dry run (25d) ns/op: 32.4217
        Single filter ns/op: 50.9855
        Random filter ns/op: 219.167
          Average FP rate %: 1.13993
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      
      $ ./filter_bench -quick -use_full_block_reader
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      Building...
      Build avg ns/key: 26.5172
      Number of filters: 16669
      Total memory (MB): 200.009
      Bits/key actual: 10.0647
      ----------------------------
      Inside queries...
        Dry run (46b) ns/op: 32.3556
        Single filter ns/op: 83.2239
        Random filter ns/op: 370.676
      ----------------------------
      Outside queries...
        Dry run (25d) ns/op: 32.2265
        Single filter ns/op: 93.5651
        Random filter ns/op: 408.393
          Average FP rate %: 1.13993
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      ```
      
      With the new code:
      
      ```
      $ ./filter_bench -quick
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      Building...
      Build avg ns/key: 25.4285
      Number of filters: 16669
      Total memory (MB): 200.009
      Bits/key actual: 10.0647
      ----------------------------
      Inside queries...
        Dry run (46b) ns/op: 31.0594
        Single filter ns/op: 43.8974
        Random filter ns/op: 226.075
      ----------------------------
      Outside queries...
        Dry run (25d) ns/op: 31.0295
        Single filter ns/op: 50.3824
        Random filter ns/op: 226.805
          Average FP rate %: 1.13993
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      
      $ ./filter_bench -quick -use_full_block_reader
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      Building...
      Build avg ns/key: 26.5308
      Number of filters: 16669
      Total memory (MB): 200.009
      Bits/key actual: 10.0647
      ----------------------------
      Inside queries...
        Dry run (46b) ns/op: 33.2968
        Single filter ns/op: 58.6163
        Random filter ns/op: 291.434
      ----------------------------
      Outside queries...
        Dry run (25d) ns/op: 32.1839
        Single filter ns/op: 66.9039
        Random filter ns/op: 292.828
          Average FP rate %: 1.13993
      ----------------------------
      Done. (For more info, run with -legend or -help.)
      ```
      
      Differential Revision: D17991712
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 7ea205550217bfaaa1d5158ebd658e5832e60f29
    • Fix TestIterate for HashSkipList in db_stress (#5942) · c53db172
      Committed by Yanqin Jin
      Summary:
      Since SeekForPrev() (used by Prev()) is not supported by HashSkipList when a prefix is used, we disable it when stress-testing HashSkipList.
      
      - Change the default memtablerep to skip list.
      - Avoid Prev() when memtablerep is HashSkipList and prefix is used.
      
      Test Plan (on devserver):
      ```
      $make db_stress
      $./db_stress -ops_per_thread=10000 -reopen=1 -destroy_db_initially=true -column_families=1 -threads=1 -column_families=1 -memtablerep=prefix_hash
      $# or simply
      $./db_stress
      $./db_stress -memtablerep=prefix_hash
      ```
      Results must print "Verification successful".
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5942
      
      Differential Revision: D18017062
      
      Pulled By: riversand963
      
      fbshipit-source-id: af867e59aa9e6f533143c984d7d529febf232fd7
    • Refactor / clean up / optimize FullFilterBitsReader (#5941) · 5f8f2fda
      Committed by Peter Dillinger
      Summary:
      FullFilterBitsReader, after creating in BloomFilterPolicy, was
      responsible for decoding metadata bits. This meant that
      FullFilterBitsReader::MayMatch had some metadata checks in order to
      implement "always true" or "always false" functionality in the case
      of inconsistent or trivial metadata. This made for ugly
      mixing-of-concerns code and probably had some runtime cost. It also
      didn't really support plugging in alternative filter implementations
      with extensions to the existing metadata schema.
      
      BloomFilterPolicy::GetFilterBitsReader is now (exclusively) responsible
      for decoding filter metadata bits and constructing appropriate instances
      deriving from FilterBitsReader. "Always false" and "always true" derived
      classes allow FullFilterBitsReader not to be concerned with handling of
      trivial or inconsistent metadata. This also makes for easy expansion
      to alternative filter implementations in new, alternative derived
      classes. This change makes calls to FilterBitsReader::MayMatch
      *necessarily* virtual because there's now more than one built-in
      implementation. Compared with the previous implementation's extra
      'if' checks in MayMatch, there's no consistent performance difference,
      measured by (an older revision of) filter_bench (differences here seem
      to be within noise):
      
          Inside queries...
          -  Dry run (407) ns/op: 35.9996
          +  Dry run (407) ns/op: 35.2034
          -  Single filter ns/op: 47.5483
          +  Single filter ns/op: 47.4034
          -  Batched, prepared ns/op: 43.1559
          +  Batched, prepared ns/op: 42.2923
          ...
          -  Random filter ns/op: 150.697
          +  Random filter ns/op: 149.403
          ----------------------------
          Outside queries...
          -  Dry run (980) ns/op: 34.6114
          +  Dry run (980) ns/op: 34.0405
          -  Single filter ns/op: 56.8326
          +  Single filter ns/op: 55.8414
          -  Batched, prepared ns/op: 48.2346
          +  Batched, prepared ns/op: 47.5667
          -  Random filter ns/op: 155.377
          +  Random filter ns/op: 153.942
               Average FP rate %: 1.1386
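      The dispatch structure described above might be sketched as follows (a hypothetical simplification; the real classes live in RocksDB's filter code and carry actual bloom data, cache-line parameters, and metadata decoding):

      ```cpp
      #include <memory>
      #include <string>

      // Metadata is decoded once, at construction time, and selects a derived
      // class; MayMatch() itself no longer re-checks metadata on every call.
      class FilterBitsReader {
       public:
        virtual ~FilterBitsReader() = default;
        virtual bool MayMatch(const std::string& key) const = 0;  // necessarily virtual
      };

      class AlwaysTrueFilter : public FilterBitsReader {
       public:
        bool MayMatch(const std::string&) const override { return true; }
      };

      class AlwaysFalseFilter : public FilterBitsReader {
       public:
        bool MayMatch(const std::string&) const override { return false; }
      };

      // Stand-in for the real bloom reader; here it just scans for a byte.
      class FullFilterBitsReader : public FilterBitsReader {
       public:
        explicit FullFilterBitsReader(const std::string& data) : data_(data) {}
        bool MayMatch(const std::string& key) const override {
          return !key.empty() && data_.find(key[0]) != std::string::npos;
        }
       private:
        std::string data_;
      };

      // The policy owns metadata decoding and constructs the right reader.
      // "corrupt" is a toy marker for unparseable metadata: a filter must never
      // produce false negatives, so it degrades to "always true".
      std::unique_ptr<FilterBitsReader> GetFilterBitsReader(
          const std::string& contents) {
        if (contents.empty()) return std::make_unique<AlwaysFalseFilter>();
        if (contents == "corrupt") return std::make_unique<AlwaysTrueFilter>();
        return std::make_unique<FullFilterBitsReader>(contents);
      }
      ```

      New filter implementations with extended metadata schemas then slot in as additional derived classes.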
      
      Also, the FullFilterBitsReader ctor was responsible for a surprising
      amount of CPU in production, due in part to inefficient determination of
      the CACHE_LINE_SIZE used to construct the filter being read. The
      overwhelming common case (same as my CACHE_LINE_SIZE) is now
      substantially optimized, as shown with filter_bench with
      -new_reader_every=1 (old option - see below) (repeatable result):
      
          Inside queries...
          -  Dry run (453) ns/op: 118.799
          +  Dry run (453) ns/op: 105.869
          -  Single filter ns/op: 82.5831
          +  Single filter ns/op: 74.2509
          ...
          -  Random filter ns/op: 224.936
          +  Random filter ns/op: 194.833
          ----------------------------
          Outside queries...
          -  Dry run (aa1) ns/op: 118.503
          +  Dry run (aa1) ns/op: 104.925
          -  Single filter ns/op: 90.3023
          +  Single filter ns/op: 83.425
          ...
          -  Random filter ns/op: 220.455
          +  Random filter ns/op: 175.7
               Average FP rate %: 1.13886
      
      However, PR #5936 has reclaimed (or will reclaim) most of this cost. After that PR, the optimization of this code path is likely negligible, but it is nonetheless clear we aren't making performance any worse.
      
      Also fixed inadequate check of consistency between filter data size and
      num_lines. (Unit test updated.)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5941
      
      Test Plan:
      previously added unit tests FullBloomTest.CorruptFilters and
      FullBloomTest.RawSchema
      
      Differential Revision: D18018353
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 8e04c2b4a7d93223f49a237fd52ef2483929ed9c
    • Fix PlainTableReader not to crash sst_dump (#5940) · fe464bca
      Committed by Peter Dillinger
      Summary:
      Plain table SSTs could crash sst_dump because of a bug in
      PlainTableReader that can leave table_properties_ as null. Even if it
      was intended not to keep the table properties in some cases, they were
      leaked on the offending code path.
      
      Steps to reproduce:
      
          $ db_bench --benchmarks=fillrandom --num=2000000 --use_plain_table --prefix-size=12
          $ sst_dump --file=0000xx.sst --show_properties
          from [] to []
          Process /dev/shm/dbbench/000014.sst
          Sst file format: plain table
          Raw user collected properties
          ------------------------------
          Segmentation fault (core dumped)
      
      Also added missing unit testing of plain table full_scan_mode, and
      an assertion in NewIterator to check for regression.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5940
      
      Test Plan: new unit test, manual, make check
      
      Differential Revision: D18018145
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 4310c755e824c4cd6f3f86a3abc20dfa417c5e07
    • Enable trace_replay with multi-threads (#5934) · 526e3b97
      Committed by Zhichao Cao
      Summary:
      In the current trace replay, all the queries are serialized and executed by a single thread, which may not closely simulate the original application's query pattern. This PR implements multi-threaded replay: users can set the number of threads used to replay the trace, and the queries generated from the trace records are scheduled onto the thread pool's job queue.
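      A minimal thread-pool sketch of the scheduling idea (hypothetical; `ReplayPool` is a made-up name, and the real replayer decodes trace records into DB queries before scheduling them):

      ```cpp
      #include <condition_variable>
      #include <functional>
      #include <mutex>
      #include <queue>
      #include <thread>
      #include <vector>

      // Replayed queries are pushed onto a shared job queue and executed by a
      // configurable number of worker threads instead of one serialized thread.
      class ReplayPool {
       public:
        explicit ReplayPool(int num_threads) {
          for (int i = 0; i < num_threads; ++i) {
            workers_.emplace_back([this] {
              for (;;) {
                std::function<void()> job;
                {
                  std::unique_lock<std::mutex> lk(mu_);
                  cv_.wait(lk, [this] { return stop_ || !jobs_.empty(); });
                  if (stop_ && jobs_.empty()) return;  // drained and shut down
                  job = std::move(jobs_.front());
                  jobs_.pop();
                }
                job();  // execute one replayed query
              }
            });
          }
        }

        void Schedule(std::function<void()> job) {
          {
            std::lock_guard<std::mutex> lk(mu_);
            jobs_.push(std::move(job));
          }
          cv_.notify_one();
        }

        ~ReplayPool() {
          {
            std::lock_guard<std::mutex> lk(mu_);
            stop_ = true;
          }
          cv_.notify_all();
          for (auto& w : workers_) w.join();
        }

       private:
        std::mutex mu_;
        std::condition_variable cv_;
        std::queue<std::function<void()>> jobs_;
        bool stop_ = false;
        std::vector<std::thread> workers_;
      };
      ```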
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5934
      
      Test Plan: test with make check and real trace replay.
      
      Differential Revision: D17998098
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 87eecf6f7c17a9dc9d7ab29dd2af74f6f60212c8
    • Update HISTORY.md with recent BlobDB adjacent changes · 69bd8a28
      Committed by Levi Tamasi
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5939
      
      Differential Revision: D18009096
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 032a48a302f9da38aecf4055b5a8d4e1dffd9dc7
    • Expose db stress tests (#5937) · e60cc092
      Committed by Yanqin Jin
      Summary:
      Expose the db_stress test by providing db_stress_tool.h as a public header.
      This PR does the following:
      - adds a new header, db_stress_tool.h, under include/rocksdb/
      - renames db_stress.cc to db_stress_tool.cc
      - adds a db_stress.cc which simply invokes a test function
      - updates the Makefile accordingly
      
      Test Plan (dev server):
      ```
      make db_stress
      ./db_stress
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5937
      
      Differential Revision: D17997647
      
      Pulled By: riversand963
      
      fbshipit-source-id: 1a8d9994f89ce198935566756947c518f0052410
  6. 18 Oct 2019 (1 commit)
    • Support decoding blob indexes in sst_dump (#5926) · fdc1cb43
      Committed by Levi Tamasi
      Summary:
      The patch adds a new command line parameter --decode_blob_index to sst_dump.
      If this switch is specified, sst_dump prints blob indexes in a human readable format,
      printing the blob file number, offset, size, and expiration (if applicable) for blob
      references, and the blob value (and expiration) for inlined blobs.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5926
      
      Test Plan:
      Used db_bench's BlobDB mode to generate SST files containing blob references with
      and without expiration, as well as inlined blobs with and without expiration (note: the
      latter are stored as plain values), and confirmed sst_dump correctly prints all four types
      of records.
      
      Differential Revision: D17939077
      
      Pulled By: ltamasi
      
      fbshipit-source-id: edc5f58fee94ba35f6699c6a042d5758f5b3963d
  7. 17 Oct 2019 (1 commit)
  8. 16 Oct 2019 (1 commit)
  9. 15 Oct 2019 (8 commits)
  10. 12 Oct 2019 (2 commits)
    • Fix SeekForPrev bug with Partitioned Filters and Prefix (#5907) · 4e729f90
      Committed by Maysam Yabandeh
      Summary:
      Partitioned filters make use of a top-level index to find the partition that might contain the bloom hash of the key. The index uses the internal key format (before format version 3). Each partition contains i) the blooms of the keys in that range, ii) the blooms of the prefixes of keys in that range, and iii) the bloom of the prefix of the last key in the previous partition.
      When ::SeekForPrev(key), we first perform a prefix bloom test on the SST file. The partition however is identified using the full internal key, rather than the prefix key. The reason is to be compatible with the internal key format of the top-level index. This creates a corner case. Example:
      - SST k, Partition N: P1K1, P1K2
      - SST k, top-level index: P1K2
      - SST k+1, Partition 1: P2K1, P3K1
      - SST k+1 top-level index: P3K1
      When SeekForPrev(P1K3) is called, it should point us to P1K2. However, SST k's top-level index would reject P1K3 since it is out of range.
      One possible fix would be to search with the prefix P1 (instead of full internal key P1K3) however the details of properly comparing prefix with full internal key might get complicated. The fix we apply in this PR is to look into the last partition anyway even if the key is out of range.
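      The fix can be illustrated with a small model of the top-level index lookup (hypothetical code; `FindPartitionForSeekForPrev` is a made-up name, and the real index stores internal keys, not plain strings):

      ```cpp
      #include <algorithm>
      #include <string>
      #include <vector>

      // The top-level index records the last key of each partition. A plain
      // binary search rejects a target that sorts past the last partition's
      // key; the fix is to fall back to the last partition anyway, since it
      // may still contain the key preceding the target.
      int FindPartitionForSeekForPrev(const std::vector<std::string>& last_keys,
                                      const std::string& target) {
        auto it = std::lower_bound(last_keys.begin(), last_keys.end(), target);
        if (it == last_keys.end()) {
          // Out of range: look into the last partition instead of rejecting.
          return static_cast<int>(last_keys.size()) - 1;
        }
        return static_cast<int>(it - last_keys.begin());
      }
      ```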
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5907
      
      Differential Revision: D17889918
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 169fd7b3c71dbc08808eae5a8340611ebe5bdc1e
    • Fix block cache ID uniqueness for Windows builds (#5844) · b00761ee
      Committed by Andrew Kryczka
      Summary:
      Since we do not evict a file's blocks from block cache before that file
      is deleted, we require a file's cache ID prefix is both unique and
      non-reusable. However, the Windows functionality we were relying on only
      guaranteed uniqueness. That meant a newly created file could be assigned
      the same cache ID prefix as a deleted file. If the newly created file
      had block offsets matching the deleted file, full cache keys could be
      exactly the same, resulting in obsolete data blocks returned from cache
      when trying to read from the new file.
      
      We noticed this when running on FAT32 where compaction was writing out
      of order keys due to reading obsolete blocks from its input files. The
      functionality is documented as behaving the same on NTFS, although I
      wasn't able to repro it there.
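      A toy model of why unique-but-reusable prefixes are dangerous (hypothetical; the real cache key layout differs, but the concatenation idea is the same):

      ```cpp
      #include <cstdint>
      #include <string>

      // A block cache key is the file's cache ID prefix concatenated with the
      // block offset. If a deleted file's prefix can be reassigned to a new
      // file, identical offsets produce identical keys, and stale blocks of
      // the deleted file are served for reads of the new file.
      std::string BlockCacheKey(const std::string& cache_id_prefix,
                                uint64_t offset) {
        return cache_id_prefix + "#" + std::to_string(offset);
      }
      ```

      Non-reusable prefixes guarantee distinct keys even when offsets coincide, which is why uniqueness alone is insufficient.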
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5844
      
      Test Plan:
      we had a reliable repro of out-of-order keys on FAT32 that
      was fixed by this change
      
      Differential Revision: D17752442
      
      fbshipit-source-id: 95d983f9196cf415f269e19293b97341edbf7e00
  11. 11 Oct 2019 (4 commits)
    • Revert "Enable partitioned index/filter in stress tests (#5895)" (#5904) · bc8b05cb
      Committed by Yanqin Jin
      Summary:
      This reverts commit 2f4e2881.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5904
      
      Differential Revision: D17871282
      
      Pulled By: riversand963
      
      fbshipit-source-id: d210725f8f3b26d8eac25892094da09d9694337e
    • Remove a webhook due to potential security concern (#5902) · ddb62d1f
      Committed by Yanqin Jin
      Summary:
      As title.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5902
      
      Differential Revision: D17858150
      
      Pulled By: riversand963
      
      fbshipit-source-id: db2cd8a756faf7b9751b2651a22e1b29ca9fecec
    • Fix the rocksjava release Vagrant build on CentOS (#5901) · 1e9c8d42
      Committed by Adam Retter
      Summary:
      Closes https://github.com/facebook/rocksdb/issues/5873
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5901
      
      Differential Revision: D17869585
      
      fbshipit-source-id: 559472486f1d3ac80c0c7df6c421c4b612b9b7f9
    • MultiGet batching in memtable (#5818) · 4c49e38f
      Committed by Vijay Nadimpalli
      Summary:
      RocksDB has a MultiGet() API that implements batched key lookup for higher performance (https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L468). Currently, batching is implemented in BlockBasedTableReader::MultiGet() for SST file lookups. One of the ways it improves performance is by pipelining bloom filter lookups (by prefetching required cachelines for all the keys in the batch, and then doing the probe) and thus hiding the cache miss latency. The same concept can be extended to the memtable as well. This PR involves implementing a pipelined bloom filter lookup in DynamicBloom, and implementing MemTable::MultiGet() that can leverage it.
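      The two-phase idea can be sketched with a toy one-probe bloom filter (hypothetical; the real implementation lives in DynamicBloom and MemTable::MultiGet and uses multiple probes per key):

      ```cpp
      #include <functional>
      #include <string>
      #include <vector>

      // Phase 1 hashes every key in the batch and issues prefetch hints; phase 2
      // does the membership probes, by which time the needed cachelines are
      // likely resident, hiding the cache-miss latency.
      class TinyBloom {
       public:
        explicit TinyBloom(size_t bits) : bits_(bits, 0) {}

        void Add(const std::string& key) { bits_[Pos(key)] = 1; }

        std::vector<bool> MultiMayContain(
            const std::vector<std::string>& keys) const {
          std::vector<size_t> pos(keys.size());
          for (size_t i = 0; i < keys.size(); ++i) {
            pos[i] = Pos(keys[i]);  // phase 1: hash every key up front
      #if defined(__GNUC__)
            __builtin_prefetch(bits_.data() + pos[i]);  // warm the cacheline
      #endif
          }
          std::vector<bool> result(keys.size());
          for (size_t i = 0; i < keys.size(); ++i) {
            result[i] = bits_[pos[i]] != 0;  // phase 2: probe
          }
          return result;
        }

       private:
        size_t Pos(const std::string& key) const {
          return std::hash<std::string>{}(key) % bits_.size();
        }
        std::vector<char> bits_;
      };
      ```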
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5818
      
      Test Plan:
      Existing tests
      
      Performance Test:
      Ran the below command, which fills up the memtable, makes sure there are no flushes, and then calls MultiGet. Ran it on master and on the new change and saw at least a 1% performance improvement across all the test runs I did. Sometimes the improvement was up to 5%.
      
      TEST_TMPDIR=/data/users/$USER/benchmarks/feature/ numactl -C 10 ./db_bench -benchmarks="fillseq,multireadrandom" -num=600000 -compression_type="none" -level_compaction_dynamic_level_bytes -write_buffer_size=200000000 -target_file_size_base=200000000 -max_bytes_for_level_base=16777216 -reads=90000 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 -statistics -memtable_whole_key_filtering=true -memtable_bloom_size_ratio=10
      
      Differential Revision: D17578869
      
      Pulled By: vjnadimpalli
      
      fbshipit-source-id: 23dc651d9bf49db11d22375bf435708875a1f192
  12. 10 Oct 2019 (1 commit)
    • Make the db_stress reopen loop in OperateDb() more robust (#5893) · 80ad996b
      Committed by anand76
      Summary:
      The loop in OperateDb() is getting quite complicated with the introduction of multiple-key operations such as MultiGet and reseeks. This results in a number of corner cases that hang db_stress due to synchronization problems during reopen (i.e., when the -reopen=<> option is specified). This PR makes the loop more robust by ensuring all db_stress threads vote to reopen the DB the exact same number of times.
      Most of the changes in this diff are due to indentation.
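      The voting idea might be sketched as a generation-counting barrier (hypothetical; `ReopenGate` is a made-up name and db_stress's actual synchronization is more involved):

      ```cpp
      #include <condition_variable>
      #include <mutex>
      #include <thread>
      #include <vector>

      // Every worker announces that it reached the reopen point; the last
      // voter "reopens the DB" and releases the others. Because all threads
      // pass through this gate together, each observes the same reopen count.
      class ReopenGate {
       public:
        explicit ReopenGate(int num_threads) : num_threads_(num_threads) {}

        void VoteAndWait() {
          std::unique_lock<std::mutex> lk(mu_);
          int my_generation = generation_;
          if (++arrived_ == num_threads_) {
            ++reopens_;  // last voter performs the reopen here
            arrived_ = 0;
            ++generation_;
            cv_.notify_all();
          } else {
            cv_.wait(lk, [&] { return generation_ != my_generation; });
          }
        }

        int reopens() const { return reopens_; }

       private:
        std::mutex mu_;
        std::condition_variable cv_;
        int num_threads_;
        int arrived_ = 0;
        int generation_ = 0;
        int reopens_ = 0;
      };
      ```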
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5893
      
      Test Plan: Run crash test
      
      Differential Revision: D17823827
      
      Pulled By: anand1976
      
      fbshipit-source-id: ec893829f611ac7cac4057c0d3d99f9ffb6a6dd9
  13. 09 Oct 2019 (1 commit)