1. 06 Apr 2018 · 3 commits
    • Fix pre_release callback argument list. · 147dfc7b
      Committed by Dmitri Smirnov
      Summary:
      Top-level const on a primitive (by-value) parameter does not affect the
        signature of the method and has no influence on whether the overriding
        method actually takes that `const bool` instead of plain `bool`. In
        addition, it is rarely useful and produces a compatibility warning
        in the VS 2015 compiler.
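      A minimal sketch (hypothetical names, not the actual RocksDB callback) of why dropping the const is safe: top-level const on a by-value parameter is not part of the function type, so the override matches either way.

      ```cpp
      #include <cassert>
      #include <type_traits>

      struct Listener {
        // Base declares the parameter as `const bool`; the const is
        // top-level and is ignored in the function's signature.
        virtual int OnPreRelease(const bool last) { return last ? 1 : 0; }
        virtual ~Listener() = default;
      };

      struct MyListener : Listener {
        // Overrides cleanly with plain `bool`: same signature as the base.
        int OnPreRelease(bool last) override { return last ? 2 : 0; }
      };

      int main() {
        // void(bool) and void(const bool) are the same function type.
        static_assert(std::is_same<void (*)(bool), void (*)(const bool)>::value,
                      "top-level const does not change a function type");
        MyListener l;
        Listener* base = &l;
        assert(base->OnPreRelease(true) == 2);  // derived override is called
        return 0;
      }
      ```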
      Closes https://github.com/facebook/rocksdb/pull/3663
      
      Differential Revision: D7475739
      
      Pulled By: ajkr
      
      fbshipit-source-id: fb275378b5acc397399420ae6abb4b6bfe5bd32f
      147dfc7b
    • Blob DB: blob_dump to show uncompressed values · 36a9f229
      Committed by Yi Wu
      Summary:
      Make blob_dump tool able to show uncompressed values if the blob file is compressed. Also show total compressed vs. raw size at the end if --show_summary is provided.
      Closes https://github.com/facebook/rocksdb/pull/3633
      
      Differential Revision: D7348926
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: ca709cb4ed5cf6a550ff2987df8033df81516f8e
      36a9f229
    • fix build for rocksdb lite · c827b2dc
      Committed by Zhongyi Xie
      Summary:
      currently rocksdb lite build fails due to the following errors:
      > db/db_sst_test.cc:29:51: error: ‘FlushJobInfo’ does not name a type
         virtual void OnFlushCompleted(DB* /*db*/, const FlushJobInfo& info) override {
                                                         ^
      db/db_sst_test.cc:29:16: error: ‘virtual void rocksdb::FlushedFileCollector::OnFlushCompleted(rocksdb::DB*, const int&)’ marked ‘override’, but does not override
         virtual void OnFlushCompleted(DB* /*db*/, const FlushJobInfo& info) override {
                      ^
      db/db_sst_test.cc:24:7: error: ‘class rocksdb::FlushedFileCollector’ has virtual functions and accessible non-virtual destructor [-Werror=non-virtual-dtor]
       class FlushedFileCollector : public EventListener {
             ^
      db/db_sst_test.cc: In member function ‘virtual void rocksdb::FlushedFileCollector::OnFlushCompleted(rocksdb::DB*, const int&)’:
      db/db_sst_test.cc:31:35: error: request for member ‘file_path’ in ‘info’, which is of non-class type ‘const int’
           flushed_files_.push_back(info.file_path);
                                         ^
      cc1plus: all warnings being treated as errors
      make: *** [db/db_sst_test.o] Error 1
      Closes https://github.com/facebook/rocksdb/pull/3676
      
      Differential Revision: D7493006
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 77dff0a5b23e27db51be9b9798e3744e6fdec64f
      c827b2dc
  2. 05 Apr 2018 · 1 commit
  3. 04 Apr 2018 · 2 commits
    • Make Optimistic Tx database stackable · 2a62ca17
      Committed by Dmitri Smirnov
      Summary:
      This change models the Optimistic Tx DB after the Pessimistic Tx DB. The motivation for this change is to make the pointer polymorphic so it can be held by the same raw or smart pointer.
      
      Currently, because the Optimistic Tx DB's inheritance is not rooted in a single DB base class the way the Pessimistic Tx DB's is, it is harder to write clean code with clear ownership of the database when options dictate instantiating a plain DB, a Pessimistic Tx DB, or an Optimistic Tx DB.
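      The "stackable" pattern the summary describes can be sketched as follows (hypothetical class names; the real classes live in RocksDB's utilities/transactions): both variants derive from one DB root, so a single raw or smart pointer type can own whichever variant the options dictate.

      ```cpp
      #include <cassert>
      #include <memory>
      #include <string>

      struct DB {
        virtual std::string Kind() const { return "plain"; }
        virtual ~DB() = default;
      };

      // A stackable DB wraps and forwards to an inner DB.
      struct StackableDB : DB {
        explicit StackableDB(std::unique_ptr<DB> inner) : inner_(std::move(inner)) {}
        std::string Kind() const override { return inner_->Kind(); }
       protected:
        std::unique_ptr<DB> inner_;
      };

      struct OptimisticTxDB : StackableDB {
        using StackableDB::StackableDB;
        std::string Kind() const override {
          return "optimistic(" + inner_->Kind() + ")";
        }
      };

      int main() {
        std::unique_ptr<DB> db;  // one owner type regardless of variant
        db = std::unique_ptr<DB>(new OptimisticTxDB(std::unique_ptr<DB>(new DB)));
        assert(db->Kind() == "optimistic(plain)");
        return 0;
      }
      ```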
      Closes https://github.com/facebook/rocksdb/pull/3566
      
      Differential Revision: D7184502
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 31d06efafd79497bb0c230e971857dba3bd962c3
      2a62ca17
    • Reduce default --nooverwritepercent in black-box crash tests · b058a337
      Committed by Andrew Kryczka
      Summary:
      Previously `python tools/db_crashtest.py blackbox` would do no useful work as the crash interval (two minutes) was shorter than the preparation phase. The preparation phase is slow because of the ridiculously inefficient way it computes which keys should not be overwritten. It was doing this for 60M keys since default values were `FLAGS_nooverwritepercent == 60` and `FLAGS_max_key == 100000000`.
      
      Move the "nooverwritepercent" override from whitebox-specific to the general options so it also applies to blackbox test runs. Now preparation phase takes a few seconds.
      Closes https://github.com/facebook/rocksdb/pull/3671
      
      Differential Revision: D7457732
      
      Pulled By: ajkr
      
      fbshipit-source-id: 601f4461a6a7e49e50449dcf15aebc9b8a98d6f0
      b058a337
  4. 03 Apr 2018 · 6 commits
    • Some small improvements to the build_tools · 12b400e8
      Committed by Adam Retter
      Summary: Closes https://github.com/facebook/rocksdb/pull/3664
      
      Differential Revision: D7459433
      
      Pulled By: sagar0
      
      fbshipit-source-id: 3817e5d45fc70e83cb26f9800eaa0f4566c8dc0e
      12b400e8
    • Level Compaction with TTL · 04c11b86
      Committed by Sagar Vemuri
      Summary:
      Level Compaction with TTL.
      
      As of today, a file could exist in the LSM tree without going through the compaction process for a really long time if there are no updates to the data in the file's key range. For example, in certain use cases, the keys are not actually "deleted"; instead they are just set to empty values. There might not be any more writes to this "deleted" key range, and if so, such data could remain in the LSM for a really long time resulting in wasted space.
      
      Introducing a TTL could solve this problem. Files (and, in turn, data) older than TTL will be scheduled for compaction when there is no other background work. This will make the data go through the regular compaction process and get rid of old unwanted data.
      This also has the (good) side-effect of all the data in the non-bottommost level being newer than ttl, and all data in the bottommost level older than ttl. It could lead to more writes while reducing space.
      
      This functionality can be controlled by the newly introduced column family option -- ttl.
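      The selection rule can be sketched like this (an illustrative stand-in, not the actual compaction-picker code): files whose age exceeds the column family's ttl become candidates for background compaction.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Hedged sketch: return indices of files older than ttl_sec.
      // A ttl of 0 means the feature is disabled.
      std::vector<size_t> PickTtlExpiredFiles(
          const std::vector<uint64_t>& file_age_sec, uint64_t ttl_sec) {
        std::vector<size_t> expired;
        for (size_t i = 0; i < file_age_sec.size(); ++i) {
          if (ttl_sec > 0 && file_age_sec[i] > ttl_sec) expired.push_back(i);
        }
        return expired;
      }

      int main() {
        // ttl = 1 day; files aged 2 and 3 days are picked, 1 hour is not.
        const uint64_t kDay = 24 * 60 * 60;
        std::vector<uint64_t> ages = {2 * kDay, 3600, 3 * kDay};
        std::vector<size_t> picked = PickTtlExpiredFiles(ages, kDay);
        assert(picked.size() == 2 && picked[0] == 0 && picked[1] == 2);
        return 0;
      }
      ```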
      
      TODO for later:
      - Make ttl mutable
      - Extend TTL to Universal compaction as well? (TTL is already supported in FIFO)
      - Maybe deprecate CompactionOptionsFIFO.ttl in favor of this new ttl option.
      Closes https://github.com/facebook/rocksdb/pull/3591
      
      Differential Revision: D7275442
      
      Pulled By: sagar0
      
      fbshipit-source-id: dcba484717341200d419b0953dafcdf9eb2f0267
      04c11b86
    • Fix 3-way SSE4.2 crc32c usage in MSVC with CMake · df144244
      Committed by Koby Kahane
      Summary:
      The introduction of the 3-way SSE4.2 optimized crc32c implementation in commit f54d7f5f added the `HAVE_PCLMUL` definition when the compiler supports intrinsics for that instruction, but did not modify CMakeLists.txt to set that definition on MSVC when appropriate. As a result, 3-way SSE4.2 is not used in MSVC builds with CMake although it could be.
      
      Since the existing test program in CMakeLists.txt for `HAVE_SSE42` already uses `_mm_clmulepi64_si128` which is a PCLMUL instruction, this PR sets `HAVE_PCLMUL` as well if that program builds successfully, fixing the problem.
      Closes https://github.com/facebook/rocksdb/pull/3673
      
      Differential Revision: D7473975
      
      Pulled By: miasantreble
      
      fbshipit-source-id: bc346b9eb38920e427aa1a253e6dd9811efa269e
      df144244
    • WritePrepared Txn: smallest_prepare optimization · b225de7e
      Committed by Maysam Yabandeh
      Summary:
      This is an optimization to reduce lookups in the CommitCache when querying IsInSnapshot. The optimization takes the smallest uncommitted sequence number at the time the snapshot was taken and, if the sequence number of the read data is lower than that number, assumes the data is committed.
      To implement this optimization two changes are required: i) the AddPrepared function must be called sequentially to avoid out-of-order insertion into the PrepareHeap (otherwise the top of the heap would not indicate the smallest prepare going forward), and ii) non-2PC transactions also call AddPrepared if they do not commit in one step.
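      The fast path can be sketched as follows (hypothetical signature; a `std::set` stands in for the real CommitCache and is purely illustrative):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <set>

      // min_uncommitted is the smallest uncommitted seq at snapshot time.
      // Anything below it must already be committed, so the (more
      // expensive) CommitCache lookup is skipped.
      bool IsInSnapshot(uint64_t data_seq, uint64_t snapshot_seq,
                        uint64_t min_uncommitted,
                        const std::set<uint64_t>& commit_cache) {
        (void)snapshot_seq;  // the full check also compares against this
        if (data_seq < min_uncommitted) return true;  // fast path
        return commit_cache.count(data_seq) > 0;      // slow path
      }

      int main() {
        std::set<uint64_t> commit_cache = {120};
        // seq 50 < min_uncommitted 100: committed without touching the cache.
        assert(IsInSnapshot(50, 200, 100, commit_cache));
        // seq 120 >= 100: must consult the cache.
        assert(IsInSnapshot(120, 200, 100, commit_cache));
        assert(!IsInSnapshot(150, 200, 100, commit_cache));
        return 0;
      }
      ```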
      Closes https://github.com/facebook/rocksdb/pull/3649
      
      Differential Revision: D7388630
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: b79506238c17467d590763582960d4d90181c600
      b225de7e
    • Enable cancelling manual compactions if they hit the sfm size limit · 1579626d
      Committed by Amy Tai
      Summary:
      Manual compactions should be cancelled, just like scheduled compactions are cancelled, if sfm->EnoughRoomForCompaction is not true.
      Closes https://github.com/facebook/rocksdb/pull/3670
      
      Differential Revision: D7457683
      
      Pulled By: amytai
      
      fbshipit-source-id: 669b02fdb707f75db576d03d2c818fb98d1876f5
      1579626d
    • Revert "Avoid adding tombstones of the same file to RangeDelAggregato… · 44653c7b
      Committed by Zhongyi Xie
      Summary:
      …r multiple times"
      
      This reverts commit e80709a3.
      
      lingbin's PR https://github.com/facebook/rocksdb/pull/3635 is causing a performance regression for seekrandom workloads.
      I'm reverting the commit for now, but feel free to submit new patches 😃
      
      To reproduce the regression, you can run the following db_bench command
      > ./db_bench --benchmarks=fillrandom,seekrandomwhilewriting --threads=1 --num=1000000 --reads=150000 --key_size=66 --value_size=1262 --statistics=0 --compression_ratio=0.5 --histogram=1 --seek_nexts=1 --stats_per_interval=1 --stats_interval_seconds=600 --max_background_flushes=4 --num_multi_db=1 --max_background_compactions=16 --seed=1522388277 -write_buffer_size=1048576 --level0_file_num_compaction_trigger=10000 --compression_type=none
      
      write stats printed by db_bench:
      
      | run | P50 | P75 | P99 | P99.9 | P99.99 |
      | --- | --- | --- | --- | --- | --- |
      | revert commit | 80.77 | 102.94 | 1786.44 | 1892.39 | 2645.10 |
      | keep commit | 221.72 | 686.62 | 1842.57 | 1899.70 | 2814.29 |
      Closes https://github.com/facebook/rocksdb/pull/3672
      
      Differential Revision: D7463315
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 8e779c87591127f2c3694b91a56d9b459011959d
      44653c7b
  5. 02 Apr 2018 · 1 commit
  6. 31 Mar 2018 · 4 commits
    • Throw NoSpace instead of IOError when out of space. · d12112d0
      Committed by Fosco Marotto
      Summary:
      Replaces #1702 and is updated from feedback.
      Closes https://github.com/facebook/rocksdb/pull/3531
      
      Differential Revision: D7457395
      
      Pulled By: gfosco
      
      fbshipit-source-id: 25a21dd8cfa5a6e42e024208b444d9379d920c82
      d12112d0
    • Update buckifier and TARGETS · d9bfb35d
      Committed by Fosco Marotto
      Summary:
      Some flags used via make were not applied in the buckifier/TARGETS file, causing some failures to be missed by the testing infra (e.g. the one fixed by #3434).
      Closes https://github.com/facebook/rocksdb/pull/3452
      
      Differential Revision: D7457419
      
      Pulled By: gfosco
      
      fbshipit-source-id: e4aed2915ca3038c1485bbdeebedfc33d5704a49
      d9bfb35d
    • Update 64-bit shift in compression.h · c3eb762b
      Committed by Fosco Marotto
      Summary:
      This was failing the build on windows with zstd, warning treated as an error, 32-bit shift implicitly converted to 64-bit.
      Closes https://github.com/facebook/rocksdb/pull/3624
      
      Differential Revision: D7307883
      
      Pulled By: gfosco
      
      fbshipit-source-id: 68110e9b5b1b59b668dec6cf86b67556402574e7
      c3eb762b
    • Skip deleted WALs during recovery · 73f21a7b
      Committed by Maysam Yabandeh
      Summary:
      This patch records the deleted WAL numbers in the manifest so that they, and any WAL older than them, are ignored during recovery. This avoids scenarios where there is a gap in the WAL files fed to the recovery procedure. The gap could happen, for example, through out-of-order WAL deletion. Such a gap could cause problems in 2PC recovery, where the prepare and commit entries are placed into two separate WALs: a gap in the WALs could result in not processing the WAL with the commit entry, breaking the 2PC recovery logic.
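      The recovery-side filter can be sketched like this (illustrative only; the real change records the numbers as manifest entries):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // The manifest records the highest deleted WAL number; recovery
      // ignores that WAL and anything older than it.
      std::vector<uint64_t> WalsToRecover(const std::vector<uint64_t>& wal_numbers,
                                          uint64_t max_deleted_wal) {
        std::vector<uint64_t> keep;
        for (uint64_t n : wal_numbers) {
          if (n > max_deleted_wal) keep.push_back(n);
        }
        return keep;
      }

      int main() {
        // WALs up to 9 were deleted; only 10 and 11 are fed to recovery,
        // so a gap among the older numbers cannot confuse 2PC recovery.
        std::vector<uint64_t> keep = WalsToRecover({5, 6, 10, 11}, 9);
        assert(keep.size() == 2 && keep[0] == 10 && keep[1] == 11);
        return 0;
      }
      ```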
      Closes https://github.com/facebook/rocksdb/pull/3488
      
      Differential Revision: D6967893
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 13119feb155a08ab6d4909f437c7a750480dc8a1
      73f21a7b
  7. 30 Mar 2018 · 1 commit
    • WritePrepared Txn: fix a bug in publishing recoverable state seq · 89d989ed
      Committed by Maysam Yabandeh
      Summary:
      When using two_write_queue, the published seq and the last allocated sequence could be ahead of LastSequence, even if both write queues are stopped, as in WriteRecoverableState. The patch fixes a bug in WriteRecoverableState in which LastSequence was used as a reference but the result was applied to the last fetched sequence and the last published seq.
      Closes https://github.com/facebook/rocksdb/pull/3665
      
      Differential Revision: D7446099
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1449bed9aed8e9db6af85946efd347cb8efd3c0b
      89d989ed
  8. 29 Mar 2018 · 3 commits
    • Allow rocksdbjavastatic to also be built as debug build · 3cb59195
      Committed by Adam Retter
      Summary: Closes https://github.com/facebook/rocksdb/pull/3654
      
      Differential Revision: D7417948
      
      Pulled By: sagar0
      
      fbshipit-source-id: 9514df9328181e54a6384764444c0c7ce66e7f5f
      3cb59195
    • WritePrepared Txn: make recoverable state visible after flush · 0377ff9d
      Committed by Maysam Yabandeh
      Summary:
      Currently, if the CommitTimeWriteBatch is set to be used only as state required for recovery, the user cannot see it in the DB until the DB is restarted, even though the state is already inserted into the DB after the memtable flush. It would be useful for debugging to make this state visible to the user after the flush by committing it. The patch does so by invoking a callback that commits the recoverable state.
      Closes https://github.com/facebook/rocksdb/pull/3661
      
      Differential Revision: D7424577
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 137f9408662f0853938b33fa440f27f04c1bbf5c
      0377ff9d
    • Fix race condition causing double deletion of ssts · 1f5def16
      Committed by Yanqin Jin
      Summary:
      Possible interleaved execution of a background compaction thread calling `FindObsoleteFiles (no full scan) / PurgeObsoleteFiles` and a user thread calling `FindObsoleteFiles (full scan) / PurgeObsoleteFiles` can lead to a race condition in which RocksDB attempts to delete a file twice. The second attempt fails and returns `IO error`. This may occur with other file types, but this PR targets SSTs.
      Also adds a unit test to verify that this PR fixes the issue.
      
      The newly added unit test `obsolete_files_test` has a test case for this scenario, implemented in `ObsoleteFilesTest#RaceForObsoleteFileDeletion`. `TestSyncPoint`s are used to coordinate the interleaving the `user_thread` and background compaction thread. They execute as follows
      ```
      timeline              user_thread                background_compaction thread
      t1   |                                          FindObsoleteFiles(full_scan=false)
      t2   |     FindObsoleteFiles(full_scan=true)
      t3   |                                          PurgeObsoleteFiles
      t4   |     PurgeObsoleteFiles
           V
      ```
      When `user_thread` invokes `FindObsoleteFiles` with full scan, it collects ALL files in RocksDB directory, including the ones that background compaction thread have collected in its job context. Then `user_thread` will see an IO error when trying to delete these files in `PurgeObsoleteFiles` because background compaction thread has already deleted the file in `PurgeObsoleteFiles`.
      To fix this, we make RocksDB remember which (SST) files have been found by threads after calling `FindObsoleteFiles` (see `DBImpl#files_grabbed_for_purge_`). Therefore, when another thread calls `FindObsoleteFiles` with full scan, it will not collect such files.
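      A minimal sketch of the fix, assuming a hypothetical tracker type in place of the real `DBImpl` member: files already grabbed by one thread are skipped when another thread's full scan runs.

      ```cpp
      #include <cassert>
      #include <mutex>
      #include <set>
      #include <string>
      #include <vector>

      // Remembers files already grabbed for purge so a concurrent full
      // scan does not collect (and try to delete) them a second time.
      struct PurgeTracker {
        std::mutex mu;
        std::set<std::string> grabbed;  // analogous to files_grabbed_for_purge_

        // FindObsoleteFiles: return only files not grabbed by another thread.
        std::vector<std::string> Collect(const std::vector<std::string>& files) {
          std::lock_guard<std::mutex> lock(mu);
          std::vector<std::string> out;
          for (const std::string& f : files) {
            if (grabbed.insert(f).second) out.push_back(f);  // newly grabbed
          }
          return out;
        }
      };

      int main() {
        PurgeTracker t;
        // Background thread grabs 000007.sst first; the user thread's full
        // scan then sees it on disk but must not grab it again.
        std::vector<std::string> bg = t.Collect({"000007.sst"});
        std::vector<std::string> user = t.Collect({"000007.sst", "000008.sst"});
        assert(bg.size() == 1 && user.size() == 1 && user[0] == "000008.sst");
        return 0;
      }
      ```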
      
      ajkr could you take a look and comment? Thanks!
      Closes https://github.com/facebook/rocksdb/pull/3638
      
      Differential Revision: D7384372
      
      Pulled By: riversand963
      
      fbshipit-source-id: 01489516d60012e722ee65a80e1449e589ce26d3
      1f5def16
  9. 28 Mar 2018 · 2 commits
    • Update comments about MergeOperator::AllowSingleOperand · 90c54234
      Committed by Sagar Vemuri
      Summary:
      Updated comments around AllowSingleOperand.
      Reason: a couple of users were confused and encountered issues because they had not overridden PartialMerge while setting AllowSingleOperand=true.
      
      I'll also look into modifying the default merge operator implementation so that overriding PartialMerge is not mandatory when AllowSingleOp=true.
      Closes https://github.com/facebook/rocksdb/pull/3659
      
      Differential Revision: D7422691
      
      Pulled By: sagar0
      
      fbshipit-source-id: 3d075a6ced0120f5d65cb7ae5412936f1862f342
      90c54234
    • Fix a leak in FilterBlockBuilder when adding prefix · d6876702
      Committed by Sagar Vemuri
      Summary:
      Our valgrind continuous test found an interesting leak introduced in #3614. We were adding the prefix key before saving the previous prefix start offset, due to which the previous prefix offset was always incorrect. Fixed it by saving the previous state before adding the key.
      Closes https://github.com/facebook/rocksdb/pull/3660
      
      Differential Revision: D7418698
      
      Pulled By: sagar0
      
      fbshipit-source-id: 9933685f943cf2547ed5c553f490035a2fa785cf
      d6876702
  10. 27 Mar 2018 · 4 commits
    • Align SST file data blocks to avoid spanning multiple pages · f9f4d40f
      Committed by Anand Ananthabhotla
      Summary:
      Provide a block_align option in BlockBasedTableOptions to allow
      alignment of SST file data blocks. This will avoid higher
      IOPS/throughput load due to < 4KB data blocks spanning 2 4KB pages.
      When this option is set to true, the block alignment is set to lower of
      block size and 4KB.
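      The alignment rule reduces to a one-liner; a hedged sketch (illustrative helper, not the actual table-builder code):

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>

      // With block_align enabled, the write alignment is the lower of the
      // configured block size and 4KB, so sub-4KB blocks never straddle
      // two pages.
      size_t BlockAlignment(size_t block_size) {
        return std::min<size_t>(block_size, 4096);
      }

      int main() {
        assert(BlockAlignment(16 * 1024) == 4096);  // large blocks: page-aligned
        assert(BlockAlignment(1024) == 1024);       // small blocks: own size
        return 0;
      }
      ```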
      Closes https://github.com/facebook/rocksdb/pull/3502
      
      Differential Revision: D7400897
      
      Pulled By: anand1976
      
      fbshipit-source-id: 04cc3bd144e88e3431a4f97604e63ad7a0f06d44
      f9f4d40f
    • WritePrepared Txn: Increase commit cache size to 2^23 · 0999e9b7
      Committed by Maysam Yabandeh
      Summary:
      The current commit cache size is 2^21; this was due to a typo. With 2^23 commit entries we can have transactions as long as 64s without incurring the cost of having them evicted from the commit cache before their commit. Here is the math:
      2^23 entries / 2 (one out of two seq numbers is a commit) / 2^16 TPS = 2^6 = 64s
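      The arithmetic above checks out:

      ```cpp
      #include <cassert>
      #include <cstdint>

      int main() {
        // 2^23 commit entries, half of all sequence numbers are commits,
        // at 2^16 transactions per second.
        const uint64_t entries = 1ull << 23;
        const uint64_t tps = 1ull << 16;
        const uint64_t seconds_before_eviction = entries / 2 / tps;
        assert(seconds_before_eviction == 64);  // = 2^6
        return 0;
      }
      ```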
      Closes https://github.com/facebook/rocksdb/pull/3657
      
      Differential Revision: D7411211
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: e7cacf40579f3acf940643d8a1cfe5dd201caa35
      0999e9b7
    • Fix race condition via concurrent FlushWAL · 35a4469b
      Committed by Maysam Yabandeh
      Summary:
      Currently log_writer->AddRecord in WriteImpl is protected from concurrent FlushWAL calls only if the two_write_queues_ option is set. The patch fixes the problem by i) skipping log_writer->AddRecord in FlushWAL if manual_wal_flush is not set, and ii) protecting log_writer->AddRecord in WriteImpl via log_write_mutex_ if manual_wal_flush_ is set but two_write_queues_ is not.
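      Rule (ii) can be sketched as follows (a simplified stand-in for the real log writer; the mutex name mirrors log_write_mutex_):

      ```cpp
      #include <cassert>
      #include <mutex>
      #include <thread>
      #include <vector>

      // When manual WAL flush is on but there is a single write queue, a
      // dedicated log-write mutex guards AddRecord so WriteImpl and
      // FlushWAL cannot append concurrently.
      struct LogWriter {
        std::mutex log_write_mutex;
        std::vector<int> records;
        void AddRecord(int r) {
          std::lock_guard<std::mutex> guard(log_write_mutex);
          records.push_back(r);
        }
      };

      int main() {
        LogWriter w;
        std::thread writer([&] { for (int i = 0; i < 1000; ++i) w.AddRecord(i); });
        std::thread flusher([&] { for (int i = 0; i < 1000; ++i) w.AddRecord(-i); });
        writer.join();
        flusher.join();
        assert(w.records.size() == 2000);  // no lost appends under contention
        return 0;
      }
      ```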
      
      Fixes #3599
      Closes https://github.com/facebook/rocksdb/pull/3656
      
      Differential Revision: D7405608
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d6cc265051c77ae49c7c6df4f427350baaf46934
      35a4469b
    • Exclude MySQLStyleTransactionTest.TransactionStressTest* from valgrind · 23f9d93f
      Committed by Sagar Vemuri
      Summary:
      I found that each instance of MySQLStyleTransactionTest.TransactionStressTest/x is taking more than 10 hours to complete on our continuous testing environment, causing the whole valgrind run to timeout after a day. So excluding these tests.
      Closes https://github.com/facebook/rocksdb/pull/3652
      
      Differential Revision: D7400332
      
      Pulled By: sagar0
      
      fbshipit-source-id: 987810574506d01487adf7c2de84d4817ec3d22d
      23f9d93f
  11. 24 Mar 2018 · 7 commits
    • WritePrepared Txn: AddPrepared for all sub-batches · 3e417a66
      Committed by Maysam Yabandeh
      Summary:
      Currently AddPrepared is performed only on the first sub-batch if there are duplicate keys in the write batch. This could cause a problem if the transaction takes too long to commit and the seq number of the first sub-batch has moved to old_prepared_ but the seqs of the later ones have not. The patch fixes this by calling AddPrepared for all sub-batches.
      Closes https://github.com/facebook/rocksdb/pull/3651
      
      Differential Revision: D7388635
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 0ccd80c150d9bc42fe955e49ddb9d7ca353067b4
      3e417a66
    • Improve perf of random read and insert compare by suggesting inlining to the compiler · d382ae7d
      Committed by Dmitri Smirnov
      Summary:
      Results are from the VS 2015 compiler. This improves sequential inserts; random read results are inconclusive, but I hope VS 2017 will do a better job at inlining.
      
      Before:
      fillseq      :       **3.638 micros/op 274866 ops/sec;  213.9 MB/s**
      
      After:
      fillseq      :       **3.379 micros/op 295979 ops/sec;  230.3 MB/s**
      Closes https://github.com/facebook/rocksdb/pull/3645
      
      Differential Revision: D7382711
      
      Pulled By: siying
      
      fbshipit-source-id: 092a07ffe8a6e598d1226ceff0f11b35e6c5c8e4
      d382ae7d
    • Refactor sync_point to make implementation either customizable or replaceable · 53d66df0
      Committed by Dmitri Smirnov
      Summary: Closes https://github.com/facebook/rocksdb/pull/3637
      
      Differential Revision: D7354373
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6816c7bbc192ed0fb944942b11c7074bf24eddf1
      53d66df0
    • Add 5.11 and 5.12 to tools/check_format_compatible.sh · a993c013
      Committed by Sagar Vemuri
      Summary: Closes https://github.com/facebook/rocksdb/pull/3646
      
      Differential Revision: D7384727
      
      Pulled By: sagar0
      
      fbshipit-source-id: f713af7adb2ffea5303bbf0fac8a8a1630af7b38
      a993c013
    • Avoid adding tombstones of the same file to RangeDelAggregator multiple times · e80709a3
      Committed by daheiantian
      Summary:
      RangeDelAggregator will remember the files whose range tombstones have been added,
      so the caller can check whether a file has already been added before calling AddTombstones.
      
      Closes https://github.com/facebook/rocksdb/pull/3635
      
      Differential Revision: D7354604
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9b9f7ec130556028df417e650711554b46d8d107
      e80709a3
    • Add Java-API-Changes section to History · 7ffce280
      Committed by Sagar Vemuri
      Summary:
      We have not been updating our HISTORY.md change log with the RocksJava changes. Going forward, let's add Java changes to HISTORY.md as well.
      There is an old java/HISTORY-JAVA.md, but it hasn't been updated in years. It is much easier to remember to update the change log in a single file, HISTORY.md.
      
      I added information about shared block cache here, which was introduced in #3623.
      Closes https://github.com/facebook/rocksdb/pull/3647
      
      Differential Revision: D7384448
      
      Pulled By: sagar0
      
      fbshipit-source-id: 9b6e569f44e6df5cb7ba06413d9975df0b517d20
      7ffce280
    • InlineSkiplist: don't decode keys unnecessarily during comparisons · 09b6bf82
      Committed by Radoslaw Zarzynski
      Summary:
      `InlineSkipList<>::Insert` takes the `key` parameter as a C-string. It then performs multiple comparisons with it, requiring `GetLengthPrefixedSlice()` to be invoked in `MemTable::KeyComparator::operator()(const char* prefix_len_key1, const char* prefix_len_key2)` on the same data over and over. The patch tries to optimize that.
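      The idea can be sketched with a toy encoding (a 1-byte length prefix stands in for the real `GetLengthPrefixedSlice()`; names are hypothetical): decode the insert key once, then reuse the decoded form for every comparison.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>

      struct DecodedKey {
        const char* data;
        size_t size;
      };

      // Toy stand-in for GetLengthPrefixedSlice: 1-byte length prefix.
      DecodedKey Decode(const char* prefixed) {
        return DecodedKey{prefixed + 1, static_cast<size_t>(prefixed[0])};
      }

      int Compare(const DecodedKey& a, const DecodedKey& b) {
        std::string sa(a.data, a.size), sb(b.data, b.size);
        return sa.compare(sb);
      }

      int main() {
        const char key1[] = "\x03" "abc";
        const char key2[] = "\x03" "abd";
        DecodedKey k1 = Decode(key1);  // decoded once, reused below
        DecodedKey k2 = Decode(key2);
        assert(Compare(k1, k2) < 0);   // every further compare skips decoding
        assert(Compare(k1, k1) == 0);
        return 0;
      }
      ```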
      
      Rough performance comparison
      =====
      Big keys, no compression.
      
      ```
      $ ./db_bench --writes 20000000 --benchmarks="fillrandom" --compression_type none -key_size 256
      (...)
      fillrandom   :       4.222 micros/op 236836 ops/sec;   80.4 MB/s
      ```
      
      ```
      $ ./db_bench --writes 20000000 --benchmarks="fillrandom" --compression_type none -key_size 256
      (...)
      fillrandom   :       4.064 micros/op 246059 ops/sec;   83.5 MB/s
      ```
      
      TODO
      ======
      In ~~a separated~~ this PR:
      - [x] Go outside the write path. Maybe even eradicate the C-string-taking variant of `KeyIsAfterNode` entirely.
      - [x] Try to cache the transformations applied by `KeyComparator` & friends in situations where we have many comparisons with the same key.
      Closes https://github.com/facebook/rocksdb/pull/3516
      
      Differential Revision: D7059300
      
      Pulled By: ajkr
      
      fbshipit-source-id: 6f027dbb619a488129f79f79b5f7dbe566fb2dbb
      09b6bf82
  12. 23 Mar 2018 · 6 commits
    • FlushReason improvement · 1cbc96d2
      Committed by Zhongyi Xie
      Summary:
      Right now flush reason "SuperVersion Change" covers a few different scenarios which is a bit vague. For example, the following db_bench job should trigger "Write Buffer Full"
      
      > $ TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304
      $ grep 'flush_reason' /dev/shm/dbbench/LOG
      ...
      2018/03/06-17:30:42.543638 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242543634, "job": 192, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018024, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.569541 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242569536, "job": 193, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.596396 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242596392, "job": 194, "event": "flush_started", "num_memtables": 1, "num_entries": 7008, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "SuperVersion Change"}
      2018/03/06-17:30:42.622444 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242622440, "job": 195, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"}
      
      With the fix:
      > 2018/03/19-14:40:02.341451 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602341444, "job": 98, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018008, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.379655 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602379642, "job": 100, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018016, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.418479 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602418474, "job": 101, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.455084 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602455079, "job": 102, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.492293 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602492288, "job": 104, "event": "flush_started", "num_memtables": 1, "num_entries": 7007, "num_deletes": 0, "memory_usage": 1018056, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.528720 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602528715, "job": 105, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"}
      2018/03/19-14:40:02.566255 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602566238, "job": 107, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018112, "flush_reason": "Write Buffer Full"}
      Closes https://github.com/facebook/rocksdb/pull/3627
      
      Differential Revision: D7328772
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 67c94065fbdd36930f09930aad0aaa6d2c152bb8
      1cbc96d2
    • Add unit test for WAL corruption · 82137f0c
      Committed by Andrew Kryczka
      Summary: Closes https://github.com/facebook/rocksdb/pull/3618
      
      Differential Revision: D7301053
      
      Pulled By: ajkr
      
      fbshipit-source-id: a9dde90caa548c294d03d6386f78428c8536ca14
      82137f0c
    • Fsync after writing global seq number in ExternalSstFileIngestionJob · 2e3d4077
      Committed by Sagar Vemuri
      Summary:
      Fsync after writing global sequence number to the ingestion file in ExternalSstFileIngestionJob. Otherwise the file metadata could be incorrect.
      Closes https://github.com/facebook/rocksdb/pull/3644
      
      Differential Revision: D7373813
      
      Pulled By: sagar0
      
      fbshipit-source-id: 4da2c9e71a8beb5c08b4ac955f288ee1576358b8
      2e3d4077
    • Rename function for handling WAL write error · 4d51feab
      Committed by Andrew Kryczka
      Summary:
      It was misnamed. It actually updates `bg_error_` if `PreprocessWrite()` or `WriteToWAL()` fail, not related to the user callback.
      Closes https://github.com/facebook/rocksdb/pull/3485
      
      Differential Revision: D6955787
      
      Pulled By: ajkr
      
      fbshipit-source-id: bd7afc3fdb7a52830c021cbfc25fcbc3ab7d5e10
      4d51feab
    • SstFileManager: add bytes_max_delete_chunk · 118058ba
      Committed by Siying Dong
      Summary:
      Add `bytes_max_delete_chunk` in SstFileManager so that we can drop a large file in multiple batches.
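      The idea behind dropping a large file in batches can be sketched like this (illustrative only; the real SstFileManager truncates via the filesystem):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Instead of unlinking a huge SST at once, shrink it in bounded
      // steps of at most `chunk` bytes; a final size of 0 means unlink.
      std::vector<uint64_t> TruncateSchedule(uint64_t file_size, uint64_t chunk) {
        std::vector<uint64_t> sizes;  // size after each truncate
        while (chunk > 0 && file_size > chunk) {
          file_size -= chunk;
          sizes.push_back(file_size);
        }
        sizes.push_back(0);
        return sizes;
      }

      int main() {
        // A 10-unit file with a 4-unit chunk: 10 -> 6 -> 2 -> gone.
        std::vector<uint64_t> s = TruncateSchedule(10, 4);
        assert(s.size() == 3 && s[0] == 6 && s[1] == 2 && s[2] == 0);
        return 0;
      }
      ```

      Each step is a bounded filesystem operation, so no single delete frees too much space at once.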
      Closes https://github.com/facebook/rocksdb/pull/3640
      
      Differential Revision: D7358679
      
      Pulled By: siying
      
      fbshipit-source-id: ef17f0da2f5723dbece2669485a9b91b3edc0bb7
      118058ba
    • log value of CompressionOptions::zstd_max_train_bytes · 88c3e26c
      Committed by Andrew Kryczka
      Summary: Closes https://github.com/facebook/rocksdb/pull/3587
      
      Differential Revision: D7206901
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5d4b1a2653627b44aa3c22db7d98c9cd5dcdb67a
      88c3e26c