1. 05 May 2018, 2 commits
  2. 04 May 2018, 4 commits
    • MaxFileSizeForLevel: adjust max_file_size for dynamic level compaction · a7034328
      Committed by Zhongyi Xie
      Summary:
      `MutableCFOptions::RefreshDerivedOptions` always assumes the base level is L1, which is not true when `level_compaction_dynamic_level_bytes=true` and level-based compaction is used.
      This PR fixes this by recomputing `max_file_size` at query time (in `MaxFileSizeForLevel`).
      Fixes https://github.com/facebook/rocksdb/issues/3229
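      The recomputation can be sketched as follows (a minimal illustration, not the actual RocksDB internals; the function signature and parameter names here just mirror the options involved):

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Derive the target file size for `level` on demand from the (possibly
      // dynamic) base_level, instead of precomputing it assuming base level L1.
      uint64_t MaxFileSizeForLevel(int level, int base_level,
                                   uint64_t target_file_size_base,
                                   int target_file_size_multiplier) {
        assert(level >= base_level);
        uint64_t size = target_file_size_base;
        for (int l = base_level; l < level; ++l) {
          size *= target_file_size_multiplier;
        }
        return size;
      }

      int main() {
        // With dynamic leveling the base level may be 5, not 1. 2MB base, x2.
        assert(MaxFileSizeForLevel(5, 5, 2097152, 2) == 2097152);
        assert(MaxFileSizeForLevel(6, 5, 2097152, 2) == 4194304);
        return 0;
      }
      ```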
      
      In master:
      
      ```
      Level Files Size(MB)
      --------------------
        0       14      846
        1        0        0
        2        0        0
        3        0        0
        4        0        0
        5       15      366
        6       11      481
      Cumulative compaction: 3.83 GB write, 2.27 GB read
      ```
      In branch:
      ```
      Level Files Size(MB)
      --------------------
        0        9      544
        1        0        0
        2        0        0
        3        0        0
        4        0        0
        5        0        0
        6      445      935
      Cumulative compaction: 2.91 GB write, 1.46 GB read
      ```
      
      db_bench command used:
      ```
      ./db_bench --benchmarks="fillrandom,deleterandom,fillrandom,levelstats,stats" --statistics -deletes=5000 -db=tmp -compression_type=none --num=20000 -value_size=100000 -level_compaction_dynamic_level_bytes=true -target_file_size_base=2097152 -target_file_size_multiplier=2
      ```
      Closes https://github.com/facebook/rocksdb/pull/3755
      
      Differential Revision: D7721381
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 39afb8503190bac3b466adf9bbf2a9b3655789f8
    • Better destroydb · 934f96de
      Committed by Dmitri Smirnov
      Summary:
      Delete the archive directory before the WAL folder, since the archive may be contained as a subfolder. Also improve loop readability.
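      The deletion order can be sketched like this (illustrative only, using std::filesystem rather than RocksDB's Env; the directory layout is an assumption):

      ```cpp
      #include <cassert>
      #include <filesystem>
      #include <fstream>

      namespace fs = std::filesystem;

      // Because the archive directory may live inside the WAL directory,
      // remove it first, then the WAL directory itself.
      void DestroyWalDir(const fs::path& wal_dir) {
        fs::path archive = wal_dir / "archive";
        if (fs::exists(archive)) {
          fs::remove_all(archive);  // delete archive before its parent
        }
        fs::remove_all(wal_dir);
      }

      int main() {
        fs::path wal = fs::temp_directory_path() / "wal_demo";
        fs::create_directories(wal / "archive");
        std::ofstream(wal / "archive" / "000001.log") << "x";
        DestroyWalDir(wal);
        assert(!fs::exists(wal));
        return 0;
      }
      ```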
      Closes https://github.com/facebook/rocksdb/pull/3797
      
      Differential Revision: D7866378
      
      Pulled By: riversand963
      
      fbshipit-source-id: 0c45d97677ce6fbefa3f8d602ef5e2a2a925e6f5
    • Speedup ManualCompactionTest.Test · a8d77ca3
      Committed by Maysam Yabandeh
      Summary:
      ManualCompactionTest.Test occasionally times out in the tsan flavor of our test infra. The patch reduces the number of keys to make the test run faster. The change does not seem to negatively impact the coverage of the test.
      Closes https://github.com/facebook/rocksdb/pull/3802
      
      Differential Revision: D7865596
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: b4f60e32c3ae1677e25506f71c766e33fa985785
    • Skip deleted WALs during recovery · d5954929
      Committed by Siying Dong
      Summary:
      This patch records the min log number to keep in the manifest while flushing SST files, so that those files and any WAL older than that number can be ignored during recovery. This avoids scenarios where there is a gap in the WAL files fed to the recovery procedure. Such a gap could happen, for example, through out-of-order WAL deletion, and could cause problems in 2PC recovery, where the prepare and commit entries are placed into two separate WALs: the gap could result in not processing the WAL with the commit entry and hence break the 2PC recovery logic.
      
      Before this commit, for the 2PC case, we determined which log number to keep in FindObsoleteFiles(). We looked at the earliest log with outstanding prepare entries, or with prepare entries whose respective commit or abort are in the memtable. With this commit, the same calculation is done while we apply the SST flush. Just before installing the flush file, we precompute the earliest log file to keep after the flush finishes using the same logic (but skipping the memtables just flushed), and record this information in the manifest entry for the newly flushed SST file. This precomputed value is also remembered in memory, and will later be used to determine whether a log file can be deleted. This value is unlikely to change until the next flush because the commit entry will stay in the memtable. (In WritePrepared, we could have removed the older log files as soon as all prepared entries are committed. That's not yet done anyway. Even if we do it, the only thing we lose with this new approach is earlier log deletion between two flushes, which is not guaranteed to happen anyway because the obsolete-file cleanup function is only executed after a flush or compaction.)
      
      This min log number to keep is stored in the manifest using the safely-ignorable custom field of the AddFile entry, in order to guarantee that a DB generated by a newer release can be opened by previous releases no older than 4.2.
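      The bookkeeping can be sketched as follows (a simplified model; the function and its inputs are hypothetical stand-ins for the real flush-install path):

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Just before installing a flush result, compute the earliest WAL still
      // referenced by memtables that were NOT part of this flush; WALs below
      // that number are safe to delete, and the value is persisted to the
      // manifest alongside the new SST file.
      uint64_t MinLogNumberToKeep(const std::vector<uint64_t>& live_memtable_logs,
                                  uint64_t current_log_number) {
        if (live_memtable_logs.empty()) {
          return current_log_number;  // nothing older is needed
        }
        return *std::min_element(live_memtable_logs.begin(),
                                 live_memtable_logs.end());
      }

      int main() {
        // Memtables skipped by the flush still reference logs 7 and 9.
        assert(MinLogNumberToKeep({9, 7}, 12) == 7);
        // All memtables flushed: only the current log must be kept.
        assert(MinLogNumberToKeep({}, 12) == 12);
        return 0;
      }
      ```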
      Closes https://github.com/facebook/rocksdb/pull/3765
      
      Differential Revision: D7747618
      
      Pulled By: siying
      
      fbshipit-source-id: d00c92105b4f83852e9754a1b70d6b64cb590729
  3. 03 May 2018, 2 commits
    • WritePrepared Txn: enable rollback in stress test · cfb86659
      Committed by Maysam Yabandeh
      Summary:
      Rollback was disabled in stress test since there was a concurrency issue in WritePrepared rollback algorithm. The issue is fixed by caching the column family handles in WritePrepared to skip getting them from the db when needed for rollback.
      
      Tested by running transaction stress test under tsan.
      Closes https://github.com/facebook/rocksdb/pull/3785
      
      Differential Revision: D7793727
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d81ab6fda0e53186ca69944cfe0712ce4869451e
    • WritePrepared Txn: split SeqAdvanceConcurrentTest · 5bed8a00
      Committed by Maysam Yabandeh
      Summary:
      The tsan flavor of SeqAdvanceConcurrentTest times out in our test infra. The patch splits it into 10 tests.
      On my VM, before:
      ```
      [       OK ] WritePreparedTransactionTest/WritePreparedTransactionTest.SeqAdvanceConcurrentTest/0 (5194 ms)
      ```
      after:
      ```
      [       OK ] OneWriteQueue/SeqAdvanceConcurrentTest.SeqAdvanceConcurrentTest/0 (1906 ms)
      ```
      Closes https://github.com/facebook/rocksdb/pull/3799
      
      Differential Revision: D7854515
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 4fbac42a1f974326cbc237f8cb9d6232d379c431
  4. 02 May 2018, 3 commits
  5. 01 May 2018, 2 commits
    • Second attempt at db_stress crash-recovery verification · 46152d53
      Committed by Andrew Kryczka
      Summary:
      - Original commit: a4fb1f8c
      - Revert commit (we reverted as a quick fix to get crash tests passing): 6afe22db
      
      This PR includes the contents of the original commit plus two bug fixes, which are:
      
      - In the whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs.
      - Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that the value was not `UNKNOWN_SENTINEL`. But it is possible for post-crash-recovery expected values to be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is that a `SingleDelete` may delete no data. But if we had returned false, the effect would be calling `SingleDelete` on a key with multiple older versions, which is not supported.
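      A minimal sketch of the described `Exists()` behavior (the sentinel constants here are illustrative, not db_stress's actual values):

      ```cpp
      #include <cassert>
      #include <cstdint>

      // A value of UNKNOWN_SENTINEL means a crash interrupted an update, so
      // the key's state is unknown; treating it as "exists" keeps SingleDelete
      // safe: at worst it deletes nothing, which is benign, whereas reporting
      // false could lead to SingleDelete on a key with multiple older
      // versions, which is unsupported.
      constexpr uint32_t UNKNOWN_SENTINEL = 0xFFFFFFFF;
      constexpr uint32_t DELETED = 0xFFFFFFFE;  // illustrative tombstone

      bool Exists(uint32_t expected_value) {
        if (expected_value == UNKNOWN_SENTINEL) {
          return true;  // unknown post-crash state: report true
        }
        return expected_value != DELETED;
      }

      int main() {
        assert(Exists(UNKNOWN_SENTINEL));
        assert(Exists(42));
        assert(!Exists(DELETED));
        return 0;
      }
      ```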
      Closes https://github.com/facebook/rocksdb/pull/3793
      
      Differential Revision: D7811671
      
      Pulled By: ajkr
      
      fbshipit-source-id: 67e0295bfb1695ff9674837f2e05bb29c50efc30
    • fix missing perfcontext destroy declare in C API · 282099fc
      Committed by Vincent Lee
      Summary:
      The `rocksdb_perfcontext_destroy` declaration is missing from the C API.
      Closes https://github.com/facebook/rocksdb/pull/3787
      
      Differential Revision: D7816490
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3a488607bfc897c7ce846a1b3c2b7af693134d0d
  6. 28 April 2018, 5 commits
  7. 27 April 2018, 7 commits
    • Rename pending_flush_ to queued_for_flush_. · 513b5ce6
      Committed by Yanqin Jin
      Summary:
      With ColumnFamilyData::pending_flush_, we have the following code snippet in DBImpl::SchedulePendingFlush
      
      ```
      if (!cfd->pending_flush() && cfd->imm()->IsFlushPending()) {
      ...
      }
      ```
      
      `Pending` is ambiguous, and I feel `queued_for_flush` is a better name,
      especially for the sake of readability.
      Closes https://github.com/facebook/rocksdb/pull/3777
      
      Differential Revision: D7783066
      
      Pulled By: riversand963
      
      fbshipit-source-id: f1bd8c8bfe5eafd2c94da0d8566c9b2b6bb57229
    • Add virtual Truncate method to Env · 37cd617b
      Committed by Nathan VanBenschoten
      Summary:
      This change adds a virtual `Truncate` method to `Env`, which truncates
      the named file to the specified size. At the moment, this is only
      supported for `MockEnv`, but other `Env`s could be extended to override
      the method too. This is the same approach that methods like `LinkFile` and
      `AreSameFile` have taken.
      
      This is useful for any user of the in-memory `Env`. The implementation's
      header is not exported, so before this change, it was impossible to
      access its existing `Truncate` method.
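      The pattern can be sketched as follows (types heavily simplified; these are not the real `Env`/`Status` declarations):

      ```cpp
      #include <cassert>
      #include <string>

      // The base class grows a virtual Truncate that returns NotSupported by
      // default, and an Env that can implement it overrides it -- the same
      // approach taken by LinkFile and AreSameFile.
      struct Status {
        bool ok_;
        static Status OK() { return {true}; }
        static Status NotSupported() { return {false}; }
        bool ok() const { return ok_; }
      };

      struct Env {
        virtual ~Env() = default;
        virtual Status Truncate(const std::string& /*fname*/, size_t /*size*/) {
          return Status::NotSupported();  // most Envs don't support it
        }
      };

      struct MockEnv : Env {
        size_t last_size = 0;
        Status Truncate(const std::string& /*fname*/, size_t size) override {
          last_size = size;  // the in-memory env can honor the request
          return Status::OK();
        }
      };

      int main() {
        Env base;
        MockEnv mock;
        assert(!base.Truncate("f", 10).ok());
        assert(mock.Truncate("f", 10).ok() && mock.last_size == 10);
        return 0;
      }
      ```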
      Closes https://github.com/facebook/rocksdb/pull/3779
      
      Differential Revision: D7785789
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3bcdaeea7b7180529f7d9b496dc67b791a00bbf0
    • Allow options file in db_stress and db_crashtest · db36f222
      Committed by Andrew Kryczka
      Summary:
      - When options file is provided to db_stress, take supported options from the file instead of from flags
      - Call `BuildOptionsTable` after `Open` so it can use `options_` once it has been populated either from flags or from file
      - Allow options filename to be passed via `db_crashtest.py`
      Closes https://github.com/facebook/rocksdb/pull/3768
      
      Differential Revision: D7755331
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5205cc5deb0d74d677b9832174153812bab9a60a
    • Remove block-based table assertion for non-empty filter block · 7004e454
      Committed by Andrew Kryczka
      Summary:
      7a6353bd prevents empty filter blocks from being written for SST files containing range deletions only. However, the assertion this PR removes is still a problem, as we could be reading from a DB generated by a RocksDB build without the 7a6353bd patch. So remove the assertion. We already skip this check when `cache_index_and_filter_blocks=false`, so it should be safe.
      Closes https://github.com/facebook/rocksdb/pull/3773
      
      Differential Revision: D7769964
      
      Pulled By: ajkr
      
      fbshipit-source-id: 7285762446f2cd2ccf16efd7a988a106fbb0d8d3
    • Sync parent directory after deleting a file in delete scheduler · 63c965cd
      Committed by Siying Dong
      Summary:
      Sync the parent directory after deleting a file in the delete scheduler. Otherwise, trim speed may not be as smooth as we want.
      Closes https://github.com/facebook/rocksdb/pull/3767
      
      Differential Revision: D7760136
      
      Pulled By: siying
      
      fbshipit-source-id: ec131d53b61953f09c60d67e901e5eeb2716b05f
    • Fix the bloom filter skipping empty prefixes · 7e4e3814
      Committed by Maysam Yabandeh
      Summary:
      bc0da4b5 optimized bloom filters by skipping duplicate entries when the whole key and prefixes are both added to the bloom. However, it used the empty string as the initial value of the last entry added to the bloom. This is incorrect since an empty key/prefix is a valid entry by itself. This patch fixes that.
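      The fix can be sketched with an explicit presence flag instead of the empty-string sentinel (a simplified model, not the actual filter-builder code):

      ```cpp
      #include <cassert>
      #include <string>
      #include <vector>

      // Track whether a previous entry exists with an explicit flag, so an
      // empty key or prefix is treated as a real entry rather than as
      // "nothing added yet".
      struct DedupAdder {
        bool has_last = false;
        std::string last;
        std::vector<std::string> added;  // stands in for the bloom builder

        void Add(const std::string& entry) {
          if (has_last && entry == last) return;  // genuine duplicate: skip
          has_last = true;
          last = entry;
          added.push_back(entry);
        }
      };

      int main() {
        DedupAdder d;
        d.Add("");     // empty entry is valid and must be added
        d.Add("");     // duplicate empty entry: skipped
        d.Add("abc");
        d.Add("abc");  // duplicate: skipped
        assert(d.added.size() == 2);
        return 0;
      }
      ```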
      Closes https://github.com/facebook/rocksdb/pull/3776
      
      Differential Revision: D7778803
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d5a065daebee17f9403cac51e9d5626aac87bfbc
    • WritePrepared Txn: disable rollback in stress test · e5a4dacf
      Committed by Maysam Yabandeh
      Summary:
      The WritePrepared rollback implementation is not ready to be invoked in the middle of a workload. This is due to the lack of synchronization needed to obtain the cf handle from the db. Temporarily disabling this until the problem with rollback is fixed.
      Closes https://github.com/facebook/rocksdb/pull/3772
      
      Differential Revision: D7769041
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 0e3b0ce679bc2afba82e653a40afa3f045722754
  8. 26 April 2018, 5 commits
  9. 25 April 2018, 2 commits
    • Add crash-recovery correctness check to db_stress · a4fb1f8c
      Committed by Andrew Kryczka
      Summary:
      Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values.
      
      In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want, as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by checking `std::atomic::is_always_lock_free`.
      
      For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class.
      
      `db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup.
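      The atomicity assumption can be sketched as follows (a self-contained demo using an anonymous mapping; db_stress maps the expected-values file instead):

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <cstdint>
      #include <new>
      #include <sys/mman.h>

      // If the atomic were implemented with a hidden mutex, a crash could
      // leave a slot mid-update; the static_assert checks the lock-freedom
      // assumption at compile time.
      static_assert(std::atomic<uint32_t>::is_always_lock_free,
                    "plain atomic stores are required for crash safety");

      // Store and reload a value through a std::atomic<uint32_t> placed in an
      // mmap'd region.
      uint32_t WriteAndReadSlot(uint32_t v) {
        void* mem = mmap(nullptr, sizeof(std::atomic<uint32_t>),
                         PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
                         -1, 0);
        assert(mem != MAP_FAILED);
        auto* slot = new (mem) std::atomic<uint32_t>(0);
        slot->store(v, std::memory_order_relaxed);  // atomic in-place update
        uint32_t out = slot->load(std::memory_order_relaxed);
        munmap(mem, sizeof(std::atomic<uint32_t>));
        return out;
      }

      int main() {
        assert(WriteAndReadSlot(42) == 42);
        return 0;
      }
      ```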
      Closes https://github.com/facebook/rocksdb/pull/3629
      
      Differential Revision: D7463144
      
      Pulled By: ajkr
      
      fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b
    • Skip duplicate bloom keys when whole_key and prefix are mixed · bc0da4b5
      Committed by Maysam Yabandeh
      Summary:
      Currently we rely on FilterBitsBuilder to skip duplicate keys. It does that by comparing the hash of the key to the hash of the last added entry. This logic breaks, however, when whole_key_filtering is mixed with prefix blooms, as their additions to FilterBitsBuilder will be interleaved. The patch fixes that by comparing the last whole key and the last prefix with the whole key and prefix of the new key, respectively, and skipping the call to FilterBitsBuilder if it is a duplicate.
      Closes https://github.com/facebook/rocksdb/pull/3764
      
      Differential Revision: D7744413
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 15df73bbbafdfd754d4e1f42ea07f47b03bc5eb8
  10. 24 April 2018, 3 commits
    • Support lowering CPU priority of background threads · 090c78a0
      Committed by Gabriel Wicke
      Summary:
      Background activities like compaction can negatively affect
      latency of higher-priority tasks like request processing. To avoid this,
      rocksdb already lowers the IO priority of background threads on Linux
      systems. While this takes care of typical IO-bound systems, it does not
      help much when CPU (temporarily) becomes the bottleneck. This is
      especially likely when using more expensive compression settings.
      
      This patch adds an API to allow for lowering the CPU priority of
      background threads, modeled on the IO priority API. Benchmarks (see
      below) show significant latency and throughput improvements when CPU
      bound. As a result, workloads with some CPU usage bursts should benefit
      from lower latencies at a given utilization, or should be able to push
      utilization higher at a given request latency target.
      
      A useful side effect is that compaction CPU usage is now easily visible
      in common tools, allowing for an easier estimation of the contribution
      of compaction vs. request processing threads.
      
      As with IO priority, the implementation is limited to Linux, degrading
      to a no-op on other systems.
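      On Linux the mechanism boils down to adjusting the calling thread's nice value, roughly as below (a sketch only; the API added by the patch is modeled on the IO-priority one, and the names here are hypothetical):

      ```cpp
      #include <cassert>
      #include <sys/resource.h>

      // Lower the CPU scheduling priority of the calling thread by raising
      // its nice value; degrades to a no-op on non-Linux systems, mirroring
      // the patch. On Linux, PRIO_PROCESS with who == 0 targets the calling
      // thread, since nice values are per-thread there.
      bool LowerThreadCpuPriority() {
      #ifdef __linux__
        return setpriority(PRIO_PROCESS, 0, 19) == 0 &&  // 19 = lowest
               getpriority(PRIO_PROCESS, 0) == 19;
      #else
        return true;  // no-op elsewhere
      #endif
      }

      int main() {
        assert(LowerThreadCpuPriority());
        return 0;
      }
      ```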
      Closes https://github.com/facebook/rocksdb/pull/3763
      
      Differential Revision: D7740096
      
      Pulled By: gwicke
      
      fbshipit-source-id: e5d32373e8dc403a7b0c2227023f9ce4f22b413c
    • Improve write time breakdown stats · affe01b0
      Committed by Mike Kolupaev
      Summary:
      There's a group of stats in PerfContext for profiling the write path. They break down the write time into WAL write, memtable insert, throttling, and everything else. We use these stats a lot for figuring out the cause of slow writes.
      
      These stats got a bit out of date and are now categorizing some interesting things as "everything else", and also do some double counting. This PR fixes it and adds two new stats: time spent waiting for other threads of the batch group, and time spent waiting for scheduling flushes/compactions. Probably these will be enough to explain all the occasional abnormally slow (multiple seconds) writes that we're seeing.
      Closes https://github.com/facebook/rocksdb/pull/3602
      
      Differential Revision: D7251562
      
      Pulled By: al13n321
      
      fbshipit-source-id: 0a2d0f5a4fa5677455e1f566da931cb46efe2a0d
    • Revert "Skip deleted WALs during recovery" · d5afa737
      Committed by Siying Dong
      Summary:
      This reverts commit 73f21a7b.
      
      It breaks compatibility. When a DB is created using a build with this new change, opening the DB and reading the data will fail with this error:
      
      "Corruption: Can't access /000000.sst: IO error: while stat a file for size: /tmp/xxxx/000000.sst: No such file or directory"
      
      This is because the dummy AddFile4 entry generated by the new code will be treated as a real entry by an older build. The older build will think there is a real file with number 0, but no such file exists.
      Closes https://github.com/facebook/rocksdb/pull/3762
      
      Differential Revision: D7730035
      
      Pulled By: siying
      
      fbshipit-source-id: f2051859eff20ef1837575ecb1e1bb96b3751e77
  11. 21 April 2018, 5 commits
    • Avoid directory renames in BackupEngine · a8a28da2
      Committed by Andrew Kryczka
      Summary:
      We used to name private directories like "1.tmp" while BackupEngine populated them, and then rename without the ".tmp" suffix (i.e., rename "1.tmp" to "1") after all files were copied. On glusterfs, directory renames like this require operations across many hosts, and partial failures have caused operational problems.
      
      Fortunately we don't need to rename private directories. We already have a meta-file that uses the tempfile-rename pattern to commit a backup atomically after all its files have been successfully copied. So we can copy private files directly to their final location, and there is no longer a directory rename.
      Closes https://github.com/facebook/rocksdb/pull/3749
      
      Differential Revision: D7705610
      
      Pulled By: ajkr
      
      fbshipit-source-id: fd724a28dd2bf993ce323a5f2cb7e7d6980cc346
    • Disable EnvPosixTest::FilePermission · 2e72a589
      Committed by Yi Wu
      Summary:
      The test is flaky in our CI but could not be reproduced manually on the same CI host. Disabling it.
      Closes https://github.com/facebook/rocksdb/pull/3753
      
      Differential Revision: D7716320
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 6bed3b05880c1d24e8dc86bc970e5181bc98fb45
    • WritePrepared Txn: rollback via commit · bb2a2ec7
      Committed by Maysam Yabandeh
      Summary:
      Currently WritePrepared rolls back a transaction with prepare sequence number prepare_seq by i) writing a single rollback batch with rollback_seq, ii) adding <rollback_seq, rollback_seq> to the commit cache, and iii) removing prepare_seq from the PrepareHeap.
      This is correct assuming that no snapshot is taken while a transaction is being rolled back, which is the case for the way MySQL does rollback, i.e., after recovery. Otherwise, if max_evicted_seq advances past prepare_seq, a live snapshot might assume the data is committed since it does not find it in the CommitCache.
      The change is to simply add <prepare_seq, rollback_seq> to the commit cache before removing prepare_seq from the PrepareHeap. This way, if max_evicted_seq advances past prepare_seq, the existing mechanism that checks evicted entries against live snapshots will make sure the live snapshot does not see the data of the rolled-back transaction.
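      The ordering change can be modeled with toy structures (greatly simplified; the container types and visibility rule here are illustrative, not the real WritePrepared internals):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <map>
      #include <set>

      // Publish <prepare_seq, rollback_seq> to the commit cache BEFORE
      // removing prepare_seq from the prepare heap, so a live snapshot that
      // no longer sees prepare_seq as pending finds it in the commit cache
      // and judges visibility by rollback_seq.
      struct TxnState {
        std::set<uint64_t> prepare_heap;            // outstanding prepares
        std::map<uint64_t, uint64_t> commit_cache;  // prepare_seq -> commit_seq

        void Rollback(uint64_t prepare_seq, uint64_t rollback_seq) {
          commit_cache[prepare_seq] = rollback_seq;  // step 1: publish mapping
          prepare_heap.erase(prepare_seq);           // step 2: then unprepare
        }

        // A snapshot at snap_seq sees prepare_seq only if its commit entry
        // exists and is <= snap_seq.
        bool VisibleTo(uint64_t prepare_seq, uint64_t snap_seq) const {
          auto it = commit_cache.find(prepare_seq);
          return it != commit_cache.end() && it->second <= snap_seq;
        }
      };

      int main() {
        TxnState s;
        s.prepare_heap.insert(10);
        s.Rollback(/*prepare_seq=*/10, /*rollback_seq=*/15);
        // Snapshot taken at seq 12, before the rollback batch: must not see it.
        assert(!s.VisibleTo(10, 12));
        // A snapshot after the rollback batch sees the rollback data.
        assert(s.VisibleTo(10, 20));
        return 0;
      }
      ```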
      Closes https://github.com/facebook/rocksdb/pull/3745
      
      Differential Revision: D7696193
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: c9a2d46341ddc03554dded1303520a1cab74ef9c
    • Add a stat for MultiGet keys found, update memtable hit/miss stats · dbdaa466
      Committed by Anand Ananthabhotla
      Summary:
      1. Add a new ticker stat rocksdb.number.multiget.keys.found to track the
      number of keys successfully read
      2. Update rocksdb.memtable.hit/miss in DBImpl::MultiGet(). It was being done in
      DBImpl::GetImpl(), but not in MultiGet
      Closes https://github.com/facebook/rocksdb/pull/3730
      
      Differential Revision: D7677364
      
      Pulled By: anand1976
      
      fbshipit-source-id: af22bd0ef8ddc5cf2b4244b0a024e539fe48bca5
    • WritePrepared Txn: enable TryAgain for duplicates at the end of the batch · c3d1e36c
      Committed by Maysam Yabandeh
      Summary:
      WriteBatch::Iterate will retry with a larger sequence number if the memtable reports a duplicate. This is signaled with a TryAgain status. So far the assumption was that the last entry in the batch will never return TryAgain, which is correct when the WAL is created via WritePrepared, since it always appends a batch separator if a natural one does not exist. However, when reading a WAL generated by WriteCommitted, this batch separator might not exist. Although WritePrepared is not supposed to be able to read a WAL generated by WriteCommitted, we should avoid confusing scenarios in which the behavior becomes unpredictable. The patch fixes that by allowing TryAgain even for the last entry of the write batch.
      Closes https://github.com/facebook/rocksdb/pull/3747
      
      Differential Revision: D7708391
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bfaddaa9b14a4cdaff6977f6f63c789a6ab1ee0d