提交 · 8f143e03fb8cf1663349798cae7b9d2f182e4573 · kvdb / rocksdb

14 10月, 2015 2 次提交

Make db_test_util compile under ROCKSDB_LITE · f55d3009

由 Islam AbdelRahman 提交于 10月 13, 2015

Summary: db_test_util is used in multiple test files but it dont compile under ROCKSDB_LITE

Test Plan:
make check
make static_lib
OPT=-DROCKSDB_LITE make db_wal_test

Reviewers: igor, yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48579

f55d3009

Seperate InternalIterator from Iterator · 35ad531b

由 sdong 提交于 10月 12, 2015

Summary:
Separate a new class InternalIterator from class Iterator, when the look-up is done internally, which also means they operate on key with sequence ID and type.

This change will enable potential future optimizations but for now InternalIterator's functions are still the same as Iterator's.
At the same time, separate the cleanup function to a separate class and let both of InternalIterator and Iterator inherit from it.

Test Plan: Run all existing tests.

Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven

Reviewed By: rven

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48549

35ad531b

13 10月, 2015 1 次提交

Move TEST_NewInternalIterator to NewInternalIterator · c64ae05b

由 Islam AbdelRahman 提交于 10月 12, 2015

Summary:
Long time ago we add InternalDumpCommand to ldb_tool https://reviews.facebook.net/D11517
This command is using TEST_NewInternalIterator although it's not a test. This patch move TEST_NewInternalIterator outside of db_impl_debug.cc

Test Plan:
make check
make static_lib

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48561

c64ae05b

10 10月, 2015 2 次提交

Passing table properties to compaction callback · 3d07b815

由 Alexey Maykov 提交于 9月 15, 2015

Summary: It would be nice to have and access to table properties in compaction callbacks. In MyRocks project, it will make possible to update optimizer statistics online.

Test Plan: ran the unit test. Ran myrocks with the new way of collecting stats.

Reviewers: igor, rven, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48267

3d07b815

Pass column family ID to table property collector · 776bd8d5

由 sdong 提交于 10月 08, 2015

Summary: Pass column family ID through TablePropertiesCollectorFactory::CreateTablePropertiesCollector() so that users can identify which column family this file is for and handle it differently.

Test Plan: Add unit test scenarios in tests related to table properties collectors to verify the information passed in is correct.

Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48411

776bd8d5

07 10月, 2015 1 次提交

Support for LevelDB SST with .ldb suffix · 02675026

由 dyniusz 提交于 10月 06, 2015

Summary:
	Handle SST files with both ".sst" and ".ldb" suffix.
	This enables user to migrate from leveldb to rocksdb.

Test Plan:
        Added unit test with DB operating on SSTs with names schema.
        See db/dc_test.cc:SSTsWithLdbSuffixHandling for details

Reviewers: yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48003

02675026

03 10月, 2015 1 次提交

Add APIs PauseBackgroundWork() and ContinueBackgroundWork() · 115427ef

由 Igor Canadi 提交于 10月 02, 2015

Summary:
To support a new MongoDB capability, we need to make sure that we don't do any IO for a short period of time. For background, see:
* https://jira.mongodb.org/browse/SERVER-20704
* https://jira.mongodb.org/browse/SERVER-18899

To implement that, I add a new API calls PauseBackgroundWork() and ContinueBackgroundWork() which reuse the capability we already have in place for RefitLevel() function.

Test Plan: Added a new test in db_test. Made sure that test fails when PauseBackgroundWork() is commented out.

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47901

115427ef

24 9月, 2015 1 次提交

Add experimental DB::AddFile() to plug sst files into empty DB · f03b5c98

由 Islam AbdelRahman 提交于 9月 23, 2015

Summary:
This is an initial version of bulk load feature

This diff allow us to create sst files, and then bulk load them later, right now the restrictions for loading an sst file are
(1) Memtables are empty
(2) Added sst files have sequence number = 0, and existing values in database have sequence number = 0
(3) Added sst files values are not overlapping

Test Plan: unit testing

Reviewers: igor, ott, sdong

Reviewed By: sdong

Subscribers: leveldb, ott, dhruba

Differential Revision: https://reviews.facebook.net/D39081

f03b5c98

23 9月, 2015 1 次提交

Internal stats WAL file synced to match meaning of the stats of the same name · d0c31641

由 sdong 提交于 9月 21, 2015

Summary: https://reviews.facebook.net/D23343 changed WAL sync bytes to extra fsync. This change does the same for internal stats.

Test Plan: Run all existing unit tests and verify results in db_bench.

Reviewers: anthony, rven, igor, MarkCallaghan, kradhakrishnan, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D47349

d0c31641

18 9月, 2015 2 次提交

Support for SingleDelete() · 014fd55a

由 Andres Noetzli 提交于 9月 17, 2015

Summary:
This patch fixes #7460559. It introduces SingleDelete as a new database
operation. This operation can be used to delete keys that were never
overwritten (no put following another put of the same key). If an overwritten
key is single deleted the behavior is undefined. Single deletion of a
non-existent key has no effect but multiple consecutive single deletions are
not allowed (see limitations).

In contrast to the conventional Delete() operation, the deletion entry is
removed along with the value when the two are lined up in a compaction. Note:
The semantics are similar to @igor's prototype that allowed to have this
behavior on the granularity of a column family (
https://reviews.facebook.net/D42093 ). This new patch, however, is more
aggressive when it comes to removing tombstones: It removes the SingleDelete
together with the value whenever there is no snapshot between them while the
older patch only did this when the sequence number of the deletion was older
than the earliest snapshot.

Most of the complex additions are in the Compaction Iterator, all other changes
should be relatively straightforward. The patch also includes basic support for
single deletions in db_stress and db_bench.

Limitations:
- Not compatible with cuckoo hash tables
- Single deletions cannot be used in combination with merges and normal
  deletions on the same key (other keys are not affected by this)
- Consecutive single deletions are currently not allowed (and older version of
  this patch supported this so it could be resurrected if needed)

Test Plan: make all check

Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor

Reviewed By: igor

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43179

014fd55a

Do not flag error if file to be deleted does not exist · 51e1c112

由 Venkatesh Radhakrishnan 提交于 9月 17, 2015

Summary:
Some users have observed errors in the log file when
the log file or sst file is already deleted.

Test Plan:
Make sure that the errors do not appear for already deleted
files.

Reviewers: sdong

Reviewed By: sdong

Subscribers: anthony, kradhakrishnan, yhchiang, rven, igor, IslamAbdelRahman, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47115

51e1c112

17 9月, 2015 3 次提交

DB::Open() to flush info log after printing DB pointer · 9aca7cd6

由 sdong 提交于 9月 16, 2015

Summary: Now DB::Open() flushes info log before printing DB pointer, so it may not show up if no activity after DB open. Move log flushing from after printing options to printing DB pointer.

Test Plan: make commit-prereq

Reviewers: igor, IslamAbdelRahman, yhchiang, kradhakrishnan, anthony, rven

Reviewed By: rven

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D47121

9aca7cd6

Change the log level of DB start-up log from Warn to Header. · f21c7415

由 Yueh-Hsuan Chiang 提交于 9月 16, 2015

Summary: Change the log level of DB start-up log from Warn to Header.

Test Plan: db_bench and observe the LOG header

Reviewers: igor, anthony, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47067

f21c7415

Adding the increment for a counter for a number of WAL syncs · 3ebf11ed

由 Alexey Maykov 提交于 9月 14, 2015

Summary: This will unblock the corresponding change in MyRocks

Test Plan: ran rocksdb.write_sync test

Reviewers: sdong, kolmike

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46911

3ebf11ed

15 9月, 2015 1 次提交

DBImpl::FindObsoleteFiles() shouldn't release mutex between getting... · f3170b6f

由 sdong 提交于 9月 14, 2015

DBImpl::FindObsoleteFiles() shouldn't release mutex between getting min_pending_output and scanning files

Summary:
Releasing mutex between getting min_pending_output and scanning files may cause min_pending_output to be max but some non-final files are found in file scanning, ending up with deleting wrong files.
As a recent regression, mutex can be released while waiting for log sync. We move it to after file scanning.

Test Plan: Run all existing tests. Don't think it is easy to write a unit test. Maybe we should find a way to assert lock not released so that we can have some test verification for similar cases.

Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, kolmike, rven

Reviewed By: rven

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D46899

f3170b6f

11 9月, 2015 2 次提交

Relaxing consistency detection to include errors while inserting to memtable as WAL recovery error. · 11266440

由 krad 提交于 9月 09, 2015

Summary: The current code, considers data to be consistent if the record
checksum passes. We do have customer issues where the record checksum passed but
the data was incomprehensible. There is no way to get out of this error case
since all WAL recovery model will consider this error as unrelated to WAL.

Relaxing the definition and including errors while inserting to memtable as WAL
errors and handing them as per the recovery level.

Test Plan: Used customer dump to verify the fix for different level. The db
opens for kSkipAnyCorruptedRecords and kPointInTimeRecovery, but fails for
kAbsoluteConsistency and kTolerateCorruptedTailRecords.

Reviewers: sdon igor

CC: leveldb@

Task ID: #7918721

Blame Rev:

11266440

Set max_open_files based on ulimit · ac9bcb55

由 Igor Canadi 提交于 9月 10, 2015

Summary: We should never set max_open_files to be bigger than the system's ulimit. Otherwise we will get "Too many open files" errors. See an example in this Travis run: https://travis-ci.org/facebook/rocksdb/jobs/79591566

Test Plan:
make check

I will also verify that max_max_open_files is reasonable.

Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46551

ac9bcb55

03 9月, 2015 1 次提交

Unified maps with Comparator for sorting, other cleanup · 3c9cef1e

由 Andres Noetzli 提交于 9月 02, 2015

Summary:
This diff is a collection of cleanups that were initially part of D43179.
Additionally it adds a unified way of defining key-value maps that use a
Comparator for sorting (this was previously implemented in four different
places).

Test Plan: make clean check all

Reviewers: rven, anthony, yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45993

3c9cef1e

29 8月, 2015 1 次提交

Fix deadlock in WAL sync · effd9dd1

由 Andres Noetzli 提交于 8月 28, 2015

Summary:
MarkLogsSynced() was doing `logs_.erase(it++);`. The standard is saying:

```
all iterators and references are invalidated, unless the erased members are at an end (front or back) of the deque (in which case only iterators and references to the erased members are invalidated)
```

Because `it` is an iterator to the first element of the container, it is
invalidated, only one iteration is executed and `log.getting_synced = false;`
is not being done, so `while (logs_.front().getting_synced)` in `WriteImpl()`
is not terminating.

Test Plan: make db_bench && ./db_bench --benchmarks=fillsync

Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, yhchiang, sdong, tnovak

Reviewed By: tnovak

Subscribers: kolmike, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45807

effd9dd1

27 8月, 2015 1 次提交

ReadaheadRandomAccessFile -- userspace readahead · 5f4166c9

由 Igor Canadi 提交于 8月 26, 2015

Summary:
ReadaheadRandomAccessFile acts as a transparent layer on top of RandomAccessFile. When a Read() request is issued, it issues a much bigger request to the OS and caches the result. When a new request comes in and we already have the data cached, it doesn't have to issue any requests to the OS.

We add ReadaheadRandomAccessFile layer only when file is read during compactions.

D45105 was incorrectly closed by Phabricator because I committed it to a separate branch (not master), so I'm resubmitting the diff.

Test Plan: make check

Reviewers: MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45123

5f4166c9

26 8月, 2015 1 次提交

Fix compact_files_example · 09d982f9

由 Andres Notzli 提交于 8月 25, 2015

Summary:
See task #7983654. The example was triggering an assert in compaction job
because the compaction was not marked as manual. With this patch,
CompactionPicker::FormCompaction() marks compactions as manual. This patch
also fixes a couple of typos, adds optimistic_transaction_example to
.gitignore and librocksdb as a dependency for examples. Adding librocksdb as
a dependency makes sure that the examples are built with the latest changes
in librocksdb.

Test Plan: make clean && cd examples && make all && ./compact_files_example

Reviewers: rven, sdong, anthony, igor, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45117

09d982f9

25 8月, 2015 2 次提交

Fixing race condition in DBTest.DynamicMemtableOptions · 20508329

由 Andres Noetzli 提交于 8月 24, 2015

Summary:
This patch fixes a race condition in DBTEst.DynamicMemtableOptions. In rare cases,
it was possible that the main thread would fill up both memtables before the flush
job acquired its work. Then, the flush job was flushing both memtables together,
producing only one L0 file while the test expected two. Now, the test waits for
flushes to finish earlier, to make sure that the memtables are flushed in separate
flush jobs.

Test Plan:
Insert "usleep(10000);" after "IOSTATS_SET_THREAD_POOL_ID(Env::Priority::HIGH);" in BGWorkFlush()
to make the issue more likely. Then test with:
make db_test && time while ./db_test --gtest_filter=*DynamicMemtableOptions; do true; done

Reviewers: rven, sdong, yhchiang, anthony, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45429

20508329

Smarter purging during flush · 4ab26c5a

由 Igor Canadi 提交于 8月 24, 2015

Summary:
Currently, we only purge duplicate keys and deletions during flush if `earliest_seqno_in_memtable <= newest_snapshot`. This means that the newest snapshot happened before we first created the memtable. This is almost never true for MyRocks and MongoRocks.

This patch makes purging during flush able to understand snapshots. The main logic is copied from compaction_job.cc, although the logic over there is much more complicated and extensive. However, we should try to merge the common functionality at some point.

I need this patch to implement no_overwrite_i_promise functionality for flush. We'll also need this to support SingleDelete() during Flush(). @yoshinorim requested the feature.

Test Plan:
make check
I had to adjust some unit tests to understand this new behavior

Reviewers: yhchiang, yoshinorim, anthony, sdong, noetzli

Reviewed By: noetzli

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42087

4ab26c5a

20 8月, 2015 1 次提交

Rate limit deletes issued by DestroyDB · 3fd70b05

由 Islam AbdelRahman 提交于 8月 19, 2015

Summary: Update DestroyDB so that all SST files in the first path id go through DeleteScheduler instead of being deleted immediately

Test Plan: added a unittest

Reviewers: igor, yhchiang, anthony, kradhakrishnan, rven, sdong

Reviewed By: sdong

Subscribers: jeanxu2012, dhruba

Differential Revision: https://reviews.facebook.net/D44955

3fd70b05

19 8月, 2015 1 次提交

[Parallel L0-L1 Compaction Prep]: Giving Subcompactions Their Own State · f0da6977

由 Ari Ekmekji 提交于 8月 18, 2015

Summary:
In prepration for running multiple threads at the same time during
a compaction job, this patch assigns each subcompaction its own state
(instead of sharing the one global CompactionState). Each subcompaction then
uses this state to update its statistics, keep track of its snapshots, etc.
during the course of execution. Then at the end of all the executions the
statistics are aggregated across the subcompactions so that the final result
is the same as if only one larger compaction had run.

Test Plan: ./db_test  ./db_compaction_test  ./compaction_job_test

Reviewers: sdong, anthony, igor, noetzli, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43239

f0da6977

15 8月, 2015 2 次提交

Measure file read latency histogram per level · 72613657

由 sdong 提交于 8月 13, 2015

Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled.

Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected

Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D44193

72613657

reduce db mutex contention for write batch groups · b7198c3a

由 Nathan Bronson 提交于 8月 05, 2015

Summary:
This diff allows a Writer to join the next write batch group
without acquiring any locks. Waiting is performed via a per-Writer mutex,
so all of the non-leader writers never need to acquire the db mutex.
It is now possible to join a write batch group after the leader has been
chosen but before the batch has been constructed. This diff doesn't
increase parallelism, but reduces synchronization overheads.

For some CPU-bound workloads (no WAL, RAM-sized working set) this can
substantially reduce contention on the db mutex in a multi-threaded
environment. With T=8 N=500000 in a CPU-bound scenario (see the test
plan) this is good for a 33% perf win. Not all scenarios see such a
win, but none show a loss. This code is slightly faster even for the
single-threaded case (about 2% for the CPU-bound scenario below).

Test Plan:
1. unit tests
2. COMPILE_WITH_TSAN=1 make check
3. stress high-contention scenarios with db_bench -benchmarks=fillrandom -threads=$T -batch_size=1 -memtablerep=skip_list -value_size=0 --num=$N -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000

Reviewers: sdong, igor, rven, ljin, yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43887

b7198c3a

14 8月, 2015 1 次提交

Add options.compaction_measure_io_stats to print write I/O stats in compactions · 603b6da8

由 sdong 提交于 8月 12, 2015

Summary:
Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs:

2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]}

Add two more counters in iostats_context.

Also add a parameter of db_bench.

Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D44115

603b6da8

12 8月, 2015 2 次提交

Transaction error statuses · 0db807ec

由 agiardullo 提交于 7月 30, 2015

Summary:
Based on feedback from spetrunia, we should better differentiate error statuses for transaction failures.

https://github.com/MySQLOnRocksDB/mysql-5.6/issues/86#issuecomment-124605954

Test Plan: unit tests

Reviewers: rven, kradhakrishnan, spetrunia, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43323

0db807ec

Pessimistic Transactions · c2f2cb02

由 agiardullo 提交于 5月 25, 2015

Summary:
Initial implementation of Pessimistic Transactions. This diff contains the api changes discussed in D38913. This diff is pretty large, so let me know if people would prefer to meet up to discuss it.

MyRocks folks: please take a look at the API in include/rocksdb/utilities/transaction[_db].h and let me know if you have any issues.

Also, you'll notice a couple of TODOs in the implementation of RollbackToSavePoint(). After chatting with Siying, I'm going to send out a separate diff for an alternate implementation of this feature that implements the rollback inside of WriteBatch/WriteBatchWithIndex. We can then decide which route is preferable.

Next, I'm planning on doing some perf testing and then integrating this diff into MongoRocks for further testing.

Test Plan: Unit tests, db_bench parallel testing.

Reviewers: igor, rven, sdong, yhchiang, yoshinorim

Reviewed By: sdong

Subscribers: hermanlee4, maykov, spetrunia, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D40869

c2f2cb02

07 8月, 2015 1 次提交

Avoid type unique_ptr in LogWriterNumber::writer for Windows build break · 6a4aaadc

由 sdong 提交于 8月 05, 2015

Summary:
Visual Studio complains about deque<LogWriterNumber> because LogWriterNumber is non-copyable for its unique_ptr member writer. Move away from it, and do explit free.
It is less safe but I can't think of a better way to unblock it.

Test Plan: valgrind check test

Reviewers: anthony, IslamAbdelRahman, kolmike, rven, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43647

6a4aaadc

05 8月, 2015 2 次提交

[wal changes 3/3] method in DB to sync WAL without blocking writers · e06cf1a0

由 Mike Kolupaev 提交于 8月 05, 2015

Summary:
Subj. We really need this feature.

Previous diff D40899 has most of the changes to make this possible, this diff just adds the method.

Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind.

Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong

Reviewed By: sdong

Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba

Differential Revision: https://reviews.facebook.net/D40905

e06cf1a0

Support delete rate limiting · c45a57b4

由 Islam AbdelRahman 提交于 8月 04, 2015

Summary:
Introduce DeleteScheduler that allow enforcing a rate limit on file deletion
Instead of deleting files immediately, files are moved to trash directory and deleted in a background thread that apply sleep penalty between deletes if needed.

I have updated PurgeObsoleteFiles and PurgeObsoleteWALFiles to use the delete_scheduler instead of env_->DeleteFile

Test Plan:
added delete_scheduler_test
existing unit tests

Reviewers: kradhakrishnan, anthony, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43221

c45a57b4

28 7月, 2015 1 次提交

Fix when output level is 0 of universal compaction with trivial move · 1bdfcef7

由 Poornima Chozhiyath Raman 提交于 7月 27, 2015

Summary: Fix for universal compaction with trivial move, when the ouput level is 0. The tests where failing. Fixed by allowing normal compaction when output level is 0.

Test Plan: modified test cases run successfully.

Reviewers: sdong, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: anthony, kradhakrishnan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D42933

1bdfcef7

22 7月, 2015 1 次提交

[wal changes 2/3] write with sync=true syncs previous unsynced wals to prevent illegal data loss · fe09a6da

由 Mike Kolupaev 提交于 7月 22, 2015

Summary:
I'll just copy internal task summary here:

"
This sequence will cause data loss in the middle after an sync write:

non-sync write key 1
flush triggered, not yet scheduled
sync write key 2
system crash

After rebooting, users might see key 2 but not key 1, which violates the API of sync write.

This can be reproduced using unit test FaultInjectionTest::DISABLED_WriteOptionSyncTest.

One way to fix it is for a sync write, if there is outstanding unsynced log files, we need to syc them too.
"

This diff should be considered together with the next diff D40905; in isolation this fix probably could be a little simpler.

Test Plan: `make check`; added a test for that (DBTest.SyncingPreviousLogs) before noticing FaultInjectionTest.WriteOptionSyncTest (keeping both since mine asserts a bit more); both tests fail without this diff; for D40905 stacked on top of this diff, ran tests with ASAN, TSAN and valgrind

Reviewers: rven, yhchiang, IslamAbdelRahman, anthony, kradhakrishnan, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D40899

fe09a6da

21 7月, 2015 1 次提交

Improved FileExists API · 06429408

由 agiardullo 提交于 7月 20, 2015

Summary: Add new CheckFileExists method.  Considered changing the FileExists api but didn't want to break anyone's builds.

Test Plan: unit tests

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42003

06429408

18 7月, 2015 3 次提交

Move rate_limiter, write buffering, most perf context instrumentation and most... · 6e9fbeb2

由 sdong 提交于 7月 17, 2015

Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env

Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.

Test Plan: Run all existing unit tests.

Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D42321

6e9fbeb2

Don't let flushes preempt compactions · 35ca5936

由 Igor Canadi 提交于 7月 17, 2015

Summary:
When we first started, max_background_flushes was 0 by default and compaction thread was executing flushes (since there was no flush thread). Then, we switched the default max_background_flushes to 1. However, we still support the case where there is no flush thread and flushes are done in compaction. This is making our code a bit more complicated. By not supporting this use-case we can make our code simpler.

We have a special case that when you set max_background_flushes to 0, we
schedule the flush to execute on the compaction thread.

Test Plan: make check (there might be some unit tests that depend on this behavior)

Reviewers: IslamAbdelRahman, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D41931

35ca5936

Deprecate CompactionFilterV2 · a96fcd09

由 Igor Canadi 提交于 7月 17, 2015

Summary: It has been around for a while and it looks like it never found any uses in the wild. It's also complicating our compaction_job code quite a bit. We're deprecating it in 3.13, but will put it back in 3.14 if we actually find users that need this feature.

Test Plan: make check

Reviewers: noetzli, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42405

a96fcd09

17 7月, 2015 1 次提交

Fix data loss after DB recovery by not allowing flush/compaction to be scheduled until DB opened · 6c0c8dee

由 sdong 提交于 7月 15, 2015

Summary:
Previous run may leave some SST files with higher file numbers than manifest indicates.
Compaction or flush may start to run while DB::Open() is still going on. SST file garbage collection may happen interleaving with compaction or flush, and overwrite files generated by compaction of flushes after they are generated. This might cause data loss. This possibility of interleaving is recently introduced.
Fix it by not allowing compaction or flush to be scheduled before DB::Open() finishes.

Test Plan: Add a unit test. This verification will have a chance to fail without the fix but doesn't fix without the fix.

Reviewers: kradhakrishnan, anthony, yhchiang, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42399

6c0c8dee

kvdb / rocksdb 12 个月 前同步成功

kvdb / rocksdb
12 个月前同步成功