Commits · 5b9ce1a32301ff826efb0d6b54ca81e132e48cde · kvdb / rocksdb

07 Nov, 2015 1 commit

Enable Windows warnings C4307 C4309 C4512 C4701 · 20f57b17

Committed by Dmitri Smirnov on Nov 06, 2015

  Enable C4307 'operator' : integral constant overflow
  Longs and ints on Windows are 32-bit hence the overflow
  Enable C4309 'conversion' : truncation of constant value
  Enable C4512 'class' : assignment operator could not be generated
  Enable C4701 Potentially uninitialized local variable 'name' used
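
As a rough illustration of the patterns these warnings point at, the sketch below shows one way such code is commonly written so that it compiles cleanly. None of it is taken from the commit; `kFourGiB`, `Holder`, and `PickLevel` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstdint>

// C4307 / C4309: on Windows `long` and `int` are 32 bits wide, so a constant
// such as 4 * 1024 * 1024 * 1024 overflows. Doing the arithmetic in a
// fixed-width 64-bit type avoids that.
constexpr uint64_t kGiB = 1024ULL * 1024ULL * 1024ULL;
constexpr uint64_t kFourGiB = 4 * kGiB;

// C4512: a class with a reference member cannot get an implicit assignment
// operator. Declaring it deleted makes that intent explicit.
struct Holder {
  explicit Holder(const int& r) : ref(r) {}
  Holder& operator=(const Holder&) = delete;
  const int& ref;
};

// C4701: give locals a value on every path before they are read.
inline int PickLevel(bool use_default) {
  int level = 0;  // initialized up front
  if (!use_default) {
    level = 3;
  }
  return level;
}
```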

20f57b17

06 Nov, 2015 1 commit

Prefix-based iterating only shows keys in prefix · 9d50afc3

Committed by Venkatesh Radhakrishnan on Nov 05, 2015

Summary:
MyRocks testing found that when iterating with a prefix, wrong results
were sometimes returned for keys outside that prefix. We now tighten the
range of keys seen with a new read option called prefix_seen_at_start.
It remembers the starting prefix and, on each Next, compares the current
key's prefix against it. If the prefixes differ, the iterator sets valid
to false.
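
The behavior described above can be sketched with a small stand-alone iterator over a sorted map. This is only an illustration under assumptions: `PrefixBoundIterator` and its fixed-length prefix are hypothetical and are not the RocksDB iterator or its prefix extractor.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for an iterator that stays inside the prefix it
// started in. The prefix is assumed to be the first `prefix_len` bytes.
class PrefixBoundIterator {
 public:
  PrefixBoundIterator(const std::map<std::string, std::string>& data,
                      size_t prefix_len)
      : data_(data), prefix_len_(prefix_len), it_(data.end()) {}

  // Position at the first key >= target and remember the starting prefix.
  void Seek(const std::string& target) {
    start_prefix_ = target.substr(0, prefix_len_);
    it_ = data_.lower_bound(target);
    CheckPrefix();
  }

  // Advance, then become invalid if the new key has a different prefix.
  void Next() {
    assert(valid_);
    ++it_;
    CheckPrefix();
  }

  bool Valid() const { return valid_; }
  const std::string& key() const { return it_->first; }

 private:
  void CheckPrefix() {
    valid_ = it_ != data_.end() &&
             it_->first.compare(0, prefix_len_, start_prefix_) == 0;
  }

  const std::map<std::string, std::string>& data_;
  size_t prefix_len_;
  std::map<std::string, std::string>::const_iterator it_;
  std::string start_prefix_;
  bool valid_ = false;
};
```

With keys `a1`, `a2`, `b1` and a one-byte prefix, seeking to `a1` yields `a1` and `a2`, and the iterator becomes invalid instead of returning `b1`.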

Test Plan: PrefixTest.PrefixValid

Reviewers: IslamAbdelRahman, sdong, yhchiang, anthony

Reviewed By: anthony

Subscribers: spetrunia, hermanlee4, yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D50211

9d50afc3

31 Oct, 2015 1 commit

Move skip_table_builder_flush to BlockBasedTableOption · ccc8c10c

Committed by SherlockNoMad on Oct 30, 2015

ccc8c10c

30 Oct, 2015 2 commits

Add Option to Skip Flushing in TableBuilder · a6dd0831

Committed by SherlockNoMad on Oct 29, 2015

a6dd0831

"make format" in some recent commits · 296c3a1f

Committed by sdong on Oct 29, 2015

Summary: Run "make format" for some recent commits.

Test Plan: Build and run tests

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D49707

296c3a1f

28 Oct, 2015 1 commit

Implement smart buffer management. · 6fbc4f9f

Committed by Dmitri Smirnov on Oct 27, 2015

  Introduce a new DBOption, random_access_max_buffer_size, to limit
  the size of the random access buffer used for unbuffered access.
  Implement read-ahead buffering when enabled.
  To that effect, propagate compaction_readahead_size and the new option
  to the env options to make them available to the implementation.
  Add a Hint() override so that the SetupForCompaction() call invokes Hint();
  readahead can now be set up from both Hint() and EnableReadAhead().
  Add support for the new random_access_max_buffer_size option to
  db_bench and options_helper (to make it string-parsable),
  and add a unit test.

6fbc4f9f

20 Oct, 2015 1 commit

Handle multiple batches in single log record - allow app to return a new batch... · 0c59691d

Committed by Praveen Rao on Oct 19, 2015

```
Handle multiple batches in single log record - allow app to return a new batch + allow app to return corrupted record status
```

0c59691d

19 Oct, 2015 1 commit

options: add recycle_log_file_num option · 543c12ab

Committed by Sage Weil on Oct 07, 2015

```
Signed-off-by: Sage Weil <sage@redhat.com>
```

543c12ab

14 Oct, 2015 1 commit

Put wal_filter under #ifndef ROCKSDB_LITE · cc4d13e0

Committed by Praveen Rao on Oct 13, 2015

cc4d13e0

13 Oct, 2015 1 commit

Adding log filter to inspect and filter log records on recovery · 59a0c219

Committed by Praveen Rao on Oct 12, 2015

59a0c219

08 Oct, 2015 1 commit

Added boolean variable to guard fallocate() calls · 4049bcde

Committed by Lakshmi Narayanan on Oct 07, 2015

Summary:
Added boolean variable to guard fallocate() calls.
Set to false to prevent space leaks when tests fail.

Test Plan:
Compiles
Set to false and ran log device tests

Reviewers: sdong, lovro, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48027

4049bcde

22 Sep, 2015 1 commit

Add a mode to always pick the oldest file to compact for each level · f1b9f804

Committed by sdong on Sep 21, 2015

Summary:
Add options.compaction_pri, which specifies the policy about which file to compact first.
kCompactionPriByLargestSeq will compact oldest files first.
Verified the behavior in db_bench but did not write unit tests yet. Also need to make it settable through option string and dynamically changeable.

Test Plan: Will write unit tests

Reviewers: igor, rven, anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, MarkCallaghan

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45951

f1b9f804

17 Sep, 2015 1 commit

Change the log level of DB start-up log from Warn to Header. · f21c7415

Committed by Yueh-Hsuan Chiang on Sep 16, 2015

Summary: Change the log level of DB start-up log from Warn to Header.

Test Plan: db_bench and observe the LOG header

Reviewers: igor, anthony, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47067

f21c7415

16 Sep, 2015 1 commit

Add DBOption.max_subcompaction to option dump · 2b683d49

Committed by Ari Ekmekji on Sep 15, 2015

Summary:
RocksDB options can be dumped to the log file, and
up to this point the max_subcompactions option was not included
in this dump. This fixes that.

Test Plan: make all && make check

Reviewers: MarkCallaghan, igor, noetzli, anthony, yhchiang, sdong

Reviewed By: yhchiang, sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46971

2b683d49

15 Sep, 2015 2 commits

Fix valgrind error · e467bf0d

Committed by Islam AbdelRahman on Sep 14, 2015

Summary:
Valgrind is complaining because we are using hard_rate_limit (when serializing the options) before it has been initialized
http://our.intern.facebook.com/intern/sandcastle/3962140295/77533971/

Test Plan: run the test under valgrind

Reviewers: kradhakrishnan, yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46929

e467bf0d

Add options.hard_pending_compaction_bytes_limit to stop writes if compaction lagging behind · 5de807ac

Committed by sdong on Sep 11, 2015

Summary: Add an option to stop writes if compaction falls behind. If the estimated pending compaction bytes exceed the threshold specified by options.hard_pending_compaction_bytes_limit, writes will stop until compactions bring the pending bytes back under the threshold.

Test Plan: Add unit test DBTest.HardLimit

Reviewers: rven, kradhakrishnan, anthony, IslamAbdelRahman, yhchiang, igor

Reviewed By: igor

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45999

5de807ac

27 Aug, 2015 1 commit

ReadaheadRandomAccessFile -- userspace readahead · 5f4166c9

Committed by Igor Canadi on Aug 26, 2015

Summary:
ReadaheadRandomAccessFile acts as a transparent layer on top of RandomAccessFile. When a Read() request is issued, it issues a much bigger request to the OS and caches the result. When a new request comes in and we already have the data cached, it doesn't have to issue any requests to the OS.

We add ReadaheadRandomAccessFile layer only when file is read during compactions.
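
A minimal sketch of the readahead idea described above, using an in-memory stand-in for the file. `FakeFile` and `ReadaheadFile` are hypothetical classes written for this illustration and do not follow the RocksDB `RandomAccessFile` interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical underlying "file": serves bytes from a string and counts how
// many read requests reach it.
class FakeFile {
 public:
  explicit FakeFile(std::string contents) : contents_(std::move(contents)) {}

  std::string Read(size_t offset, size_t n) {
    ++reads_;
    if (offset >= contents_.size()) return "";
    return contents_.substr(offset, n);
  }

  int reads() const { return reads_; }

 private:
  std::string contents_;
  int reads_ = 0;
};

// Transparent layer: on a miss it asks the file for a larger chunk and keeps
// it, so later requests inside that chunk never reach the file.
class ReadaheadFile {
 public:
  ReadaheadFile(FakeFile* file, size_t readahead_size)
      : file_(file), readahead_size_(readahead_size) {}

  std::string Read(size_t offset, size_t n) {
    // Hit: the whole request lies inside the cached chunk.
    if (offset >= buf_offset_ && offset + n <= buf_offset_ + buf_.size()) {
      return buf_.substr(offset - buf_offset_, n);
    }
    // Miss: issue one bigger request and cache the result.
    buf_ = file_->Read(offset, std::max(n, readahead_size_));
    buf_offset_ = offset;
    return buf_.substr(0, n);
  }

 private:
  FakeFile* file_;
  size_t readahead_size_;
  std::string buf_;
  size_t buf_offset_ = 0;
};
```

In this sketch, two sequential 4-byte reads with an 8-byte readahead reach the underlying file only once.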

D45105 was incorrectly closed by Phabricator because I committed it to a separate branch (not master), so I'm resubmitting the diff.

Test Plan: make check

Reviewers: MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45123

5f4166c9

22 Aug, 2015 1 commit

Changed 'num_subcompactions' to the more accurate 'max_subcompactions' · b6def58f

Committed by Ari Ekmekji on Aug 21, 2015

Summary:
Up until this point we had DbOptions.num_subcompactions, but
it is semantically more correct to call this max_subcompactions since
we will schedule *up to* DbOptions.max_subcompactions smaller compactions
at a time during a compaction job.

I also added a --subcompactions option to db_bench

Test Plan: make all, make check

Reviewers: sdong, igor, anthony, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D45069

b6def58f

21 Aug, 2015 1 commit

Add options.new_table_reader_for_compaction_inputs · 9130873a

Committed by sdong on Aug 20, 2015

Summary: Currently compaction inputs share the same file descriptor and table reader as other foreground threads. This makes fadvise behave less predictably. Add options.new_table_reader_for_compaction_inputs to force creation of a new file descriptor and a new table reader for compaction inputs.

Test Plan: Add the option.

Reviewers: rven, anthony, kradhakrishnan, IslamAbdelRahman, igor, yhchiang

Reviewed By: igor

Subscribers: igor, MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43311

9130873a

14 Aug, 2015 1 commit

Add options.compaction_measure_io_stats to print write I/O stats in compactions · 603b6da8

Committed by sdong on Aug 12, 2015

Summary:
Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs:

2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]}

Add two more counters in iostats_context.

Also add a parameter of db_bench.

Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D44115

603b6da8

12 Aug, 2015 1 commit

Parallelize LoadTableHandlers · cee1e8a0

Committed by Islam AbdelRahman on Aug 11, 2015

Summary: Add a new option that allows LoadTableHandlers to use multiple threads to load files on DB Open and Recover

Test Plan:
make check -j64
COMPILE_WITH_TSAN=1 make check -j64
DISABLE_JEMALLOC=1 make all valgrind_check -j64 (still running)

Reviewers: yhchiang, anthony, rven, kradhakrishnan, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43755

cee1e8a0

11 Aug, 2015 1 commit

Adding wal_recovery_mode log message · 2cf0f4f4

Committed by Yoshinori Matsunobu on Aug 10, 2015

Summary:
wal_recovery_mode setting was not written to LOG. This diff
adds the log message

Test Plan: manually checked

Reviewers: kradhakrishnan, sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43953

2cf0f4f4

05 Aug, 2015 2 commits

Support delete rate limiting · c45a57b4

Committed by Islam AbdelRahman on Aug 04, 2015

Summary:
Introduce DeleteScheduler, which allows enforcing a rate limit on file deletion.
Instead of being deleted immediately, files are moved to a trash directory and deleted in a background thread that applies a sleep penalty between deletes if needed.

I have updated PurgeObsoleteFiles and PurgeObsoleteWALFiles to use the delete_scheduler instead of env_->DeleteFile
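
One way to picture the sleep penalty is as the gap between how long the deletions should have taken at the target rate and how long they actually took. The helper below is a sketch under that assumption; `DeletePenaltyMicros` is a hypothetical name and the formula is not taken from the DeleteScheduler source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: microseconds the background thread should sleep so
// that `total_deleted_bytes` over the elapsed time stays at or below
// `rate_bytes_per_sec`. A rate of 0 is treated as "no limit" in this sketch.
// Note: the multiplication can overflow for extremely large byte counts.
inline uint64_t DeletePenaltyMicros(uint64_t total_deleted_bytes,
                                    uint64_t rate_bytes_per_sec,
                                    uint64_t elapsed_micros) {
  if (rate_bytes_per_sec == 0) {
    return 0;
  }
  // Time the deletions should have taken at the configured rate.
  const uint64_t required_micros =
      total_deleted_bytes * 1000000 / rate_bytes_per_sec;
  return required_micros > elapsed_micros ? required_micros - elapsed_micros
                                          : 0;
}
```

For example, deleting 2 MiB at a 1 MiB/s limit after only half a second has passed leaves 1.5 seconds of penalty to sleep off.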

Test Plan:
added delete_scheduler_test
existing unit tests

Reviewers: kradhakrishnan, anthony, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43221

c45a57b4

Add DBOptions::skip_stats_update_on_db_open · 14d0bfa4

Committed by Yueh-Hsuan Chiang on Aug 04, 2015

Summary:
UpdateAccumulatedStats() is used to optimize compaction decisions,
especially when the number of deletion entries is high, but this function
can slow down DB::Open(), especially in disk environments.

This patch adds DBOptions::skip_stats_update_on_db_open, which skips
UpdateAccumulatedStats() at DB::Open() time when it is set to true.

Test Plan: Add DBCompactionTest.SkipStatsUpdateTest

Reviewers: igor, anthony, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: tnovak, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42843

14d0bfa4

04 Aug, 2015 1 commit

Parallelize L0-L1 Compaction: Restructure Compaction Job · 40c64434

Committed by Ari Ekmekji on Aug 03, 2015

Summary:
As of now compactions involving files from Level 0 and Level 1 are single
threaded because the files in L0, although sorted, are not range partitioned like
the other levels. This means that during L0-L1 compaction each file from L1
needs to be merged with potentially all the files from L0.

This attempt to parallelize the L0-L1 compaction assigns a thread and a
corresponding iterator to each L1 file that then considers only the key range
found in that L1 file and only the L0 files that have those keys (and only the
specific portion of those L0 files in which those keys are found). In this way
the overlap is minimized and potentially eliminated between different iterators
focusing on the same files.

The first step is to restructure the compaction logic to break L0-L1 compactions
into multiple, smaller, sequential compactions. Eventually each of these smaller
jobs will be run simultaneously. Areas to pay extra attention to are

  # Correct aggregation of compaction job statistics across multiple threads
  # Proper opening/closing of output files (make sure each thread's is unique)
  # Keys that span multiple L1 files
  # Skewed distributions of keys within L0 files
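
The partitioning idea in this summary, one unit of work per L1 file restricted to the L0 files that overlap its key range, can be sketched as follows. This is an illustration only: `KeyRange`, `Overlaps`, and `AssignL0ToL1` are hypothetical names, and keys that fall in the gaps between L1 files (one of the areas flagged above) are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a file by its smallest and largest key.
struct KeyRange {
  std::string smallest;
  std::string largest;
};

// Two closed key ranges overlap if neither ends before the other begins.
inline bool Overlaps(const KeyRange& a, const KeyRange& b) {
  return a.smallest <= b.largest && b.smallest <= a.largest;
}

// For each L1 file, list the indices of the L0 files whose ranges overlap it.
// Each inner vector is the input set one sub-compaction would consider.
inline std::vector<std::vector<size_t>> AssignL0ToL1(
    const std::vector<KeyRange>& l0, const std::vector<KeyRange>& l1) {
  std::vector<std::vector<size_t>> result(l1.size());
  for (size_t i = 0; i < l1.size(); ++i) {
    for (size_t j = 0; j < l0.size(); ++j) {
      if (Overlaps(l1[i], l0[j])) {
        result[i].push_back(j);
      }
    }
  }
  return result;
}
```

With L0 ranges `[a, m]` and `[k, z]` and L1 ranges `[a, f]`, `[g, p]`, `[q, z]`, only the middle L1 file needs both L0 files.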

Test Plan: Make and run db_test (newer version has separate compaction tests) and compaction_job_stats_test

Reviewers: igor, noetzli, anthony, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42699

40c64434

18 Jul, 2015 2 commits

Don't let flushes preempt compactions · 35ca5936

Committed by Igor Canadi on Jul 17, 2015

Summary:
When we first started, max_background_flushes was 0 by default and compaction thread was executing flushes (since there was no flush thread). Then, we switched the default max_background_flushes to 1. However, we still support the case where there is no flush thread and flushes are done in compaction. This is making our code a bit more complicated. By not supporting this use-case we can make our code simpler.

We have a special case that when you set max_background_flushes to 0, we
schedule the flush to execute on the compaction thread.

Test Plan: make check (there might be some unit tests that depend on this behavior)

Reviewers: IslamAbdelRahman, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D41931

35ca5936

Deprecate CompactionFilterV2 · a96fcd09

Committed by Igor Canadi on Jul 17, 2015

Summary: It has been around for a while and it looks like it never found any uses in the wild. It's also complicating our compaction_job code quite a bit. We're deprecating it in 3.13, but will put it back in 3.14 if we actually find users that need this feature.

Test Plan: make check

Reviewers: noetzli, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42405

a96fcd09

14 Jul, 2015 2 commits

Deprecate purge_redundant_kvs_while_flush · a9c51095

Committed by Igor Canadi on Jul 14, 2015

Summary: This option is guarding the feature implemented 2 and a half years ago: D8991. The feature was enabled by default back then and has been running without issues. There is no reason why any client would turn this feature off. I found no reference in fbcode.

Test Plan: none

Reviewers: sdong, yhchiang, anthony, dhruba

Reviewed By: dhruba

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42063

a9c51095

"make format" against last 10 commits · f9728640

Committed by sdong on Jul 13, 2015

Summary: This helps the Windows port format its changes, as discussed. It might have formatted some other code too, because the last 10 commits include more.

Test Plan: Build it.

Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D41961

f9728640

02 Jul, 2015 2 commits

Address GCC compilation issues · ca2fe2c1

Committed by Dmitri Smirnov on Jul 01, 2015

 invalid suffix on literal
 no return statement in function returning non-void CuckooStep::operator=
 extra qualification ‘rocksdb::spatial::Variant::
 dereferencing type-punned pointer will break strict-aliasing rules

ca2fe2c1

Windows Port from Microsoft · 18285c1e

Committed by Dmitri Smirnov on Jul 01, 2015

 Summary: Make RocksDb build and run on Windows to be functionally
 complete and performant. All existing test cases run with no
 regressions. Performance numbers are in the pull-request.

 Test plan: make all of the existing unit tests pass, obtain perf numbers.

 Co-authored-by: Praveen Rao praveensinghrao@outlook.com
 Co-authored-by: Sherlock Huang baihan.huang@gmail.com
 Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com
 Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com

18285c1e

24 Jun, 2015 1 commit

Implement a table-level row cache · 782a1590

Committed by Giuseppe Ottaviano on Jun 23, 2015

Summary:
Implementation of a table-level row cache.
It only caches point queries done through the `DB::Get` interface; queries done through the `Iterator` interface skip the cache completely.

Supports snapshots and merge operations.
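
The split between cached point lookups and uncached scans might look roughly like the sketch below. `CachedTable` is a hypothetical in-memory stand-in; it ignores snapshots and merge operations, which the real implementation supports.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical table with a row cache in front of point lookups only.
class CachedTable {
 public:
  explicit CachedTable(std::map<std::string, std::string> rows)
      : rows_(std::move(rows)) {}

  // Point lookup: try the row cache first, then fall back to the table and
  // remember the row for next time.
  bool Get(const std::string& key, std::string* value) {
    auto hit = row_cache_.find(key);
    if (hit != row_cache_.end()) {
      *value = hit->second;
      return true;
    }
    ++table_reads_;
    auto it = rows_.find(key);
    if (it == rows_.end()) {
      return false;
    }
    row_cache_[key] = it->second;
    *value = it->second;
    return true;
  }

  // Scan: walks the table directly and never consults or fills the cache.
  std::vector<std::string> ScanKeys() const {
    std::vector<std::string> keys;
    for (const auto& kv : rows_) {
      keys.push_back(kv.first);
    }
    return keys;
  }

  int table_reads() const { return table_reads_; }
  size_t cache_size() const { return row_cache_.size(); }

 private:
  std::map<std::string, std::string> rows_;
  std::unordered_map<std::string, std::string> row_cache_;
  int table_reads_ = 0;
};
```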

Test Plan: Ran `make valgrind_check commit-prereq`

Reviewers: igor, philipp, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D39849

782a1590

23 Jun, 2015 1 commit

Introduce WAL recovery consistency levels · de85e4ca

Committed by krad on Jun 15, 2015

Summary:
The "one size fits all" approach with WAL recovery will only introduce inconvenience for our varied clients as we go forward. The current recovery is a bit heuristic. We introduce the following levels of consistency while replaying the WAL.

1. RecoverAfterRestart (kTolerateCorruptedTailRecords)

This mimics the current recovery mode.

2. RecoverAfterCleanShutdown (kAbsoluteConsistency)

This is ideal for unit tests and for cases where the store is shut down cleanly. We tolerate no corruption or incomplete writes.

3. RecoverPointInTime (kPointInTimeRecovery)

This is ideal when using devices with a controller cache, or file systems that can lose data on restart. We recover up to the point where there is no corruption or incomplete write.

4. RecoverAfterDisaster (kSkipAnyCorruptRecord)

This is the ideal mode for recovering data. We tolerate corruption and incomplete writes, and we hop over the sections that we cannot make sense of, salvaging as many records as possible.
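
The four levels can be compared side by side with a toy replay loop. This is a sketch under assumptions: the enum values reuse the names listed above, but `Record`, `Replay`, and the exact reading of "corrupted tail" (a corrupt record is tolerated only if no valid record follows it) are choices made for this illustration, not the RocksDB log reader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class WalRecoveryMode {
  kTolerateCorruptedTailRecords,
  kAbsoluteConsistency,
  kPointInTimeRecovery,
  kSkipAnyCorruptRecord,
};

// Hypothetical log record: a payload plus a flag saying it failed its check.
struct Record {
  int value;
  bool corrupt;
};

// Replays `log` into `out`. Returns false when recovery must fail under the
// chosen mode; in that case `out` may hold a partial result.
inline bool Replay(const std::vector<Record>& log, WalRecoveryMode mode,
                   std::vector<int>* out) {
  out->clear();
  for (size_t i = 0; i < log.size(); ++i) {
    if (!log[i].corrupt) {
      out->push_back(log[i].value);
      continue;
    }
    switch (mode) {
      case WalRecoveryMode::kAbsoluteConsistency:
        return false;  // no corruption tolerated at all
      case WalRecoveryMode::kPointInTimeRecovery:
        return true;  // keep everything before the first bad record
      case WalRecoveryMode::kSkipAnyCorruptRecord:
        continue;  // hop over the bad record and keep salvaging
      case WalRecoveryMode::kTolerateCorruptedTailRecords:
        // Assumed meaning: fine only if nothing valid follows.
        for (size_t j = i + 1; j < log.size(); ++j) {
          if (!log[j].corrupt) return false;
        }
        return true;
    }
  }
  return true;
}
```

For a log of three records where only the middle one is corrupt, point-in-time recovery keeps the first record, skip-any-corrupt keeps the first and third, and the other two modes fail.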

Test Plan:
(1) Run added unit test to cover all levels.
(2) Run make check.

Reviewers: leveldb, sdong, igor

Subscribers: yoshinorim, dhruba

Differential Revision: https://reviews.facebook.net/D38487

de85e4ca

19 Jun, 2015 2 commits

Fail DB::Open() when the requested compression is not available · 760e9a94

Committed by Igor Canadi on Jun 18, 2015

Summary:
Currently RocksDB silently ignores this issue and doesn't compress the data. Based on discussion, we agree that this is pretty bad because it can cause confusion for our users.

This patch fails DB::Open() if we don't support the compression that is specified in the options.
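
The check amounts to rejecting the open up front rather than silently storing uncompressed data. A minimal sketch, assuming a set of compiled-in codecs is known at open time; `CompressionType`, `OpenResult`, `OpenDb`, and the error text are hypothetical and only loosely modeled on the description above.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class CompressionType { kNoCompression, kSnappy, kLZ4 };

// Hypothetical result type standing in for a status object.
struct OpenResult {
  bool ok;
  std::string message;
};

// Fail early if the requested codec is not among those linked into the binary.
inline OpenResult OpenDb(CompressionType requested,
                         const std::set<CompressionType>& linked_in) {
  if (requested != CompressionType::kNoCompression &&
      linked_in.count(requested) == 0) {
    return {false, "requested compression type is not available"};
  }
  return {true, ""};
}
```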

Test Plan: make check with LZ4 not present. If Snappy is not present all tests will just fail because Snappy is our default library. We should make Snappy the requirement, since without it our default DB::Open() fails.

Reviewers: sdong, MarkCallaghan, rven, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39687

760e9a94

Don't dump DBOptions for each column family · 4b8bb62f

Committed by Igor Canadi on Jun 18, 2015

Summary: Currently we dump DBOptions for each column family options we dump. This leads to duplicate lines in our LOG file. This diff fixes that.

Test Plan: Check out the LOG

Reviewers: sdong, rven, yhchiang

Reviewed By: yhchiang

Subscribers: IslamAbdelRahman, yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39729

4b8bb62f

12 Jun, 2015 1 commit

Slow down writes by bytes written · 7842920b

Committed by sdong on May 15, 2015

Summary:
With this patch, we slow down writes into the database to the rate given by options.delayed_write_rate (a new option).

The thread synchronization approach I take is to keep synchronizing the write controller with the DB mutex, so GetDelay() is called while holding the DB mutex. It tries to minimize how often GetDelay() reads the time. I verified it through db_bench and it seems to work.

hard_rate_limit is deprecated.

options.delayed_write_rate is still not dynamically changeable. Need to work on it as a follow-up.
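
Rate limiting by bytes written can be pictured as tracking the earliest time the next write may start. The class below is a sketch under that assumption; `WriteDelayer` is a hypothetical name, it takes the current time as a parameter instead of reading a clock, and it does no locking (the patch relies on the DB mutex for that).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical delay calculator: each write of `num_bytes` pushes back the
// time at which the next write is allowed, at `rate_bytes_per_sec`.
class WriteDelayer {
 public:
  explicit WriteDelayer(uint64_t rate_bytes_per_sec)
      : rate_(rate_bytes_per_sec) {
    assert(rate_ > 0);
  }

  // Returns how many microseconds the caller should wait before writing.
  uint64_t GetDelay(uint64_t now_micros, uint64_t num_bytes) {
    // Idle time is not banked: never schedule in the past.
    if (next_allowed_micros_ < now_micros) {
      next_allowed_micros_ = now_micros;
    }
    const uint64_t delay = next_allowed_micros_ - now_micros;
    next_allowed_micros_ += num_bytes * 1000000 / rate_;
    return delay;
  }

 private:
  uint64_t rate_;
  uint64_t next_allowed_micros_ = 0;
};
```

At 1,000,000 bytes per second, a 1000-byte write at time 0 proceeds immediately, and a write arriving 100 microseconds later is asked to wait 900 microseconds.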

Test Plan: Add new unit tests in db_test

Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor

Reviewed By: igor

Subscribers: ikabiljo, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D36351

7842920b

09 Jun, 2015 1 commit

Use nullptr for default compaction_filter_factory · 643bbbf0

Committed by Islam AbdelRahman on Jun 08, 2015

Summary:
Replace the default value of compaction_filter_factory and compaction_filter_factory_v2 with nullptr instead of DefaultCompactionFilterFactory / DefaultCompactionFilterFactoryV2.
The reason is to make it easy to determine whether we have a compaction filter factory without depending on RTTI.
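
With a nullptr default, "is a factory configured?" becomes a plain pointer comparison rather than a type check against a default class. A minimal sketch with local stand-in types; `Options`, `KeepAllFactory`, and `HasCompactionFilterFactory` here are hypothetical and are not the RocksDB definitions.

```cpp
#include <cassert>
#include <memory>

// Stand-in for a factory interface.
class CompactionFilterFactory {
 public:
  virtual ~CompactionFilterFactory() = default;
  virtual const char* Name() const = 0;
};

// Example user-supplied factory.
class KeepAllFactory : public CompactionFilterFactory {
 public:
  const char* Name() const override { return "KeepAllFactory"; }
};

// Stand-in options struct: the factory defaults to nullptr.
struct Options {
  std::shared_ptr<CompactionFilterFactory> compaction_filter_factory;
};

// No dynamic_cast or typeid needed: a null check answers the question.
inline bool HasCompactionFilterFactory(const Options& options) {
  return options.compaction_filter_factory != nullptr;
}
```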

Test Plan: make check

Reviewers: yoshinorim, ott, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D39693

643bbbf0

30 May, 2015 1 commit

Include EventListener in stress test. · 9ffc8ba0

Committed by Yueh-Hsuan Chiang on May 29, 2015

Summary: Include EventListener in stress test.

Test Plan: make blackbox_crash_test whitebox_crash_test

Reviewers: anthony, igor, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39105

9ffc8ba0

29 May, 2015 2 commits

Support saving history in memtable_list · c8153510

Committed by agiardullo on May 28, 2015

Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to somehow keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.

After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decided to create a separate list in MemtableListVersion, as using the same list complicated the flush/installalflushresults logic too much.

This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but makes it more likely that we're using closer to the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached.) So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).

However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.

Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37443

c8153510

[API Change] Move listeners from ColumnFamilyOptions to DBOptions · 672dda9b

Committed by Yueh-Hsuan Chiang on May 28, 2015

Summary: Move listeners from ColumnFamilyOptions to DBOptions

Test Plan:
listener_test
compact_files_test

Reviewers: rven, anthony, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39087

672dda9b
