提交 · 9053fe2a5c9b1c4b8ffdfbc59f3855360d97b0ec · kvdb / rocksdb

06 12月, 2016 1 次提交

Made delete_obsolete_files_period_micros option dynamic · 9053fe2a

由 Anton Safonov 提交于 12月 05, 2016

Summary:
Made delete_obsolete_files_period_micros option dynamic. It can be updating using DB::SetDBOptions().
Closes https://github.com/facebook/rocksdb/pull/1595

Differential Revision: D4246569

Pulled By: tonek

fbshipit-source-id: d23f560

9053fe2a

02 12月, 2016 6 次提交

fix clang build · edde954e

由 Islam AbdelRahman 提交于 12月 01, 2016

Summary:
override is missing for FilterV2
Closes https://github.com/facebook/rocksdb/pull/1606

Differential Revision: D4263832

Pulled By: IslamAbdelRahman

fbshipit-source-id: d8b337a

edde954e

Add memtable_insert_with_hint_prefix_size option to db_bench · 56281f3a

由 Yi Wu 提交于 12月 01, 2016

Summary:
Add memtable_insert_with_hint_prefix_size option to db_bench
Closes https://github.com/facebook/rocksdb/pull/1604

Differential Revision: D4260549

Pulled By: yiwu-arbug

fbshipit-source-id: cee5ef7

56281f3a

Cache heap::downheap() root comparison (optimize heap cmp call) · 4a21b140

由 Islam AbdelRahman 提交于 12月 01, 2016

Summary:
Reduce number of comparisons in heap by caching which child node in the first level is smallest (left_child or right_child)
So next time we can compare directly against the smallest child

I see that the total number of calls to comparator drops significantly when using this optimization

Before caching (~2mil key comparison for iterating the DB)
```
$ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq" --db="/dev/shm/heap_opt" --use_existing_db --disable_auto_compactions --cache_size=1000000000  --perf_level=2
readseq      :       0.338 micros/op 2959201 ops/sec;  327.4 MB/s user_key_comparison_count = 2000008
```
After caching (~1mil key comparison for iterating the DB)
```
$ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq" --db="/dev/shm/heap_opt" --use_existing_db --disable_auto_compactions --cache_size=1000000000 --perf_level=2
readseq      :       0.309 micros/op 3236801 ops/sec;  358.1 MB/s user_key_comparison_count = 1000011
```

It also improves
Closes https://github.com/facebook/rocksdb/pull/1600

Differential Revision: D4256027

Pulled By: IslamAbdelRahman

fbshipit-source-id: 76fcc66

4a21b140

Fix travis (compile for clang < 3.9) · e39d0808

由 Islam AbdelRahman 提交于 12月 01, 2016

Summary:
Travis fail because it uses clang 3.6 which don't recognize
`__attribute__((__no_sanitize__("undefined")))`
Closes https://github.com/facebook/rocksdb/pull/1601

Differential Revision: D4257175

Pulled By: IslamAbdelRahman

fbshipit-source-id: fb4d1ab

e39d0808

Kill flashcache code in RocksDB · 3f407b06

由 Igor Canadi 提交于 12月 01, 2016

Summary:
Now that we have userspace persisted cache, we don't need flashcache anymore.
Closes https://github.com/facebook/rocksdb/pull/1588

Differential Revision: D4245114

Pulled By: igorcanadi

fbshipit-source-id: e2c1c72

3f407b06

Bug: paralle_group status updated in WriteThread::CompleteParallelWorker · b77007df

由 fangchenliaohui 提交于 12月 01, 2016

Summary:
Multi-write thread may update the status of the parallel_group in
WriteThread::CompleteParallelWorker if the status of Writer is not ok!
When copy write status to the paralle_group, the write thread just hold the
mutex of the the writer processed by itself. it is useless. The thread
should held the the leader of the parallel_group instead.
Closes https://github.com/facebook/rocksdb/pull/1598

Differential Revision: D4252335

Pulled By: siying

fbshipit-source-id: 3864cf7

b77007df

01 12月, 2016 3 次提交

Support for range skips in compaction filter · 247d0979

由 Mike Kolupaev 提交于 12月 01, 2016

Summary:
This adds the ability for compaction filter to say "drop this key-value, and also drop everything up to key x". This will cause the compaction to seek input iterator to x, without reading the data. This can make compaction much faster when large consecutive chunks of data are filtered out. See the changes in include/rocksdb/compaction_filter.h for the new API.

Along the way this diff also adds ability for compaction filter changing merge operands, similar to how it can change values; we're not going to use this feature, it just seemed easier and cleaner to implement it than to document that it's not implemented :)

The diff is not as big as it may seem, about half of the lines are a test.
Closes https://github.com/facebook/rocksdb/pull/1599

Differential Revision: D4252092

Pulled By: al13n321

fbshipit-source-id: 41e1e48

247d0979

c api: expose option for dynamic level size target · 96fcefbf

由 Panagiotis Ktistakis 提交于 11月 30, 2016

Summary: Closes https://github.com/facebook/rocksdb/pull/1587

Differential Revision: D4245923

Pulled By: yiwu-arbug

fbshipit-source-id: 6ee7291

96fcefbf

Add C API to set base_backgroud_compactions · 00197cff

由 zhangjinpeng1987 提交于 11月 30, 2016

Summary:
Add C API to set base_backgroud_compactions
Closes https://github.com/facebook/rocksdb/pull/1571

Differential Revision: D4245709

Pulled By: yiwu-arbug

fbshipit-source-id: 792c6b8

00197cff

30 11月, 2016 6 次提交

deleterange end-to-end test improvements for lite/robustness · 5b219ecc

由 Andrew Kryczka 提交于 11月 29, 2016

Summary: Closes https://github.com/facebook/rocksdb/pull/1591

Differential Revision: D4246019

Pulled By: ajkr

fbshipit-source-id: 0c4aa37

5b219ecc

pass rocksdb oncall to mysql_mtr_filter otherwise tasks get created w… · aad11917

由 Anirban Rahut 提交于 11月 29, 2016

Summary:
…rong owner

 mysql_mtr_filter script needs proper oncall
Closes https://github.com/facebook/rocksdb/pull/1586

Differential Revision: D4245150

Pulled By: anirbanr-fb

fbshipit-source-id: fd8577c

aad11917

DeleteRange write path end-to-end tests · e3335289

由 Andrew Kryczka 提交于 11月 29, 2016

Summary: Closes https://github.com/facebook/rocksdb/pull/1578

Differential Revision: D4241171

Pulled By: ajkr

fbshipit-source-id: ce5fd83

e3335289

Fix mis-reporting of compaction read bytes to the base level · 7784980f

由 Siying Dong 提交于 11月 29, 2016

Summary:
In dynamic leveled compaction, when calculating read bytes, output level bytes may be wronglyl calculated as input level inputs. Fix it.
Closes https://github.com/facebook/rocksdb/pull/1475

Differential Revision: D4148412

Pulled By: siying

fbshipit-source-id: f2f475a

7784980f

Fix implicit conversion between int64_t to int · 3c6b49ed

由 Islam AbdelRahman 提交于 11月 29, 2016

Summary:
Make conversion explicit, implicit conversion breaks the build
Closes https://github.com/facebook/rocksdb/pull/1589

Differential Revision: D4245158

Pulled By: IslamAbdelRahman

fbshipit-source-id: aaec00d

3c6b49ed

Remove unused assignment in db/db_iter.cc · b3b87565

由 Siying Dong 提交于 11月 29, 2016

Summary:
"make analyze" complains the assignment is not useful. Remove it.
Closes https://github.com/facebook/rocksdb/pull/1581

Differential Revision: D4241697

Pulled By: siying

fbshipit-source-id: 178f67a

b3b87565

29 11月, 2016 9 次提交

Fix range deletion covering key in same SST file · 4f6e89b1

由 Andrew Kryczka 提交于 11月 28, 2016

Summary:
AddTombstones() needs to be before t->Get(), oops :'(
Closes https://github.com/facebook/rocksdb/pull/1576

Differential Revision: D4241041

Pulled By: ajkr

fbshipit-source-id: 781ceea

4f6e89b1

Avoid intentional overflow in GetL0ThresholdSpeedupCompaction · a2bf265a

由 Islam AbdelRahman 提交于 11月 28, 2016

Summary:
https://github.com/facebook/rocksdb/commit/99c052a34f93d119b75eccdcd489ecd581d48ee9 fixes integer overflow in GetL0ThresholdSpeedupCompaction() by checking if int become -ve.
UBSAN will complain about that since this is still an overflow, we can fix the issue by simply using int64_t
Closes https://github.com/facebook/rocksdb/pull/1582

Differential Revision: D4241525

Pulled By: IslamAbdelRahman

fbshipit-source-id: b3ae21f

a2bf265a

disable UBSAN for functions with intentional -ve shift / overflow · 52fd1ff2

由 Islam AbdelRahman 提交于 11月 28, 2016

Summary:
disable UBSAN for functions with intentional left shift on -ve number / overflow

These functions are
rocksdb:: Hash
FixedLengthColBufEncoder::Append
FaultInjectionTest:: Key
Closes https://github.com/facebook/rocksdb/pull/1577

Differential Revision: D4240801

Pulled By: IslamAbdelRahman

fbshipit-source-id: 3e1caf6

52fd1ff2

Fix CompactionJob::Install division by zero · 1886c435

由 Islam AbdelRahman 提交于 11月 28, 2016

Summary:
Fix CompactionJob::Install division by zero
Closes https://github.com/facebook/rocksdb/pull/1580

Differential Revision: D4240794

Pulled By: IslamAbdelRahman

fbshipit-source-id: 7286721

1886c435

fix options_test ubsan · 63c30de8

由 Islam AbdelRahman 提交于 11月 28, 2016

Summary:
Having -ve value for max_write_buffer_number does not make sense and cause us to do a left shift on a -ve value number
Closes https://github.com/facebook/rocksdb/pull/1579

Differential Revision: D4240798

Pulled By: IslamAbdelRahman

fbshipit-source-id: bd6267e

63c30de8

Fix compaction_job.cc division by zero · 13e66a8f

由 Islam AbdelRahman 提交于 11月 28, 2016

Summary:
Fix division by zero in compaction_job.cc
Closes https://github.com/facebook/rocksdb/pull/1575

Differential Revision: D4240818

Pulled By: IslamAbdelRahman

fbshipit-source-id: a8bc757

13e66a8f

Fix double-counted deletion stat · 01eabf73

由 Andrew Kryczka 提交于 11月 28, 2016

Summary:
Both the single deletion and the value are included in compaction outputs, so no need to update the stat for the value's deletion yet, otherwise it'd be double-counted.
Closes https://github.com/facebook/rocksdb/pull/1574

Differential Revision: D4241181

Pulled By: ajkr

fbshipit-source-id: c9aaa15

01eabf73

DeleteRange compaction statistics · 7ffb10fc

由 Andrew Kryczka 提交于 11月 28, 2016

Summary:
- "rocksdb.compaction.key.drop.range_del" - number of keys dropped during compaction due to a range tombstone covering them
- "rocksdb.compaction.range_del.drop.obsolete" - number of range tombstones dropped due to compaction to bottom level and no snapshot saving them
- s/CompactionIteratorStats/CompactionIterationStats/g since this class is no longer specific to CompactionIterator -- it's also updated for range tombstone iteration during compaction
- Move the above class into a separate .h file to avoid circular dependency.
Closes https://github.com/facebook/rocksdb/pull/1520

Differential Revision: D4187179

Pulled By: ajkr

fbshipit-source-id: 10c2103

7ffb10fc

Less linear search in DBIter::Seek() when keys are overwritten a lot · 236d4c67

由 Mike Kolupaev 提交于 11月 28, 2016

Summary:
In one deployment we saw high latencies (presumably from slow iterator operations) and a lot of CPU time reported by perf with this stack:

```
  rocksdb::MergingIterator::Next
  rocksdb::DBIter::FindNextUserEntryInternal
  rocksdb::DBIter::Seek
```

I think what's happening is:
1. we create a snapshot iterator,
2. we do lots of Put()s for the same key x; this creates lots of entries in memtable,
3. we seek the iterator to a key slightly smaller than x,
4. the seek walks over lots of entries in memtable for key x, skipping them because of high sequence numbers.

CC IslamAbdelRahman
Closes https://github.com/facebook/rocksdb/pull/1413

Differential Revision: D4083879

Pulled By: IslamAbdelRahman

fbshipit-source-id: a83ddae

236d4c67

24 11月, 2016 1 次提交

Improve Write Stalling System · cd7c4143

由 Siying Dong 提交于 11月 23, 2016

Summary:
Current write stalling system has the problem of lacking of positive feedback if the restricted rate is already too low. Users sometimes stack in very low slowdown value. With the diff, we add a positive feedback (increasing the slowdown value) if we recover from slowdown state back to normal. To avoid the positive feedback to keep the slowdown value to be to high, we add issue a negative feedback every time we are close to the stop condition. Experiments show it is easier to reach a relative balance than before.

Also increase level0_stop_writes_trigger default from 24 to 32. Since level0_slowdown_writes_trigger default is 20, stop trigger 24 only gives four files as the buffer time to slowdown writes. In order to avoid stop in four files while 20 files have been accumulated, the slowdown value must be very low, which is amost the same as stop. It also doesn't give enough time for the slowdown value to converge. Increase it to 32 will smooth out the system.
Closes https://github.com/facebook/rocksdb/pull/1562

Differential Revision: D4218519

Pulled By: siying

fbshipit-source-id: 95e4088

cd7c4143

23 11月, 2016 2 次提交

Unified InlineSkipList::Insert algorithm with hinting · dfb6fe67

由 Yi Wu 提交于 11月 22, 2016

Summary:
This PR is based on nbronson's diff with small
modifications to wire it up with existing interface. Comparing to
previous version, this approach works better for inserting keys in
decreasing order or updating the same key, and impose less restriction
to the prefix extractor.

---- Summary from original diff ----

This diff introduces a single InlineSkipList::Insert that unifies
the existing sequential insert optimization (prev_), concurrent insertion,
and insertion using externally-managed insertion point hints.

There's a deep symmetry between insertion hints (cursors) and the
concurrent algorithm.  In both cases we have partial information from
the recent past that is likely but not certain to be accurate.  This diff
introduces the struct InlineSkipList::Splice, which encodes predecessor
and successor information in the same form that was previously only used
within a single call to InsertConcurrently.  Splice holds information
about an insertion point that can be used to levera
Closes https://github.com/facebook/rocksdb/pull/1561

Differential Revision: D4217283

Pulled By: yiwu-arbug

fbshipit-source-id: 33ee437

dfb6fe67

Making persistent cache more resilient to filesystem failures · 3068870c

由 Karthikeyan Radhakrishnan 提交于 11月 22, 2016

Summary:
The persistent cache is designed to hop over errors and return key not found. So far, it has shown resilience to write errors, encoding errors, data corruption etc. It is not resilient against disappearing files/directories. This was exposed during testing when multiple instances of persistence cache was started sharing the same directory simulating an unpredictable filesystem environment.

This patch

- makes the write code path more resilient to errors while creating files
- makes the read code path more resilient to handle situation where files are not found
- added a test that does negative write/read testing by removing the directory while writes are in progress
Closes https://github.com/facebook/rocksdb/pull/1472

Differential Revision: D4143413

Pulled By: kradhakrishnan

fbshipit-source-id: fd25e9b

3068870c

22 11月, 2016 9 次提交

Eliminate redundant cache lookup with range deletion · 734e4aca

由 Andrew Kryczka 提交于 11月 21, 2016

Summary:
When we introduced range deletion block, TableCache::Get() and TableCache::NewIterator() each did two table cache lookups, one for range deletion block iterator and another for getting the table reader to which the Get()/NewIterator() is delegated. This extra cache lookup was very CPU-intensive (about 10% overhead in a read-heavy benchmark). We can avoid it by reusing the Cache::Handle created for range deletion block iterator to get the file reader.
Closes https://github.com/facebook/rocksdb/pull/1537

Differential Revision: D4201167

Pulled By: ajkr

fbshipit-source-id: d33ffd8

734e4aca

Add WriteOptions.no_slowdown · 182b940e

由 Maysam Yabandeh 提交于 11月 21, 2016

Summary:
If the WriteOptions.no_slowdown flag is set AND we need to wait or sleep for
the write request, then fail immediately with Status::Incomplete().
Closes https://github.com/facebook/rocksdb/pull/1527

Differential Revision: D4191405

Pulled By: maysamyabandeh

fbshipit-source-id: 7f3ce3f

182b940e

Persistent Cache: Expose stats to user via public API · 4118e133

由 Karthikeyan Radhakrishnan 提交于 11月 21, 2016

Summary:
Exposing persistent cache stats (counters) to the user via public API.
Closes https://github.com/facebook/rocksdb/pull/1485

Differential Revision: D4155274

Pulled By: siying

fbshipit-source-id: 30a9f50

4118e133

rocks_lua_compaction_filter: add unused attribute to a variable · f2a8f92a

由 Siying Dong 提交于 11月 21, 2016

Summary:
Release build shows warning without this fix.
Closes https://github.com/facebook/rocksdb/pull/1558

Differential Revision: D4215831

Pulled By: yiwu-arbug

fbshipit-source-id: 888a755

f2a8f92a

Remove use of deprecated LZ4 function · 4444256a

由 Nick Terrell 提交于 11月 21, 2016

Summary:
LZ4 1.7.3 emits warnings when calling the deprecated function `LZ4_compress_limitedOutput_continue()`. Starting in r129, LZ4 introduces `LZ4_compress_fast_continue()` as a replacement, and the two functions calls are [exactly equivalent](https://github.com/lz4/lz4/blob/dev/lib/lz4.c#L1408).
Closes https://github.com/facebook/rocksdb/pull/1532

Differential Revision: D4199240

Pulled By: siying

fbshipit-source-id: 138c2bc

4444256a

Fix fd leak when using direct IOs · 548d7fb2

由 Changli Gao 提交于 11月 21, 2016

Summary:
We should close the fd, before overriding it. This bug was
introduced by f89caa12
Closes https://github.com/facebook/rocksdb/pull/1553

Differential Revision: D4214101

Pulled By: siying

fbshipit-source-id: 0d65de0

548d7fb2

Range deletion microoptimizations · fd43ee09

由 Andrew Kryczka 提交于 11月 21, 2016

Summary:
- Made RangeDelAggregator's InternalKeyComparator member a reference-to-const so we don't need to copy-construct it. Also added InternalKeyComparator to ImmutableCFOptions so we don't need to construct one for each DBIter.
- Made MemTable::NewRangeTombstoneIterator and the table readers' NewRangeTombstoneIterator() functions return nullptr instead of NewEmptyInternalIterator to avoid the allocation. Updated callers accordingly.
Closes https://github.com/facebook/rocksdb/pull/1548

Differential Revision: D4208169

Pulled By: ajkr

fbshipit-source-id: 2fd65cf

fd43ee09

Reword support a little bit to more clear and concise · 23a18ca5

由 Joel Marcey 提交于 11月 21, 2016

Summary:
I tried to do this in #1556, but it landed before the change could be imported.
Closes https://github.com/facebook/rocksdb/pull/1557

Differential Revision: D4214572

Pulled By: siying

fbshipit-source-id: 718d4a4

23a18ca5

Update support to separate code issues with general questions · 481856ac

由 Joel Marcey 提交于 11月 21, 2016

Summary: Closes https://github.com/facebook/rocksdb/pull/1556

Differential Revision: D4214184

Pulled By: siying

fbshipit-source-id: c1abf47

481856ac

21 11月, 2016 1 次提交

Fix deadlock when calling getMergedHistogram · a0deec96

由 Changli Gao 提交于 11月 20, 2016

Summary:
When calling StatisticsImpl::HistogramInfo::getMergedHistogram(), if
there is a dying thread, which is calling
ThreadLocalPtr::StaticMeta::OnThreadExit() to merge its thread values to
HistogramInfo, deadlock will occur. Because the former try to hold
merge_lock then ThreadMeta::mutex_, but the later try to hold
ThreadMeta::mutex_ then merge_lock. In short, the locking order isn't
the same.

This patch addressed this issue by releasing merge_lock before folding
thread values.
Closes https://github.com/facebook/rocksdb/pull/1552

Differential Revision: D4211942

Pulled By: ajkr

fbshipit-source-id: ef89bcb

a0deec96

20 11月, 2016 2 次提交

Remove Arena in RangeDelAggregator · fe349db5

由 Andrew Kryczka 提交于 11月 19, 2016

Summary:
The Arena construction/destruction introduced significant overhead to read-heavy workload just by creating empty vectors for its blocks, so avoid it in RangeDelAggregator.
Closes https://github.com/facebook/rocksdb/pull/1547

Differential Revision: D4207781

Pulled By: ajkr

fbshipit-source-id: 9d1c130

fe349db5

Use more efficient hash map for deadlock detection · e63350e7

由 Manuel Ung 提交于 11月 19, 2016

Summary:
Currently, deadlock cycles are held in std::unordered_map. The problem with it is that it allocates/deallocates memory on every insertion/deletion. This limits throughput since we're doing this expensive operation while holding a global mutex. Fix this by using a vector which caches memory instead.

Running the deadlock stress test, this change increased throughput from 39k txns/s -> 49k txns/s. The effect is more noticeable in MyRocks.
Closes https://github.com/facebook/rocksdb/pull/1545

Differential Revision: D4205662

Pulled By: lth

fbshipit-source-id: ff990e4

e63350e7

kvdb / rocksdb 10 个月 前同步成功

kvdb / rocksdb
10 个月前同步成功