Commits · df7004766989ae00e1681c369b1de8d5d9f74107 · kvdb / rocksdb

May 1, 2014 · 3 commits

Committed by Igor Canadi on Apr 30, 2014

Summary:
Added a new option `max_total_wal_size`. Once the total WAL size goes over that, we make an attempt to flush all column families that still have data in the earliest WAL file.

By default, I calculate `max_total_wal_size` dynamically, which should be good enough for non-advanced customers.
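The trigger described above could look roughly like the following sketch. `WalFile`, `ColumnFamily`, `EffectiveWalLimit()` and `ColumnFamiliesToFlush()` are hypothetical names, not RocksDB internals. Each column family's `log_number` is assumed to be the oldest WAL that may still hold its unflushed data. The dynamic default (4 times the sum of `write_buffer_size * max_write_buffer_number`) is an assumption modeled on how later RocksDB headers describe `max_total_wal_size = 0`, and this commit may compute it differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical view of a live WAL file.
struct WalFile {
  uint64_t number;  // log file number; higher means newer
  uint64_t size_bytes;
};

// Hypothetical view of a column family.
struct ColumnFamily {
  std::string name;
  uint64_t log_number;  // oldest WAL that may still hold unflushed data for this CF
  uint64_t write_buffer_size;
  int max_write_buffer_number;
};

// 0 means "choose dynamically" (assumed formula, see lead-in).
uint64_t EffectiveWalLimit(uint64_t max_total_wal_size,
                           const std::vector<ColumnFamily>& cfs) {
  if (max_total_wal_size != 0) return max_total_wal_size;
  uint64_t in_memory = 0;
  for (const auto& cf : cfs) {
    in_memory += cf.write_buffer_size * static_cast<uint64_t>(cf.max_write_buffer_number);
  }
  return 4 * in_memory;
}

// Returns the column families to flush so the earliest WAL can be released.
// `wals` must be sorted by ascending file number.
std::vector<std::string> ColumnFamiliesToFlush(const std::vector<WalFile>& wals,
                                               const std::vector<ColumnFamily>& cfs,
                                               uint64_t max_total_wal_size) {
  std::vector<std::string> to_flush;
  if (wals.empty()) return to_flush;
  uint64_t total = 0;
  for (size_t i = 0; i < wals.size(); ++i) {
    assert(i == 0 || wals[i - 1].number < wals[i].number);
    total += wals[i].size_bytes;
  }
  if (total <= EffectiveWalLimit(max_total_wal_size, cfs)) return to_flush;
  const uint64_t earliest = wals.front().number;
  for (const auto& cf : cfs) {
    // This CF still pins the earliest WAL, so it must be flushed first.
    if (cf.log_number <= earliest) to_flush.push_back(cf.name);
  }
  return to_flush;
}
```

Flushing the returned column families is what allows the earliest WAL file to be deleted, which brings the total WAL size back down.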

Test Plan: Added a test

Reviewers: dhruba, haobo, sdong, ljin, yhchiang

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18345

df700476

Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB · 7dafa3a1

Committed by sdong on Apr 25, 2014

Summary: Add an option to allocate a piece of memory from the huge page TLB. Add options to trigger it in dynamic bloom, plain table indexes and the hash linked list hash table.
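The following is a minimal sketch, assuming Linux. It allocates a block backed by huge pages with `mmap(MAP_HUGETLB)` and falls back to `malloc` when huge pages are unavailable, for example when none are reserved via `/proc/sys/vm/nr_hugepages`. `HugePageBlock`, `AllocateMaybeHuge()` and `FreeBlock()` are hypothetical; this is not the commit's code.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical record of how a block was obtained, so it can be freed correctly.
struct HugePageBlock {
  void* ptr;
  size_t size;     // bytes actually reserved
  bool from_mmap;  // true if backed by huge pages
};

HugePageBlock AllocateMaybeHuge(size_t bytes, size_t huge_page_size) {
  assert(bytes > 0);
#ifdef MAP_HUGETLB
  if (bytes > 0 && huge_page_size > 0) {
    // Round up to a whole number of huge pages.
    size_t reserved = ((bytes - 1) / huge_page_size + 1) * huge_page_size;
    void* p = mmap(nullptr, reserved, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) return {p, reserved, true};
  }
#else
  (void)huge_page_size;  // huge pages not supported on this platform
#endif
  return {std::malloc(bytes), bytes, false};  // fall back to regular pages
}

void FreeBlock(const HugePageBlock& b) {
  if (b.from_mmap) {
    munmap(b.ptr, b.size);
  } else {
    std::free(b.ptr);
  }
}
```

Recording how each block was obtained keeps the fallback transparent to callers, which matters because huge pages are often not configured on a given host.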

Test Plan: make all check

Reviewers: haobo, ljin

Reviewed By: haobo

CC: nkg-, dhruba, leveldb, igor, yhchiang

Differential Revision: https://reviews.facebook.net/D18357

7dafa3a1

Some fixes as preparation for release · 66f88c43
Committed by Igor Canadi on Apr 30, 2014

66f88c43

Apr 30, 2014 · 5 commits

More s/us fixes · d6d67c0e
Committed by Igor Canadi on Apr 29, 2014

d6d67c0e

Add a new mem-table representation based on cuckoo hash. · 9d9d2965

Committed by Yueh-Hsuan Chiang on Apr 29, 2014

Summary:
= Major Changes =
* Add a new mem-table representation, HashCuckooRep, which is based on cuckoo hashing.
  Cuckoo hashing uses multiple hash functions.  This allows each key to have multiple
  possible locations in the mem-table.

  - Put: When inserting a key, it tries to find one of its possible
    locations that is vacant and stores the key there.  If none of its
    possible locations is available, it kicks out a victim key and
    stores the new key at that location.  The kicked-out victim key is
    then stored at a vacant one of its own possible locations, or kicks
    out another victim.  In this diff, the kick-out path (known as the
    cuckoo path) is found using BFS, which guarantees that it is the shortest.

  - Get: Simply tries all possible locations of a key, which guarantees
    worst-case constant time complexity.

  - Time complexity: O(1) for Get, and average O(1) for Put if the
    mem-table is less than 80% full.

  - Two hash functions are used by default; the number of hash functions
    used by the cuckoo hash may increase dynamically if it fails to find a
    short-enough kick-out path.

  - Currently, HashCuckooRep does not support iteration or snapshots,
    since its main purpose for now is to optimize point access.
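The Put and Get behavior above can be sketched as a small cuckoo hash set. `CuckooSet` is hypothetical and much simpler than HashCuckooRep. It stores whole keys in a fixed array of buckets and derives its hash functions by salting `std::hash`. When no short cuckoo path exists, it reports failure instead of adding a hash function.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified cuckoo hash set of string keys; not RocksDB's HashCuckooRep.
class CuckooSet {
 public:
  CuckooSet(size_t buckets, size_t num_hash, size_t max_path)
      : slots_(buckets), num_hash_(num_hash), max_path_(max_path) {}

  // Get: probe each of the key's num_hash_ candidate buckets, so at most
  // num_hash_ probes regardless of how full the table is.
  bool Contains(const std::string& key) const {
    for (size_t i = 0; i < num_hash_; ++i) {
      const std::optional<std::string>& slot = slots_[Bucket(key, i)];
      if (slot && *slot == key) return true;
    }
    return false;
  }

  // Put: BFS from the key's candidate buckets to the nearest vacant bucket
  // (shortest cuckoo path), then shift each victim one step along the path.
  // Returns false if no path of at most max_path_ kicks exists; a real
  // implementation would then add a hash function or grow the table.
  bool Insert(const std::string& key) {
    if (Contains(key)) return true;
    const long kUnvisited = -2, kRoot = -1;
    std::vector<long> parent(slots_.size(), kUnvisited);
    std::vector<size_t> depth(slots_.size(), 0);
    std::queue<size_t> frontier;
    for (size_t i = 0; i < num_hash_; ++i) {
      size_t b = Bucket(key, i);
      if (parent[b] == kUnvisited) { parent[b] = kRoot; frontier.push(b); }
    }
    while (!frontier.empty()) {
      size_t b = frontier.front();
      frontier.pop();
      if (!slots_[b]) {
        // Vacant: walk back to a root bucket, moving each victim forward.
        while (parent[b] != kRoot) {
          size_t from = static_cast<size_t>(parent[b]);
          slots_[b] = std::move(slots_[from]);
          b = from;
        }
        slots_[b] = key;
        return true;
      }
      if (depth[b] >= max_path_) continue;
      // The occupant of b could be kicked to any of its own candidate buckets.
      for (size_t i = 0; i < num_hash_; ++i) {
        size_t next = Bucket(*slots_[b], i);
        if (parent[next] == kUnvisited) {
          parent[next] = static_cast<long>(b);
          depth[next] = depth[b] + 1;
          frontier.push(next);
        }
      }
    }
    return false;
  }

 private:
  // i-th hash function, derived by salting the key (an illustrative choice).
  size_t Bucket(const std::string& key, size_t i) const {
    return std::hash<std::string>{}(key + static_cast<char>('0' + i)) % slots_.size();
  }

  std::vector<std::optional<std::string>> slots_;
  size_t num_hash_;
  size_t max_path_;
};
```

The BFS explores buckets rather than keys. Each edge means "the occupant of this bucket could move there", so the first vacant bucket reached ends a shortest kick-out path.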

= Minor Changes =
* Add IsSnapshotSupported() to DB to indicate whether the current DB
  supports snapshots.  If it returns false, then DB::GetSnapshot() will
  always return nullptr.

Test Plan:
Run existing tests.  Will develop a test specifically for cuckoo hash in
the next diff.

Reviewers: sdong, haobo

Reviewed By: sdong

CC: leveldb, dhruba, igor

Differential Revision: https://reviews.facebook.net/D16155

9d9d2965

More unsigned/signed compare fixes · f1c9aa6e
Committed by Igor Canadi on Apr 29, 2014

f1c9aa6e

Fix more signed/unsigned comparisons · 38693d99
Committed by Igor Canadi on Apr 29, 2014

38693d99

Cache result of ReadFirstRecord() · dd9eb7a7

Committed by Igor Canadi on Apr 29, 2014

Summary:
ReadFirstRecord() reads the actual log file from disk on every call. This diff introduces a cache layer on top of ReadFirstRecord(), which should significantly speed up repeated calls to GetUpdatesSince().

I also cleaned up some stuff, but the whole TransactionLogIterator could use some refactoring, especially if we see increased usage.
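Here is a minimal sketch of such a cache layer. It assumes that a log file's first record never changes once it has been read, since log records are only appended, so the result can be memoized by log file number. `FirstRecordCache` and the `read_from_disk` callback are hypothetical stand-ins for the cache and for ReadFirstRecord() itself.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>

using SequenceNumber = uint64_t;

// Hypothetical memoizing layer over an expensive "read the first record" call.
class FirstRecordCache {
 public:
  explicit FirstRecordCache(std::function<SequenceNumber(uint64_t)> read_from_disk)
      : read_from_disk_(std::move(read_from_disk)) {}

  SequenceNumber Get(uint64_t log_number) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      auto it = cache_.find(log_number);
      if (it != cache_.end()) return it->second;  // hit: no disk access
    }
    // Miss: read outside the lock so other callers are not blocked on I/O.
    SequenceNumber seq = read_from_disk_(log_number);
    std::lock_guard<std::mutex> lock(mu_);
    cache_.emplace(log_number, seq);
    return seq;
  }

  // Forget a log file, e.g. after it has been deleted.
  void Evict(uint64_t log_number) {
    std::lock_guard<std::mutex> lock(mu_);
    cache_.erase(log_number);
  }

 private:
  std::function<SequenceNumber(uint64_t)> read_from_disk_;
  std::mutex mu_;
  std::unordered_map<uint64_t, SequenceNumber> cache_;
};
```

Repeated GetUpdatesSince()-style lookups then touch the disk once per log file instead of once per call.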

Test Plan: make check

Reviewers: haobo, sdong, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18387

dd9eb7a7

Apr 29, 2014 · 2 commits

Use new DBWithTTL API in tests · 91ef2eae
Committed by Igor Canadi on Apr 28, 2014

91ef2eae

Fix TransactionLogIterator EOF caching · 72ff275e

Committed by Igor Canadi on Apr 28, 2014

Summary:
When TransactionLogIterator reaches EOF, it calls UnmarkEOF and continues reading. However, if glibc has cached the EOF status of the file, the iterator gets EOF again, even though new data might have been written to the file in the meantime.

This has been causing errors on Mac OS.
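The commit message does not show the fix itself. Assuming a stdio `FILE*`-backed reader, one way to make an "unmark EOF" step effective is to clear the stream's EOF indicator with `clearerr()` before reading again, as in this sketch. `ReadAfterEof()` and `TailDemo()` are hypothetical, and the demo writes a scratch file at a path the caller chooses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Read more bytes from a stream that may already have hit EOF. Conforming C
// libraries keep the EOF indicator set once it is raised ("sticky EOF"), so a
// plain fread() can keep returning 0 even after a writer appends data.
size_t ReadAfterEof(std::FILE* f, char* buf, size_t n) {
  assert(f != nullptr);
  if (std::feof(f)) std::clearerr(f);  // the "unmark EOF" step
  return std::fread(buf, 1, n, f);
}

// Demo: tail a file that is appended to through a second stream.
// Returns the first newly appended byte, or -1 on failure.
int TailDemo(const char* path) {
  std::FILE* w = std::fopen(path, "wb");
  if (w == nullptr) return -1;
  std::FILE* r = std::fopen(path, "rb");
  if (r == nullptr) {
    std::fclose(w);
    return -1;
  }
  char buf[8];
  std::fputs("ab", w);
  std::fflush(w);
  std::fread(buf, 1, sizeof buf, r);  // short read of "ab" sets the EOF indicator
  std::fputs("c", w);                 // the writer appends more data
  std::fflush(w);
  size_t got = ReadAfterEof(r, buf, sizeof buf);
  int result = (got == 1) ? static_cast<unsigned char>(buf[0]) : -1;
  std::fclose(r);
  std::fclose(w);
  std::remove(path);
  return result;
}
```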

Test Plan: test passes, was failing before

Reviewers: dhruba, haobo, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18381

72ff275e

Apr 28, 2014 · 1 commit
- Add rocksdb_open_for_read_only to C API · 4f9fae9b
  Committed by Donovan Hide on Apr 27, 2014
  
  4f9fae9b
Apr 27, 2014 · 1 commit
- Fix OSX compile · c489499a
  Committed by Igor Canadi on Apr 26, 2014
  
  c489499a
Apr 26, 2014 · 4 commits

avoid calling FindFile twice in TwoLevelIterator for PlainTable · ccaca59b

Committed by Lei Jin on Apr 25, 2014

Summary:
This fixes the regression introduced in
https://reviews.facebook.net/D17853

Test Plan: make all check

Reviewers: igor, haobo, sdong, dhruba, yhchiang

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17985

ccaca59b

Check PrefixMayMatch on Seek() · d642c60b

Committed by Lei Jin on Apr 25, 2014

Summary:
As a follow-up diff to https://reviews.facebook.net/D17805, add an
optimization that checks PrefixMayMatch on Seek()
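The idea behind checking PrefixMayMatch() on Seek() can be sketched as follows. Before positioning inside a file, ask that file's prefix filter whether the target's prefix can occur there at all. `FileWithPrefixFilter`, `ExtractPrefix()` (a fixed-length prefix) and `FilesToSeek()` are hypothetical and are not RocksDB's TableReader interface.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical per-file prefix filter. RocksDB uses a bloom filter, which may give
// false positives but never false negatives; an exact set keeps the sketch small.
struct FileWithPrefixFilter {
  std::unordered_set<std::string> prefixes;
  bool PrefixMayMatch(const std::string& prefix) const {
    return prefixes.count(prefix) > 0;
  }
};

// Hypothetical fixed-length prefix extractor.
std::string ExtractPrefix(const std::string& key, size_t len) {
  return key.substr(0, len);
}

// On Seek(target), skip every file whose filter rules out target's prefix: with
// prefix seek, only keys sharing that prefix are of interest, so such a file
// cannot contribute any.
std::vector<size_t> FilesToSeek(const std::vector<FileWithPrefixFilter>& files,
                                const std::string& target, size_t prefix_len) {
  const std::string prefix = ExtractPrefix(target, prefix_len);
  std::vector<size_t> candidates;
  for (size_t i = 0; i < files.size(); ++i) {
    if (files[i].PrefixMayMatch(prefix)) candidates.push_back(i);
  }
  return candidates;
}
```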

Test Plan: make all check

Reviewers: igor, haobo, sdong, yhchiang, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17853

d642c60b

kill ReadOptions.prefix and .prefix_seek · 3995e801

Committed by Lei Jin on Apr 25, 2014

Summary:
Also add an override option, total_order_iteration, for when you want a full
iterator while a prefix_extractor is set.

Test Plan: make all check

Reviewers: igor, haobo, sdong, yhchiang

Reviewed By: haobo

CC: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D17805

3995e801

Delete superversion and log outside of mutex · 8ce54926

Committed by Igor Canadi on Apr 25, 2014

Summary: As the title says. Add two autovectors that get filled in MakeRoomForWrite; their contents are then deleted outside of the mutex.
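The pattern can be sketched as follows. Objects are detached while the mutex is held, collected in a local container, and destroyed after the lock is released. `Expensive` and `DbState` are hypothetical stand-ins for the SuperVersion and log objects, and `std::vector` replaces RocksDB's autovector.

```cpp
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical object whose destructor may be slow (freeing memory, closing files).
struct Expensive {
  explicit Expensive(int* destroyed) : destroyed_(destroyed) {}
  ~Expensive() { ++*destroyed_; }
  int* destroyed_;
};

// Hypothetical owner guarded by a mutex.
class DbState {
 public:
  // Swap in a new object under the mutex, but destroy the old one only after
  // the mutex is released, so other threads never wait on the destructor.
  void Replace(std::unique_ptr<Expensive> next) {
    std::vector<std::unique_ptr<Expensive>> to_delete;  // filled under the lock
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (current_) to_delete.push_back(std::move(current_));
      current_ = std::move(next);
    }  // mutex released here
    to_delete.clear();  // destructors run without holding mu_
  }

 private:
  std::mutex mu_;
  std::unique_ptr<Expensive> current_;
};
```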

Test Plan: make check

Reviewers: dhruba, haobo, ljin, sdong

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18249

8ce54926

Apr 25, 2014 · 3 commits

Column family logging · ad3cd39c

Committed by Igor Canadi on Apr 25, 2014

Summary:
Now that we have column families involved, we need to add extra context to every log message. They now start with "[column family name] log message"

Also added some logging that I think would be useful, like a level summary after every flush (I often needed that when going through the logs).
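Here is a minimal sketch of the message format described above. `FormatCfLog()` is a hypothetical helper, not RocksDB's logging API, and the sample message below is made up.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a printf-style message and prefixes it with
// "[column family name] ". Messages longer than 511 bytes are truncated.
std::string FormatCfLog(const std::string& cf_name, const char* fmt, ...) {
  char body[512];
  va_list ap;
  va_start(ap, fmt);
  std::vsnprintf(body, sizeof body, fmt, ap);
  va_end(ap);
  return "[" + cf_name + "] " + body;
}
```

For example, `FormatCfLog("users", "Flushed %d bytes", 1024)` yields `[users] Flushed 1024 bytes`.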

Test Plan: make check + ran db_bench to confirm I'm happy with log output

Reviewers: dhruba, haobo, ljin, yhchiang, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18303

ad3cd39c

Fix corruption test · 4cd9f58c
Committed by Igor Canadi on Apr 24, 2014

4cd9f58c

Make CompactionInputErrorParanoid less flakey · 478990c8

Committed by Igor Canadi on Apr 24, 2014

Summary:
I'm getting lots of e-mails about CompactionInputErrorParanoid failing. The most recent example, from early this morning, was: http://ci-builds.fb.com/job/rocksdb_valgrind/562/consoleFull

I'm putting a stop to these e-mails. I investigated why the test is flaky, and it turns out to be caused by the non-determinism of compaction scheduling. If there is a compaction after the last flush, CorruptFile corrupts the compacted file instead of the file at level 0 (as it assumes). That makes `Check(9, 9)` fail big time.

I also saw some errors where the table file was output to level 1 or higher instead of level 0. Fixed that too.

Test Plan: Ran corruption_test 100 times without a failure. Previously it usually failed by the 10th run.

Reviewers: dhruba, haobo, ljin

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18285

478990c8

Apr 24, 2014 · 1 commit

Fix a bug in IterKey · 4de5b84e

Committed by sdong on Apr 23, 2014

Summary: IterKey initialized buffer_size_ to the wrong value, causing it to always allocate from the heap instead of using its stack buffer, even when the key is small enough to fit there. Fix it.
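The following simplified sketch shows the small-buffer scheme IterKey relies on. Keys that fit in an inline array are copied there, and only longer keys go to the heap. `SmallKey` is hypothetical. The point is that the capacity field must start at the inline array's size; otherwise even short keys take the heap path, which matches the symptom described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical simplified IterKey: inline storage for short keys, heap for long ones.
class SmallKey {
 public:
  SmallKey() : buf_(space_), buffer_size_(sizeof(space_)), key_size_(0) {}
  ~SmallKey() {
    if (buf_ != space_) delete[] buf_;
  }
  SmallKey(const SmallKey&) = delete;
  SmallKey& operator=(const SmallKey&) = delete;

  void SetKey(const char* data, size_t size) {
    assert(data != nullptr || size == 0);
    if (size > buffer_size_) {  // only grow onto the heap when the key doesn't fit
      char* bigger = new char[size];
      if (buf_ != space_) delete[] buf_;
      buf_ = bigger;
      buffer_size_ = size;
    }
    if (size > 0) std::memcpy(buf_, data, size);
    key_size_ = size;
  }

  std::string GetKey() const { return std::string(buf_, key_size_); }
  bool UsesInlineBuffer() const { return buf_ == space_; }

 private:
  char space_[32];      // inline storage for short keys
  char* buf_;           // points at space_ or at a heap buffer
  size_t buffer_size_;  // must start at sizeof(space_), not at a smaller value
  size_t key_size_;
};
```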

Test Plan: make all check

Reviewers: haobo, ljin

Reviewed By: haobo

CC: igor, dhruba, yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D18279

4de5b84e

Apr 23, 2014 · 7 commits

Print out stack trace in mac, too · f9f8965e

Committed by Igor Canadi on Apr 23, 2014

Summary: This was very useful while debugging a Mac-only issue with ThreadLocalPtr. Let's print out the stack trace on Mac OS, too.
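The following minimal sketch prints a stack trace with the `backtrace()` and `backtrace_symbols_fd()` functions from `<execinfo.h>`, which both glibc and macOS provide. `PrintStackTrace()` is hypothetical, and RocksDB's PrintStack() may be implemented differently. On Linux, function names usually only appear if the binary is linked with `-rdynamic`.

```cpp
#include <execinfo.h>
#include <unistd.h>
#include <cassert>

// Hypothetical PrintStack-style helper. Writes one symbolized line per frame
// to `fd` and returns the number of frames captured.
int PrintStackTrace(int fd) {
  void* frames[64];
  int n = backtrace(frames, 64);
  assert(n >= 0);
  backtrace_symbols_fd(frames, n, fd);
  return n;
}
```

`backtrace_symbols_fd()` writes directly to a file descriptor without allocating, which makes it a reasonable choice for crash handlers.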

Test Plan: Verified that a somewhat useful stack trace was generated on Mac. Will run PrintStack() on Linux, too.

Reviewers: ljin, haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18189

f9f8965e

Expose number of entries in mem tables to users · a5707407

Committed by sdong on Apr 22, 2014

Summary: In this patch, two new DB properties are defined: rocksdb.num-immutable-mem-table and rocksdb.num-entries-imm-mem-tables, through which the number of entries in mem tables is exposed to users

Test Plan:
Covered the code in db_test
make all check

Reviewers: haobo, ljin, igor

Reviewed By: igor

CC: nkg-, igor, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18207

a5707407

get rid of shared_ptr in memtable.cc · 5f1daf7a

Committed by Lei Jin on Apr 22, 2014

Summary: Get rid of the devil. Probably won't impact anything on the perf side.

Test Plan: make all check

Reviewers: igor, haobo, sdong, yhchiang

Reviewed By: haobo

CC: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D18153

5f1daf7a

PlainTableReader to expose index size to users · 86a0133d

Committed by sdong on Apr 22, 2014

Summary:
This is a temporary solution to expose index sizes from PlainTableReader to users before we persist them to files.
In this patch, the memory consumption of the indexes used by PlainTableReader is reported as two user-defined properties, so that users can monitor it.

Test Plan:
Add a unit test.
make all check

Reviewers: haobo, ljin

Reviewed By: haobo

CC: nkg-, yhchiang, igor, ljin, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18195

86a0133d

Revert "Better port::Mutex::AssertHeld() and AssertNotHeld()" · 1068d2fa
Committed by Igor Canadi on Apr 22, 2014
```
This reverts commit ddafceb6.
```
1068d2fa

Better port::Mutex::AssertHeld() and AssertNotHeld() · ddafceb6

Committed by Igor Canadi on Apr 22, 2014

Summary:
Using ThreadLocalPtr as a flag to determine if a mutex is locked or not enables us to implement AssertNotHeld(). It also makes AssertHeld() actually correct.

I had to remove port::Mutex as a dependency for util/thread_local.h, but that's fine since we can just use std::mutex :)
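This commit, which the revert entry above undoes, tracks lock ownership with a ThreadLocalPtr flag. The following sketch uses a different technique with the same goal. It records the owning `std::thread::id` in an atomic, which is enough to implement both AssertHeld() and AssertNotHeld(). `CheckedMutex` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical mutex wrapper that remembers its owner so both assertions are checkable.
class CheckedMutex {
 public:
  void Lock() {
    mu_.lock();
    owner_.store(std::this_thread::get_id(), std::memory_order_relaxed);
  }
  void Unlock() {
    owner_.store(std::thread::id(), std::memory_order_relaxed);  // no owner
    mu_.unlock();
  }
  // A thread always observes its own latest store, so this never reports
  // ownership for a thread that has already released the lock.
  bool HeldByCurrentThread() const {
    return owner_.load(std::memory_order_relaxed) == std::this_thread::get_id();
  }
  void AssertHeld() const { assert(HeldByCurrentThread()); }
  void AssertNotHeld() const { assert(!HeldByCurrentThread()); }

 private:
  std::mutex mu_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};
```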

Test Plan: make check

Reviewers: ljin, dhruba, haobo, sdong, yhchiang

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18171

ddafceb6

Support for column families in TTL DB · 3992aec8

Committed by Igor Canadi on Apr 22, 2014

Summary:
This will enable people using TTL DB to do so with multiple column families. They can also specify different TTLs for each one.

TODO: Implement CreateColumnFamily() in TTL world.
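Here is a minimal sketch of per-column-family TTLs, modeled on the idea that TTL DB stores each value's write time alongside it. `TtlByColumnFamily`, `AppendTimestamp()`, `DecodeTimestamp()` and `IsStale()` are hypothetical. The 4-byte little-endian suffix and treating a non-positive TTL as "never expires" are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-column-family TTLs, in seconds.
using TtlByColumnFamily = std::map<std::string, int32_t>;

// Store the write time (assumed: 4-byte little-endian seconds) after the value.
std::string AppendTimestamp(const std::string& value, int32_t now) {
  std::string stored = value;
  uint32_t ts = static_cast<uint32_t>(now);
  for (int i = 0; i < 4; ++i) {
    stored.push_back(static_cast<char>((ts >> (8 * i)) & 0xff));
  }
  return stored;
}

int32_t DecodeTimestamp(const std::string& stored) {
  assert(stored.size() >= 4);
  uint32_t ts = 0;
  for (int i = 0; i < 4; ++i) {
    unsigned char byte = static_cast<unsigned char>(stored[stored.size() - 4 + i]);
    ts |= static_cast<uint32_t>(byte) << (8 * i);
  }
  return static_cast<int32_t>(ts);
}

// Stale once older than the column family's TTL; a non-positive or missing TTL
// means the value never expires (an assumption of this sketch).
bool IsStale(const TtlByColumnFamily& ttls, const std::string& cf,
             const std::string& stored, int32_t now) {
  auto it = ttls.find(cf);
  if (it == ttls.end() || it->second <= 0) return false;
  return static_cast<int64_t>(DecodeTimestamp(stored)) + it->second < now;
}
```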

Test Plan: Added a very simple sanity test.

Reviewers: dhruba, haobo, ljin, sdong, yhchiang

Reviewed By: haobo

CC: leveldb, alberts

Differential Revision: https://reviews.facebook.net/D17859

3992aec8

Apr 22, 2014 · 4 commits

Rename "benchmark" back to "bench". · 8dc34364
Committed by Igor Canadi on Apr 21, 2014
```
Also, make `benchharness.cc` not compiled into rocksdb library.
```
8dc34364

Added benchmark functionality on the lines of folly/Benchmark.h · ff1b5df4

Committed by Pratyush Seth on Apr 21, 2014

Summary: Added benchmark functionality along the lines of folly/Benchmark.h

Test Plan: Added unit tests

Reviewers: igor, haobo, sdong, ljin, yhchiang, dhruba

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17973

ff1b5df4

Remove TransactionLogIteratorRace when -DNDEBUG · f813279d
Committed by Igor Canadi on Apr 21, 2014

f813279d

hints for narrowing down FindFile range and avoiding checking irrelevant L0 files · 0f2d7681

Committed by Lei Jin on Apr 21, 2014

Summary:
The file tree structure in Version is prebuilt and the range of each file is known.
On the Get() code path, we do binary search in FindFile() by comparing
target key with each file's largest key and also check the range for each L0 file.
With some pre-calculated knowledge, each key comparison that has been done can serve
as a hint to narrow down further searches:
(1) If a key falls within a L0 file's range, we can safely skip the next
file if its range does not overlap with the current one.
(2) If a key falls within a file's range in level L0 - Ln-1, we should only
need to binary search in the next level for files that overlap with the current one.

(1) will be able to skip some files depending on the key distribution.
(2) can greatly reduce the range of binary search, especially for bottom
levels, given that one file most likely only overlaps with N files from
the level below (where N is max_bytes_for_level_multiplier). So on level
L, we will only look at ~N files instead of N^L files.

Some initial results: measured with a 500M-key DB, when writes are light (10k/s = 1.2M/s), this
improves QPS ~7% on top of blocked bloom. When writes are heavier (80k/s =
9.6M/s), it gives us ~13% improvement.
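Hint (2) can be sketched like this. For a file at level L, precompute the index range of level L+1 files that overlap it, then binary-search only that range when a key falls inside the file. `FileRange`, `OverlapRange()` and `FindFileInRange()` are hypothetical. Level L+1 files are assumed to be sorted and non-overlapping, and the case where the key falls between files is not handled.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-file key range.
struct FileRange {
  std::string smallest, largest;
};

// Precomputed once per file f at level L: the index range [begin, end) of files
// in level L+1 (sorted, non-overlapping) whose key ranges overlap f's.
std::pair<size_t, size_t> OverlapRange(const FileRange& f,
                                       const std::vector<FileRange>& next) {
  auto begin = std::lower_bound(next.begin(), next.end(), f.smallest,
      [](const FileRange& r, const std::string& k) { return r.largest < k; });
  auto end = std::upper_bound(begin, next.end(), f.largest,
      [](const std::string& k, const FileRange& r) { return k < r.smallest; });
  return {static_cast<size_t>(begin - next.begin()),
          static_cast<size_t>(end - next.begin())};
}

// FindFile restricted to the hinted range: the first file in [begin, end) whose
// largest key is >= key, or `end` if there is none. A file containing `key`
// must overlap f, so no candidate outside the range is missed.
size_t FindFileInRange(const std::vector<FileRange>& files, const std::string& key,
                       size_t begin, size_t end) {
  auto it = std::lower_bound(files.begin() + begin, files.begin() + end, key,
      [](const FileRange& r, const std::string& k) { return r.largest < k; });
  return static_cast<size_t>(it - files.begin());
}
```

With a level multiplier of N, the hinted range typically holds about N files, so each level's binary search stays small instead of growing with the level's total file count.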

Test Plan: make all check

Reviewers: haobo, igor, dhruba, sdong, yhchiang

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17205

0f2d7681

Apr 18, 2014 · 2 commits

Fix bugs introduced by D17961 · 65179225

Committed by sdong on Apr 17, 2014

Summary:
D17961 has two bugs:
(1) the two-level iterator fails to populate FileMetaData.table_reader, causing a performance regression.
(2) the table cache handles the !status.ok() case in the wrong place, causing a segfault that shouldn't happen.

Test Plan: make all check

Reviewers: ljin, igor, haobo

Reviewed By: ljin

CC: yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D17991

65179225

Minimize accessing multiple objects in Version::Get() · fa430bfd

Committed by sdong on Apr 17, 2014

Summary:
One of our profiles shows that Version::Get() is sometimes slow when fetching pointers to user comparators or other global objects. In this patch:
(1) we keep pointers to immutable objects in Version to avoid accessing them through option objects or cfd objects
(2) table_reader is cached directly in FileMetaData so that the table cache doesn't have to go through the handle first to fetch it
(3) if level 0 has fewer than 3 files, skip the filtering logic based on SST tables' key ranges. The smallest and largest keys are stored in separate memory locations, which can cause cache misses

Test Plan: make all check

Reviewers: haobo, ljin

Reviewed By: haobo

CC: igor, yhchiang, nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D17739

fa430bfd

Apr 17, 2014 · 2 commits
- Don't overflow size_t in mac · 161d9e58
  Committed by Igor Canadi on Apr 16, 2014
  
  161d9e58
- Remove tautological assert · 5c12f277
  Committed by Igor Canadi on Apr 16, 2014
  
  5c12f277
Apr 16, 2014 · 5 commits
- Close DB at the end of DontRollEmptyLogs test · faf76913
  Committed by Igor Canadi on Apr 15, 2014
  
  faf76913
- Fix Mac OS compile · 1803ed2c
  Committed by Igor Canadi on Apr 15, 2014
  
  1803ed2c
- Fix compile issues when doing make release · 7d838856
  Committed by Igor Canadi on Apr 15, 2014
  
  7d838856
- When creating a new DB, fail it when wal_dir contains existing log files · 0f40fe4b
  Committed by sdong on Apr 15, 2014
```
Summary: Currently, when a new DB is created and there are existing log files, we go ahead and replay them on top of the empty DB. No user would expect this behavior. With this patch, DB creation fails if a user creates a DB while log files already exist.

Test Plan: make all check

Reviewers: haobo, igor, ljin

Reviewed By: haobo

CC: nkg-, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D17817
```
  0f40fe4b
- Fix compile issues introduced by RocksDBLite · c1666158
  Committed by Igor Canadi on Apr 15, 2014
  
  c1666158

kvdb / rocksdb · synced successfully 12 months ago
