提交 · c61c9830d4b9e4eed66bd8e013788f861ab88d2e · kvdb / rocksdb

13 3月, 2014 4 次提交

I
Revert "DB stress with normal skip list" · 04a1035e
由 Igor Canadi 提交于 3月 12, 2014
```
This reverts commit 86926d8c.
```
04a1035e

由 Lei Jin 提交于 3月 12, 2014

Summary:
this should fix the hash_skip_list issue, but I still see seqno
assertion failure in the last run. Will continue investigating and
address that in a different diff

Test Plan: make whitebox_crash_test

Reviewers: igor

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16851

02a2cb13

DB stress with normal skip list · 86926d8c

由 Igor Canadi 提交于 3月 12, 2014

Summary:
Hash skip list has issues, causing db_stress to fail badly.

For now, switching back to skip_list by default before we figure out root cause.

Test Plan: db_stress is happy(ier)

Reviewers: ljin

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16845

86926d8c

DBStress cleanup · 5ba028c1

由 Igor Canadi 提交于 3月 12, 2014

Summary:
*) fixed the comment
*) constant 1 was not casted to 64-bit, which (I think) might cause overflow if we shift it too much
*) default prefix size to be 7, like it was before

Test Plan: compiled

Reviewers: ljin

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16827

5ba028c1

12 3月, 2014 2 次提交

make assert based on FLAGS_prefix_size · 86ba3e24

由 Lei Jin 提交于 3月 11, 2014

Summary: as title

Test Plan: running python tools/db_crashtest.py

Reviewers: igor

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16803

86ba3e24

fix db_stress test · 02dab3be

由 Lei Jin 提交于 3月 11, 2014

Summary: Fix the db_stress test, let is run with HashSkipList for real

Test Plan:
python tools/db_crashtest.py
python tools/db_crashtest2.py

Reviewers: igor, haobo

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16773

02dab3be

11 3月, 2014 1 次提交

Consolidate SliceTransform object ownership · 8d007b4a

由 Lei Jin 提交于 3月 10, 2014

Summary:
(1) Fix SanitizeOptions() to also check HashLinkList. The current
dynamic case just happens to work because the 2 classes have the same
layout.
(2) Do not delete SliceTransform object in HashSkipListFactory and
HashLinkListFactory destructor. Reason: SanitizeOptions() enforces
prefix_extractor and SliceTransform to be the same object when
Hash**Factory is used. This makes the behavior strange: when
Hash**Factory is used, prefix_extractor will be released by RocksDB. If
other memtable factory is used, prefix_extractor should be released by
user.

Test Plan: db_bench && make asan_check

Reviewers: haobo, igor, sdong

Reviewed By: igor

CC: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D16587

8d007b4a

09 2月, 2014 1 次提交
- A
  
  Support for LZ4 compression. · df2f9221
  由 Albert Strasheim 提交于 2月 07, 2014
  
  df2f9221
25 1月, 2014 3 次提交

Moving Some includes from options.h to forward declaration · 8477255d

由 Siying Dong 提交于 1月 24, 2014

Summary: By removing some includes form options.h and reply on forward declaration, we can more easily reason the dependencies.

Test Plan: make all check

Reviewers: kailiu, haobo, igor, dhruba

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15411

8477255d

Revert "Moving to glibc-fb" · e832e72b

由 Igor Canadi 提交于 1月 24, 2014

This reverts commit d24961b6.

For some reason, glibc2.17-fb breaks gflags. Reverting for now

e832e72b

Moving to glibc-fb · d24961b6

由 Igor Canadi 提交于 1月 24, 2014

Summary:
It looks like we might have some trouble when building the new release with 4.8, since fbcode is using glibc2.17-fb by default and we are using glibc2.17. It was reported by Benjamin Renard in our internal group.

This diff moves our fbcode build to use glibc2.17-fb by default. I got some linker errors when compiling, complaining that `google::SetUsageMessage()` was undefined. After deleting all offending lines, the compile was successful and everything works.

Test Plan:
Compiled
Ran ./db_bench ./db_stress ./db_repl_stress

Reviewers: kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15405

d24961b6

18 1月, 2014 1 次提交

Statistics code cleanup · 83681bf9

由 Igor Canadi 提交于 1月 17, 2014

Summary: I'm separating code-cleanup part of https://reviews.facebook.net/D14517. This will make D14517 easier to understand and this diff easier to review.

Test Plan: make check

Reviewers: haobo, kailiu, sdong, dhruba, tnovak

Reviewed By: tnovak

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15099

83681bf9

04 12月, 2013 1 次提交

Killing Transform Rep · eb12e47e

由 Igor Canadi 提交于 12月 03, 2013

Summary:
Let's get rid of TransformRep and it's children. We have confirmed that HashSkipListRep works better with multifeed, so there is no benefit to keeping this around.

This diff is mostly just deleting references to obsoleted functions. I also have a diff for fbcode that we'll need to push when we switch to new release.

I had to expose HashSkipListRepFactory in the client header files because db_impl.cc needs access to GetTransform() function for SanitizeOptions.

Test Plan: make check

Reviewers: dhruba, haobo, kailiu, sdong

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14397

eb12e47e

20 11月, 2013 1 次提交

Fix two nasty use-after-free-bugs · 469a9f32

由 Igor Canadi 提交于 11月 19, 2013

Summary:
These bugs were caught by ASAN crash test.
1. The first one, in table/filter_block.cc is very nasty. We first reference entries_ and store the reference to Slice prev. Then, we call entries_.append(), which can change the reference. The Slice prev now points to junk.
2. The second one is a bug in a test, so it's not very serious. Once we set read_opts.prefix, we never clear it, so some other function might still reference it.

Test Plan: asan crash test now runs more than 5 mins. Before, it failed immediately. I will run the full one, but the full one takes quite some time (5 hours)

Reviewers: dhruba, haobo, kailiu

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14223

469a9f32

17 11月, 2013 1 次提交

make util/env_posix.cc work under mac · 97d8e573

由 kailiu 提交于 11月 16, 2013

Summary: This diff invoves some more complicated issues in the posix environment.

Test Plan: works under mac os. will need to verify dev box.

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14061

97d8e573

13 11月, 2013 1 次提交

Add the index/filter block cache · 88ba331c

由 Kai Liu 提交于 11月 12, 2013

Summary: This diff leverage the existing block cache and extend it to cache index/filter block.

Test Plan:
Added new tests in db_test and table_test

The correctness is checked by:

1. make check
2. make valgrind_check

Performance is test by:

1. 10 times of build_tools/regression_build_test.sh on two versions of rocksdb before/after the code change. Test results suggests no significant difference between them. For the two key operatons `overwrite` and `readrandom`, the average iops are both 20k and ~260k, with very small variance).
2. db_stress.

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb, haobo, xjin

Differential Revision: https://reviews.facebook.net/D13167

88ba331c

02 11月, 2013 1 次提交

Implement a compressed block cache. · b4ad5e89

由 Dhruba Borthakur 提交于 9月 01, 2013

Summary:
Rocksdb can now support a uncompressed block cache, or a compressed
block cache or both. Lookups first look for a block in the
uncompressed cache, if it is not found only then it is looked up
in the compressed cache. If it is found in the compressed cache,
then it is uncompressed and inserted into the uncompressed cache.

It is possible that the same block resides in the compressed cache
as well as the uncompressed cache at the same time. Both caches
have their own individual LRU policy.

Test Plan: Unit test case attached.

Reviewers: kailiu, sdong, haobo, leveldb

Reviewed By: haobo

CC: xjin, haobo

Differential Revision: https://reviews.facebook.net/D12675

b4ad5e89

24 10月, 2013 1 次提交

Conversion of db_bench, db_stress and db_repl_stress to use gflags · e44976b1

由 Slobodan Predolac 提交于 10月 24, 2013

Summary: Converted db_stress, db_repl_stress and db_bench to use gflags

Test Plan: I tested by printing out all the flags from old and new versions. Tried defaults, + various combinations with "interesting flags". Also, tested by running db_crashtest.py and db_crashtest2.py.

Reviewers: emayanke, dhruba, haobo, kailiu, sdong

Reviewed By: emayanke

CC: leveldb, xjin

Differential Revision: https://reviews.facebook.net/D13581

e44976b1

17 10月, 2013 1 次提交

Add appropriate LICENSE and Copyright message. · 9cd22109

由 Dhruba Borthakur 提交于 10月 16, 2013

Summary:
Add appropriate LICENSE and Copyright message.

Test Plan:
make check

Reviewers:

CC:

Task ID: #

Blame Rev:

9cd22109

06 10月, 2013 1 次提交

Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix. · 4463b11c

由 Dhruba Borthakur 提交于 10月 04, 2013

Summary: Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix.

Test Plan: make check

Reviewers: emayanke, haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D13311

4463b11c

05 10月, 2013 1 次提交

Change namespace from leveldb to rocksdb · a143ef9b

由 Dhruba Borthakur 提交于 10月 03, 2013

Summary:
Change namespace from leveldb to rocksdb. This allows a single
application to link in open-source leveldb code as well as
rocksdb code into the same process.

Test Plan: compile rocksdb

Reviewers: emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D13287

a143ef9b

03 10月, 2013 1 次提交

Triggering verify for gets also · 6b34021f

由 Mayank Agarwal 提交于 10月 01, 2013

Summary: Will use iterators to verify keys in the db for half of its keys and Gets for the other half.

Test Plan: ./db_stress --max_key=1000 --ops_per_thread=100

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D13227

6b34021f

01 10月, 2013 1 次提交

Phase 2 of iterator stress test · 7edb92b8

由 Natalie Hildebrandt 提交于 9月 30, 2013

Summary: Using an iterator instead of the Get method, each thread goes through a portion of the database and verifies values by comparing to the shared state.

Test Plan:
./db_stress --db=/tmp/tmppp --max_key=10000 --ops_per_thread=10000

To test some basic cases, the following lines can be added (each set in turn) to the verifyDb method with the following expected results:

    // Should abort with "Unexpected value found"
    shared.Delete(start);

    // Should abort with "Value not found"
    WriteOptions write_opts;
    db_->Delete(write_opts, Key(start));

    // Should succeed
    WriteOptions write_opts;
    shared.Delete(start);
     db_->Delete(write_opts, Key(start));

    // Should abort with "Value not found"
    WriteOptions write_opts;
    db_->Delete(write_opts, Key(start + (end-start)/2));

    // Should abort with "Value not found"
    db_->Delete(write_opts, Key(end-1));

    // Should abort with "Unexpected value"
    shared.Delete(end-1);

    // Should abort with "Unexpected value"
    shared.Delete(start + (end-start)/2);

    // Should abort with "Value not found"
    db_->Delete(write_opts, Key(start));
    shared.Delete(start);
    db_->Delete(write_opts, Key(end-1));
    db_->Delete(write_opts, Key(end-2));

To test the out of range abort, change the key in the for loop to Key(i+1), so that the key defined by the index i is now outside of the supposed range of the database.

Reviewers: emayanke

Reviewed By: emayanke

CC: dhruba, xjin

Differential Revision: https://reviews.facebook.net/D13071

7edb92b8

20 9月, 2013 1 次提交

Phase 1 of an iterator stress test · 43354182

由 Natalie Hildebrandt 提交于 9月 19, 2013

Summary:
Added MultiIterate() which does a seek and some Next/Prev
calls.  Iterator status is checked only, no data integrity check

Test Plan:
make db_stress
./db_stress --iterpercent=<nonzero value> --readpercent=, etc.

Reviewers: emayanke, dhruba, xjin

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D12915

43354182

14 9月, 2013 1 次提交

Added a parameter to limit the maximum space amplification for universal compaction. · 4012ca1c

由 Dhruba Borthakur 提交于 9月 09, 2013

Summary:
Added a new field called max_size_amplification_ratio in the
CompactionOptionsUniversal structure. This determines the maximum
percentage overhead of space amplification.

The size amplification is defined to be the ratio between the size of
the oldest file to the sum of the sizes of all other files. If the
size amplification exceeds the specified value, then min_merge_width
and max_merge_width are ignored and a full compaction of all files is done.
A value of 10 means that the size a database that stores 100 bytes
of user data could occupy 110 bytes of physical storage.

Test Plan: Unit test DBTest.UniversalCompactionSpaceAmplification added.

Reviewers: haobo, emayanke, xjin

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D12825

4012ca1c

24 8月, 2013 1 次提交

Replace include/leveldb with include/rocksdb. · 1186192e

由 Dhruba Borthakur 提交于 8月 23, 2013

Summary: Replace include/leveldb with include/rocksdb.

Test Plan:
make clean; make check
make clean; make release

Differential Revision: https://reviews.facebook.net/D12489

1186192e

23 8月, 2013 1 次提交

Add three new MemTableRep's · 74781a0c

由 Jim Paton 提交于 8月 22, 2013

Summary:
This patch adds three new MemTableRep's: UnsortedRep, PrefixHashRep, and VectorRep.

UnsortedRep stores keys in an std::unordered_map of std::sets. When an iterator is requested, it dumps the keys into an std::set and iterates over that.

VectorRep stores keys in an std::vector. When an iterator is requested, it creates a copy of the vector and sorts it using std::sort. The iterator accesses that new vector.

PrefixHashRep stores keys in an unordered_map mapping prefixes to ordered sets.

I also added one API change. I added a function MemTableRep::MarkImmutable. This function is called when the rep is added to the immutable list. It doesn't do anything yet, but it seems like that could be useful. In particular, for the vectorrep, it means we could elide the extra copy and just sort in place. The only reason I haven't done that yet is because the use of the ArenaAllocator complicates things (I can elaborate on this if needed).

Test Plan:
make -j32 check
./db_stress --memtablerep=vector
./db_stress --memtablerep=unsorted
./db_stress --memtablerep=prefixhash --prefix_size=10

Reviewers: dhruba, haobo, emayanke

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D12117

74781a0c

21 8月, 2013 1 次提交

Made merge_oprator a shared_ptr; and added TTL unit tests · b87dcae1

由 Deon Nicholas 提交于 8月 20, 2013

Test Plan:
- make all check;
- make release;
- make stringappend_test; ./stringappend_test

Reviewers: haobo, emayanke

Reviewed By: haobo

CC: leveldb, kailiu

Differential Revision: https://reviews.facebook.net/D12381

b87dcae1

16 8月, 2013 1 次提交

Benchmarking for Merge Operator · ad48c3c2

由 Deon Nicholas 提交于 8月 15, 2013

Summary:
Updated db_bench and utilities/merge_operators.h to allow for dynamic benchmarking
of merge operators in db_bench. Added a new test (--benchmarks=mergerandom), which performs
a bunch of random Merge() operations over random keys. Also added a "--merge_operator=" flag
so that the tester can easily benchmark different merge operators. Currently supports
the PutOperator and UInt64Add operator. Support for stringappend or list append may come later.

Test Plan:
	1. make db_bench
	2. Test the PutOperator (simulating Put) as follows:
./db_bench --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=put
--threads=2

3. Test the UInt64AddOperator (simulating numeric addition) similarly:
./db_bench --value_size=8 --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom
--merge_operator=uint64add --threads=2

Reviewers: haobo, dhruba, zshao, MarkCallaghan

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11535

ad48c3c2

15 8月, 2013 2 次提交

Minor fix to current codes · 0a5afd1a

由 Xing Jin 提交于 8月 13, 2013

Summary:
Minor fix to current codes, including: coding style, output format,
comments. No major logic change. There are only 2 real changes, please see my inline comments.

Test Plan: make all check

Reviewers: haobo, dhruba, emayanke

Differential Revision: https://reviews.facebook.net/D12297

0a5afd1a

Add prefix scans to db_stress (and bug fix in prefix scan) · 7612d496

由 Tyler Harter 提交于 8月 14, 2013

Summary: Added support for prefix scans.

Test Plan: ./db_stress --max_key=4096 --ops_per_thread=10000

Reviewers: dhruba, vamsi

Reviewed By: vamsi

CC: leveldb

Differential Revision: https://reviews.facebook.net/D12267

7612d496

06 8月, 2013 1 次提交

Expose base db object from ttl wrapper · 1d7b4765

由 Mayank Agarwal 提交于 8月 05, 2013

Summary: rocksdb replicaiton will need this when writing value+TS from master to slave 'as is'

Test Plan: make

Reviewers: dhruba, vamsi, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11919

1d7b4765

02 8月, 2013 1 次提交

Expand KeyMayExist to return the proper value if it can be found in memory and... · 59d0b02f

由 Mayank Agarwal 提交于 7月 26, 2013

Expand KeyMayExist to return the proper value if it can be found in memory and also check block_cache

Summary: Removed KeyMayExistImpl because KeyMayExist demanded Get like semantics now. Removed no_io from memtable and imm because we need the proper value now and shouldn't just stop when we see Merge in memtable. Added checks to block_cache. Updated documentation and unit-test

Test Plan: make all check;db_stress for 1 hour

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11853

59d0b02f

24 7月, 2013 1 次提交

Use KeyMayExist for WriteBatch-Deletes · bf66c10b

由 Mayank Agarwal 提交于 7月 12, 2013

Summary:
Introduced KeyMayExist checking during writebatch-delete and removed from Outer Delete API because it uses writebatch-delete.
Added code to skip getting Table from disk if not already present in table_cache.
Some renaming of variables.
Introduced KeyMayExistImpl which allows checking since specified sequence number in GetImpl useful to check partially written writebatch.
Changed KeyMayExist to not be pure virtual and provided a default implementation.
Expanded unit-tests in db_test to check appropriately.
Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.

Test Plan: db_stress;make check

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb, xjin

Differential Revision: https://reviews.facebook.net/D11745

bf66c10b

12 7月, 2013 1 次提交

Make rocksdb-deletes faster using bloom filter · 2a986919

由 Mayank Agarwal 提交于 7月 05, 2013

Summary:
Wrote a new function in db_impl.c-CheckKeyMayExist that calls Get but with a new parameter turned on which makes Get return false only if bloom filters can guarantee that key is not in database. Delete calls this function and if the option- deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete will be dropped saving:
1. Put of delete type
2. Space in the db,and
3. Compaction time

Test Plan:
make all check;
will run db_stress and db_bench and enhance unit-test once the basic design gets approved

Reviewers: dhruba, haobo, vamsi

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11607

2a986919

11 7月, 2013 1 次提交

Print complete statistics in db_stress · 821889e2

由 Mayank Agarwal 提交于 7月 10, 2013

Summary: db_stress should alos print complete statistics like db_bench. Needed this when I wanted to measure number of delete-IOs dropped due to CheckKeyMayExist to be introduced to rocksdb codebase later- to make deltes in rocksdb faster

Test Plan: make db_stress;./db_stress --max_key=100 --ops_per_thread=1000 --statistics=1

Reviewers: sheki, dhruba, vamsi, haobo

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D11655

821889e2

04 7月, 2013 1 次提交

Renamed 'hybrid_compaction' tp be "Universal Compaction'. · 116ec527

由 Dhruba Borthakur 提交于 7月 03, 2013

Summary:
All the universal compaction parameters are encapsulated in
a new file universal_compaction.h

Test Plan:
make check

116ec527

01 7月, 2013 2 次提交

Reduce write amplification by merging files in L0 back into L0 · 47c4191f

由 Dhruba Borthakur 提交于 6月 13, 2013

Summary:
There is a new option called hybrid_mode which, when switched on,
causes HBase style compactions.  Files from L0 are
compacted back into L0. This meat of this compaction algorithm
is in PickCompactionHybrid().

All files reside in L0. That means all files have overlapping
keys. Each file has a time-bound, i.e. each file contains a
range of keys that were inserted around the same time. The
start-seqno and the end-seqno refers to the timeframe when
these keys were inserted.  Files that have contiguous seqno
are compacted together into a larger file. All files are
ordered from most recent to the oldest.

The current compaction algorithm starts to look for
candidate files starting from the most recent file. It continues to
add more files to the same compaction run as long as the
sum of the files chosen till now is smaller than the next
candidate file size. This logic needs to be debated
and validated.

The above logic should reduce write amplification to a
large extent... will publish numbers shortly.

Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).

Differential Revision: https://reviews.facebook.net/D11289

47c4191f

Reduce write amplification by merging files in L0 back into L0 · 554c06dd

由 Dhruba Borthakur 提交于 6月 13, 2013

Summary:
There is a new option called hybrid_mode which, when switched on,
causes HBase style compactions.  Files from L0 are
compacted back into L0. This meat of this compaction algorithm
is in PickCompactionHybrid().

All files reside in L0. That means all files have overlapping
keys. Each file has a time-bound, i.e. each file contains a
range of keys that were inserted around the same time. The
start-seqno and the end-seqno refers to the timeframe when
these keys were inserted.  Files that have contiguous seqno
are compacted together into a larger file. All files are
ordered from most recent to the oldest.

The current compaction algorithm starts to look for
candidate files starting from the most recent file. It continues to
add more files to the same compaction run as long as the
sum of the files chosen till now is smaller than the next
candidate file size. This logic needs to be debated
and validated.

The above logic should reduce write amplification to a
large extent... will publish numbers shortly.

Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).

Differential Revision: https://reviews.facebook.net/D11289

554c06dd

20 6月, 2013 1 次提交

[RocksDB] Add mmap_read option for db_stress · 96be2c4e

由 Haobo Xu 提交于 6月 17, 2013

Summary: as title, also removed an incorrect assertion

Test Plan: make check; db_stress --mmap_read=1; db_stress --mmap_read=0

Reviewers: dhruba, emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11367

96be2c4e

kvdb / rocksdb 12 个月 前同步成功

kvdb / rocksdb
12 个月前同步成功