1. 25 Jul 2013 (1 commit)
    • Revert 6fbe4e98: If disable wal is set, then... · d7ba5bce
      Dhruba Borthakur authored
      Revert 6fbe4e98: If disable wal is set, then batch commits are avoided
      
      Summary:
      Revert "If disable wal is set, then batch commits are avoided" because
      keeping the mutex while inserting into the skiplist means that readers
      and writers are all serialized on the mutex.
      
      d7ba5bce
  2. 24 Jul 2013 (4 commits)
    • Virtualize SkipList Interface · 52d7ecfc
      Jim Paton authored
      Summary: This diff virtualizes the skiplist interface so that users can provide their own implementation of a backing store for MemTables. Eventually, the backing store will be responsible for its own synchronization, allowing users (and us) to experiment with different lockless implementations.
      
      Test Plan:
      make clean
      make -j32 check
      ./db_stress
      
      Reviewers: dhruba, emayanke, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11739
      52d7ecfc
    • If disable wal is set, then batch commits are avoided. · 6fbe4e98
      Dhruba Borthakur authored
      Summary:
      rocksdb uses batch commit to write to the transaction log. But if
      disable wal is set, then writes to the transaction log are skipped
      anyway. In this case there is not much value in batching, and
      batching can cause unnecessary delays to Puts().
      This patch avoids batching when disableWal is set.
      
      Test Plan:
      make check.
      
      I am running db_stress now.
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11763
      6fbe4e98
    • Adding filter_deletes to crash_tests run in jenkins · f3baeecd
      Mayank Agarwal authored
      Summary: The filter_deletes option introduced in db_stress makes it drop a Delete on a key if KeyMayExist(key) returns false for that key. The code change was simple and tested, so not wasting reviewers' time.
      
      Test Plan: make crash_test; python tools/db_crashtest[1|2].py
      
      CC: dhruba, vamsi
      
      Differential Revision: https://reviews.facebook.net/D11769
      f3baeecd
    • Use KeyMayExist for WriteBatch-Deletes · bf66c10b
      Mayank Agarwal authored
      Summary:
      Introduced KeyMayExist checking during writebatch-delete and removed from Outer Delete API because it uses writebatch-delete.
      Added code to skip getting Table from disk if not already present in table_cache.
      Some renaming of variables.
      Introduced KeyMayExistImpl, which allows checking from a specified sequence number in GetImpl; useful for checking a partially written write batch.
      Changed KeyMayExist to not be pure virtual and provided a default implementation.
      Expanded unit-tests in db_test to check appropriately.
      Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.
      
      Test Plan: db_stress;make check
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D11745
      bf66c10b
  3. 23 Jul 2013 (1 commit)
  4. 20 Jul 2013 (1 commit)
  5. 13 Jul 2013 (1 commit)
  6. 12 Jul 2013 (2 commits)
    • Make rocksdb-deletes faster using bloom filter · 2a986919
      Mayank Agarwal authored
      Summary:
      Wrote a new function in db_impl.cc, CheckKeyMayExist, that calls Get with a new parameter turned on which makes Get return false only if the bloom filters can guarantee that the key is not in the database. Delete calls this function, and if the deletes_use_filter option is turned on and CheckKeyMayExist returns false, the delete will be dropped, saving:
      1. Put of delete type
      2. Space in the db, and
      3. Compaction time
      
      Test Plan:
      make all check;
      will run db_stress and db_bench and enhance unit-test once the basic design gets approved
      
      Reviewers: dhruba, haobo, vamsi
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11607
      2a986919
    • Newbie code question · 8a5341ec
      Xing Jin authored
      Summary:
      This diff is more of a question I had while reading the compaction code
      than a normal diff. I don't quite understand the logic here.
      
      Test Plan: I didn't do any tests. If this is a bug, I will continue with some tests.
      
      Reviewers: haobo, dhruba, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11661
      8a5341ec
  7. 11 Jul 2013 (1 commit)
    • Print complete statistics in db_stress · 821889e2
      Mayank Agarwal authored
      Summary: db_stress should also print complete statistics like db_bench. Needed this when I wanted to measure the number of delete I/Os dropped due to CheckKeyMayExist, to be introduced into the rocksdb codebase later to make deletes in rocksdb faster.
      
      Test Plan: make db_stress;./db_stress --max_key=100 --ops_per_thread=1000 --statistics=1
      
      Reviewers: sheki, dhruba, vamsi, haobo
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D11655
      821889e2
  8. 10 Jul 2013 (1 commit)
  9. 09 Jul 2013 (1 commit)
    • [RocksDB] Provide contiguous sequence number even in case of write failure · 9ba82786
      Haobo Xu authored
      Summary: Replication logic would be simplified if we could guarantee that the write sequence number is always contiguous, even if a write failure occurs. Dhruba and I looked at the sequence number generation part of the code; it seems fixable. Note that if the WAL write was successful but the insert into the memtable was not, we would be in an unfortunate state. The approach in this diff is: an IO error is expected, so an error status will be returned to the client and the sequence number will not be advanced; an in-memory error is not expected, and we panic.
      
      Test Plan: make check; db_stress
      
      Reviewers: dhruba, sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11439
      9ba82786
  10. 04 Jul 2013 (1 commit)
    • [RocksDB] Support internal key/value dump for ldb · 92ca816a
      Haobo Xu authored
      Summary: This diff added a command 'idump' to the ldb tool, which dumps the internal key/value pairs. It could be useful for diagnosis and for estimating the per-user-key 'overhead'. Also cleaned up the ldb code a bit where I touched it.
      
      Test Plan: make check; ldb idump
      
      Reviewers: emayanke, sheki, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11517
      92ca816a
  11. 02 Jul 2013 (1 commit)
  12. 27 Jun 2013 (2 commits)
    • [RocksDB] Expose count for WriteBatch · 71e0f695
      Haobo Xu authored
      Summary: As title. Exposed a Count function that returns the number of updates in a batch. Could be handy for replication sequence-number checks.
      
      Test Plan: make check;
      
      Reviewers: emayanke, sheki, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11523
      71e0f695
    • Added stringappend_test back into the unit tests. · 34ef8732
      Deon Nicholas authored
      Summary:
      With the Makefile now updated to correctly update all .o files, this
      should fix the issues recompiling stringappend_test. This should also fix the
      "segmentation-fault" that we were getting earlier. Now, stringappend_test should
      be clean, and I have added it back to the unit-tests. Also made some minor updates
      to the tests themselves.
      
      Test Plan:
      1. make clean; make stringappend_test -j 32	(will test it by itself)
      2. make clean; make all check -j 32		(to run all unit tests)
      3. make clean; make release			(test in release mode)
      4. valgrind ./stringappend_test 		(valgrind tests)
      
      Reviewers: haobo, jpaton, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11505
      34ef8732
  13. 26 Jun 2013 (1 commit)
    • Updated "make clean" to remove all .o files · 6894a50a
      Deon Nicholas authored
      Summary:
      The old Makefile did not remove ALL .o and .d files, but rather only
      those that happened to be in the root folder and one-level deep. This was causing
      issues when recompiling files in deeper folders. This fix now causes make clean
      to find ALL .o and .d files via a unix "find" command, and then remove them.
      
      Test Plan:
      make clean;
      make all -j 32;
      
      Reviewers: haobo, jpaton, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11493
      6894a50a
  14. 22 Jun 2013 (1 commit)
    • Simplify bucketing logic in ldb-ttl · b858da70
      Mayank Agarwal authored
      Summary: [start_time, end_time) is what I'm following for the buckets and for the whole time range. Also cleaned up some code in db_ttl.*. Not correcting the spacing/indenting convention for util/ldb_cmd.cc in this diff.
      
      Test Plan: python ldb_test.py, make ttl_test, run the mcrocksdb-backup tool, run the ldb tool on 2 mcrocksdb production backups from sigmafio033.prn1
      
      Reviewers: vamsi, haobo
      
      Reviewed By: vamsi
      
      Differential Revision: https://reviews.facebook.net/D11433
      b858da70
  15. 20 Jun 2013 (4 commits)
    • Introducing timeranged scan, timeranged dump in ldb. Also the ability to count... · 61f1baae
      Mayank Agarwal authored
      Introducing timeranged scan, timeranged dump in ldb. Also the ability to count in time-batches during Dump
      
      Summary:
      The Scan and Dump commands in ldb use an iterator. We need to also print the timestamp for ttl databases for debugging. For this I create a TtlIterator class pointer in these functions and assign it the value of the Iterator pointer, which actually points to a TtlIterator object, and access the new function ValueWithTS, which can also return the TS. Buckets feature for the Dump command: gives a count of the different key-values in the specified time range, distributed across the time range partitioned according to bucket size. start_time and end_time are specified as unix timestamps, and bucket in seconds, on the user command line.
      Have commented out 3 lines from ldb_test.py so that the test does not break right now. It breaks because the timestamp is also printed now and I have to look at wildcards in python to compare properly.
      
      Test Plan: python tools/ldb_test.py
      
      Reviewers: vamsi, dhruba, haobo, sheki
      
      Reviewed By: vamsi
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11403
      61f1baae
    • [RocksDB] add back --mmap_read options to crashtest · 0f78fad9
      Haobo Xu authored
      Summary: As title, now that db_stress supports --mmap_read properly
      
      Test Plan: make crash_test
      
      Reviewers: vamsi, emayanke, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11391
      0f78fad9
    • [RocksDB] Minor change to statistics.h · 4deaa0d4
      Haobo Xu authored
      Summary: As title; use an initializer list so that lines fit in 80 chars.
      
      Test Plan: make check;
      
      Reviewers: sheki, dhruba
      
      Differential Revision: https://reviews.facebook.net/D11385
      4deaa0d4
    • [RocksDB] Add mmap_read option for db_stress · 96be2c4e
      Haobo Xu authored
      Summary: As title; also removed an incorrect assertion
      
      Test Plan: make check; db_stress --mmap_read=1; db_stress --mmap_read=0
      
      Reviewers: dhruba, emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11367
      96be2c4e
  16. 19 Jun 2013 (5 commits)
  17. 18 Jun 2013 (4 commits)
  18. 15 Jun 2013 (4 commits)
  19. 14 Jun 2013 (2 commits)
  20. 13 Jun 2013 (2 commits)
    • [RocksDB] Sync file to disk incrementally · 778e1790
      Haobo Xu authored
      Summary:
      During compaction, we sync the output files after they are fully written out. This causes unnecessary blocking of the compaction thread and burstiness of the write traffic.
      This diff simply asks the OS to sync data incrementally as it is written, in the background. The hope is that, at the final sync, most of the data is already on disk and we block less on the sync call. Thus each compaction runs faster, and we can use fewer compaction threads to saturate IO.
      In addition, the write traffic will be smoothed out, hopefully reducing the IO P99 latency too.
      
      Some quick tests show 10~20% improvement in per thread compaction throughput. Combined with posix advice on compaction read, just 5 threads are enough to almost saturate the udb flash bandwidth for 800 bytes write only benchmark.
      What's more promising is that, with saturated IO, iostat shows average wait time is actually smoother and much smaller.
      For the write-only 800-byte test:
      Before the change: await oscillates between 10ms and 3ms
      After the change: await ranges from 1-3ms
      
      Will test against read-modify-write workload too, see if high read latency P99 could be resolved.
      
      Will introduce a parameter to control the sync interval in a follow up diff after cleaning up EnvOptions.
      
      Test Plan: make check; db_bench; db_stress
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11115
      778e1790
    • [Rocksdb] [Multiget] Introduced multiget into db_bench · 4985a9f7
      Deon Nicholas authored
      Summary:
      Preliminary! Introduced the --use_multiget=1 and --keys_per_multiget=n
      flags for db_bench. Also updated and tested the ReadRandom() method
      to include an option to use multiget. By default,
      keys_per_multiget=100.
      
      Preliminary tests imply that multiget is at least 1.25x faster per
      key than regular get.
      
      Will continue adding Multiget for ReadMissing, ReadHot,
      RandomWithVerify, ReadRandomWriteRandom; soon. Will also think
      about ways to better verify benchmarks.
      
      Test Plan:
      1. make db_bench
      2. ./db_bench --benchmarks=fillrandom
      3. ./db_bench --benchmarks=readrandom --use_existing_db=1
      	      --use_multiget=1 --threads=4 --keys_per_multiget=100
      4. ./db_bench --benchmarks=readrandom --use_existing_db=1
      	      --threads=4
      5. Verify ops/sec (and 1000000 of 1000000 keys found)
      
      Reviewers: haobo, MarkCallaghan, dhruba
      
      Reviewed By: MarkCallaghan
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11127
      4985a9f7