1. 01 Apr 2014 (1 commit)
    • Retry FS system calls on EINTR · 726c8084
      Authored by Igor Canadi
      Summary: EINTR means 'please retry'. We don't do that currently. We should.
      
      Test Plan: make check, although it doesn't really test the new code. we'll just have to believe in the code!
      
      Reviewers: haobo, ljin
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D17349
      726c8084
  2. 29 Mar 2014 (5 commits)
  3. 28 Mar 2014 (2 commits)
  4. 27 Mar 2014 (2 commits)
  5. 26 Mar 2014 (1 commit)
  6. 25 Mar 2014 (2 commits)
    • [rocksdb] new CompactionFilterV2 API · b47812fb
      Authored by Danny Guo
      Summary:
      This diff adds a new CompactionFilterV2 API that rolls up the
      decisions on kv pairs during compactions. These kv pairs must share the
      same key prefix. They are buffered inside the db.
      
          typedef std::vector<Slice> SliceVector;
          virtual std::vector<bool> Filter(int level,
                                       const SliceVector& keys,
                                       const SliceVector& existing_values,
                                       std::vector<std::string>* new_values,
                                       std::vector<bool>* values_changed
                                       ) const = 0;
      
      Applications can override the Filter() function to operate
      on the buffered kv pairs. More details in the inline documentation.
      
      Test Plan:
      make check. Added unit tests to make sure Keep, Delete, and
      Change all work.
      
      Reviewers: haobo
      
      CCs: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15087
      b47812fb
    • Enhance partial merge to support multiple arguments · cda4006e
      Authored by Yueh-Hsuan Chiang
      Summary:
      * PartialMerge api now takes a list of operands instead of two operands.
      * Add min_partial_merge_operands to Options, indicating the minimum
        number of operands to trigger partial merge.
      * This diff is based on Schalk's previous diff (D14601), but it also
        includes necessary changes such as updating the pure C api for
        partial merge.
      
      Test Plan:
      * make check all
      * develop tests for cases where partial merge takes more than two
        operands.
      
      TODOs (from Schalk):
      * Add test with min_partial_merge_operands > 2.
      * Perform benchmarks to measure the performance improvements (can probably
        use results of task #2837810.)
      * Add description of problem to doc/index.html.
      * Change wiki pages to reflect the interface changes.
      
      Reviewers: haobo, igor, vamsi
      
      Reviewed By: haobo
      
      CC: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D16815
      cda4006e
  7. 22 Mar 2014 (1 commit)
    • Fix data corruption by LogBuffer · 83ab62e2
      Authored by sdong
      Summary: LogBuffer::AddLogToBuffer() uses vsnprintf() in the wrong way, which might cause a buffer overflow when a log line is too long. Fix it.
      
      Test Plan: Add a unit test to cover most of LogBuffer's logic.
      
      Reviewers: igor, haobo, dhruba
      
      Reviewed By: igor
      
      CC: ljin, yhchiang, leveldb
      
      Differential Revision: https://reviews.facebook.net/D17103
      83ab62e2
  8. 21 Mar 2014 (2 commits)
  9. 20 Mar 2014 (1 commit)
  10. 18 Mar 2014 (1 commit)
    • Optimize fallocation · f26cb0f0
      Authored by Igor Canadi
      Summary:
      Based on my recent findings (posted in our internal group), if we use fallocate without KEEP_SIZE flag, we get superior performance of fdatasync() in append-only workloads.
      
      This diff provides an option for users not to use the KEEP_SIZE flag, thus improving their sync performance by up to 2x-3x.
      
      At one point we also just called posix_fallocate instead of fallocate, which isn't very fast: http://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html (tl;dr it manually writes out zero bytes to allocate storage). This diff also fixes that, by first calling fallocate and then posix_fallocate if fallocate is not supported.
      
      Test Plan: make check
      
      Reviewers: dhruba, sdong, haobo, ljin
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16761
      f26cb0f0
  11. 15 Mar 2014 (5 commits)
  12. 13 Mar 2014 (1 commit)
    • A heuristic way to check if a memtable is full · 11da8bc5
      Authored by Kai Liu
      Summary:
      This is based on https://reviews.facebook.net/D15027. It's not finished, but I would like to give a prototype to avoid arena over-allocation while making better use of the already allocated memory blocks.
      
      Instead of checking the approximate memtable size, we take a deeper look at the arena, which incorporates the essential idea that @sdong suggests: flush when the arena has allocated its last block and that block is "almost full".
      
      Test Plan: N/A
      
      Reviewers: haobo, sdong
      
      Reviewed By: sdong
      
      CC: leveldb, sdong
      
      Differential Revision: https://reviews.facebook.net/D15051
      11da8bc5
  13. 12 Mar 2014 (2 commits)
    • Fix data race against logging data structure because of LogBuffer · bd45633b
      Authored by sdong
      Summary:
      @igor pointed out that there is a potential data race because of the way we use the newly introduced LogBuffer. After "bg_compaction_scheduled_--" or "bg_flush_scheduled_--", they can both become 0. As soon as the lock is released after that, DBImpl's destructor can go ahead and destroy all the state inside the DB, including the info_log object held in a shared pointer of the options object it keeps. At that point it is no longer safe to continue using the info logger to write the delayed logs.
      
      With the patch, the lock is released temporarily for the log buffer to be flushed before "bg_compaction_scheduled_--" or "bg_flush_scheduled_--". In order to make sure we don't miss any pending flush or compaction, a new flag bg_schedule_needed_ is added, which is set to true if there is a pending flush or compaction that was not scheduled because of the max thread limit. If the flag is true, the scheduling function will be called before a compaction or flush thread finishes.
      
      Thanks @igor for this finding!
      
      Test Plan: make all check
      
      Reviewers: haobo, igor
      
      Reviewed By: haobo
      
      CC: dhruba, ljin, yhchiang, igor, leveldb
      
      Differential Revision: https://reviews.facebook.net/D16767
      bd45633b
    • Env to add a function to allow users to query waiting queue length · 01dcef11
      Authored by sdong
      Summary: Add a function to Env so that users can query the waiting queue length of each thread pool.
      
      Test Plan: add a test in env_test
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: dhruba, igor, yhchiang, ljin, nkg-, leveldb
      
      Differential Revision: https://reviews.facebook.net/D16755
      01dcef11
  14. 11 Mar 2014 (3 commits)
    • Consolidate SliceTransform object ownership · 8d007b4a
      Authored by Lei Jin
      Summary:
      (1) Fix SanitizeOptions() to also check HashLinkList. The current
      dynamic case just happens to work because the 2 classes have the same
      layout.
      (2) Do not delete SliceTransform object in HashSkipListFactory and
      HashLinkListFactory destructor. Reason: SanitizeOptions() enforces
      prefix_extractor and SliceTransform to be the same object when
      Hash**Factory is used. This makes the behavior strange: when
      Hash**Factory is used, prefix_extractor will be released by RocksDB. If
      another memtable factory is used, prefix_extractor should be released by
      the user.
      
      Test Plan: db_bench && make asan_check
      
      Reviewers: haobo, igor, sdong
      
      Reviewed By: igor
      
      CC: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D16587
      8d007b4a
    • [RocksDB] LogBuffer Cleanup · 66da4679
      Authored by Haobo Xu
      Summary: Moved the LogBuffer class to an internal header. Removed some unnecessary indirection. Enabled the log buffer for BackgroundCallFlush. Forced a log buffer flush right after Unlock to improve the time ordering of the info log.
      
      Test Plan: make check; db_bench compare LOG output
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      CC: leveldb, igor
      
      Differential Revision: https://reviews.facebook.net/D16707
      66da4679
    • Add option verify_checksums_in_compaction · 04d2c26e
      Authored by Igor Canadi
      Summary:
      If verify_checksums_in_compaction is true, compaction will verify checksums. This is the default.
      If it's false, compaction doesn't verify checksums. This is useful for in-memory workloads.
      
      Test Plan: corruption_test
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16695
      04d2c26e
  15. 08 Mar 2014 (2 commits)
  16. 07 Mar 2014 (2 commits)
  17. 06 Mar 2014 (4 commits)
    • Make sure GetUniqueID related tests run on "regular" storage · abeee9f2
      Authored by Kai Liu
      Summary:
      With the use of tmpfs or ramfs, unit tests related to GetUniqueID()
      failed because of failures from ioctl, which doesn't work with these
      fancy file systems at all.
      
      I fixed this issue and made sure all related tests run on "regular"
      storage (disk or flash).
      
      Test Plan: TEST_TMPDIR=/dev/shm make check -j32
      
      Reviewers: igor, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16593
      abeee9f2
    • Buffer info logs when picking compactions and write them out after releasing the mutex · ecb1ffa2
      Authored by sdong
      Summary: While the background thread is picking compactions, it currently writes out multiple info logs, especially for universal compaction, which introduces a risk of blocking on log writes while holding the mutex, which is bad. To remove this risk, write all those info logs to a buffer and flush it after releasing the mutex.
      
      Test Plan:
      make all check
      check the log lines while running some tests that trigger compactions.
      
      Reviewers: haobo, igor, dhruba
      
      Reviewed By: dhruba
      
      CC: i.am.jin.lei, dhruba, yhchiang, leveldb, nkg-
      
      Differential Revision: https://reviews.facebook.net/D16515
      ecb1ffa2
    • Allow user to specify log level for info_log · 4405f3a0
      Authored by sdong
      Summary:
      Currently, there is no easy way for users to change the log level of the info log. Add a parameter in options to specify it.
      Also make the default level INFO. Remove the [INFO] tag when the level is INFO, as I don't want to cause a performance regression (adding the tag means another mem-copy and string formatting).
      
      Test Plan:
      make all check
      manual check the levels work as expected.
      
      Reviewers: dhruba, yhchiang
      
      Reviewed By: yhchiang
      
      CC: dhruba, igor, i.am.jin.lei, ljin, haobo, leveldb
      
      Differential Revision: https://reviews.facebook.net/D16563
      4405f3a0
    • output perf_context in db_bench readrandom · 04298f8c
      Authored by Lei Jin
      Summary:
      Add a helper function to print perf context data in db_bench if enabled.
      I didn't find any code that actually exports perf context data. Not sure
      if I missed anything.
      
      Test Plan: ran db_bench
      
      Reviewers: haobo, sdong, igor
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16575
      04298f8c
  18. 04 Mar 2014 (1 commit)
  19. 01 Mar 2014 (1 commit)
    • Make Log::Reader more robust · 58ca641d
      Authored by Igor Canadi
      Summary:
      This diff does two things:
      (1) Log::Reader does not report a corruption when the last record in a log or manifest file is truncated (meaning that log writer died in the middle of the write). Inherited the code from LevelDB: https://code.google.com/p/leveldb/source/detail?r=269fc6ca9416129248db5ca57050cd5d39d177c8#
      (2) Turn off mmap writes for all writes to log and manifest files
      
      (2) is necessary because if we use mmap writes, the last record is not truncated but is instead filled with zeros, causing the checksum to fail. It is hard to recover from a checksum failure.
      
      Test Plan:
      Added unit tests from LevelDB
      Actually recovered a "corrupted" MANIFEST file.
      
      Reviewers: dhruba, haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16119
      58ca641d
  20. 28 Feb 2014 (1 commit)