1. 29 Nov 2012, 1 commit
    • Fix all the lint errors. · d29f1819
      Abhishek Kona 提交于
      Summary:
      Scripted and removed all trailing spaces and converted all tabs to
      spaces.
      
      Also fixed other lint errors.
      All lint errors from this point on should be taken seriously.
      
      Test Plan: make all check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7059
      d29f1819
  2. 27 Nov 2012, 1 commit
    • Fix broken test; some ldb commands can run without a db_ · 6caf3b8e
      Chip Turner committed
      Summary:
      It would appear our unit tests make use of code from ldb_cmd,
      and don't always require a valid database handle. D6855 was not aware
      that db_ could sometimes be NULL for such commands, and so it broke
      reduce_levels_test.
      
      This moves the check elsewhere to (at least) fix the 'ldb dump' case of
      segfaulting when it couldn't open a database.
      
      Test Plan: make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D6903
      6caf3b8e
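
      A minimal C++ sketch of the kind of guard this implies; LDBCommand,
      RequiresDB(), and DoCommand() are illustrative names, not necessarily
      the actual ldb_cmd identifiers:

      #include <cstdio>

      // Illustrative only: commands declare whether they need an open
      // database, and the runner checks db_ once before dispatching,
      // instead of every command assuming Open() succeeded.
      class LDBCommand {
       public:
        virtual ~LDBCommand() {}
        virtual bool RequiresDB() const { return true; }  // e.g. 'dump'
        virtual void DoCommand() = 0;

        void Run() {
          if (RequiresDB() && db_ == nullptr) {
            fprintf(stderr, "ldb: failed to open database\n");
            return;  // never dereference a NULL db_
          }
          DoCommand();
        }

       protected:
        void* db_ = nullptr;  // stand-in for the real DB handle type
      };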
  3. 22 Nov 2012, 2 commits
    • Fix ldb segfault and use static libsnappy for all builds · 879e45eb
      Chip Turner committed
      Summary:
      Link statically against snappy, using the gvfs one for facebook
      environments, and the bundled one otherwise.
      
      In addition, fix a few minor segfaults in ldb when it couldn't open the
      database, and update .gitignore to include a few other build artifacts.
      
      Test Plan: make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D6855
      879e45eb
    • Support taking a configurable number of files from the same level to compact... · 7632fdb5
      Dhruba Borthakur committed
      Support taking a configurable number of files from the same level to compact in a single compaction run.
      
      Summary:
      The compaction process takes some files from LevelK and
      merges them into LevelK+1. The number of files it picks from
      LevelK was capped in such a way that the total amount of
      data picked does not exceed the maxfilesize of that level.
      This essentially meant that only one file from LevelK
      was picked for a single compaction.
      
      For bulk loads, we would like to take many files from
      LevelK and compact them in a single compaction run.
      
      This patch introduces an option called 'source_compaction_factor'
      (similar to expanded_compaction_factor). It is a multiplier
      applied to the maxfilesize of that level to arrive
      at the limit used to throttle the number of source
      files from LevelK. For bulk loads, set source_compaction_factor
      to a very high number so that multiple files from the same
      level are picked for compaction in a single run.
      
      The default value of source_compaction_factor is 1, which
      preserves backward compatibility with existing compaction semantics.
      
      Test Plan: make clean check
      
      Reviewers: emayanke, sheki
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D6867
      7632fdb5
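
      A sketch of the throttle described above; the function and field
      names are illustrative, not the actual compaction-picker code:

      #include <cstdint>
      #include <vector>

      struct FileMetaData {
        uint64_t file_size;
      };

      // Pick files from LevelK until their total size would exceed
      // maxfilesize * source_compaction_factor. With the default factor
      // of 1 this degenerates to the old one-file-per-compaction behavior.
      std::vector<FileMetaData*> PickSourceFiles(
          const std::vector<FileMetaData*>& level_files,
          uint64_t max_file_size, int source_compaction_factor) {
        std::vector<FileMetaData*> picked;
        const uint64_t limit = max_file_size * source_compaction_factor;
        uint64_t total = 0;
        for (FileMetaData* f : level_files) {
          if (!picked.empty() && total + f->file_size > limit) break;
          total += f->file_size;
          picked.push_back(f);
        }
        return picked;
      }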
  4. 21 Nov 2012, 1 commit
  5. 20 Nov 2012, 3 commits
    • Fix LDB dumpwal to print the messages as in the file. · 661dc157
      Abhishek Kona committed
      Summary:
      std::stringstream::clear() does not clear the stream's contents. It only
      sets some state flags. Who knew? With that fixed, the tool no longer
      prints the same messages again and again.
      
      Test Plan: ran it on a local db
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6795
      661dc157
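
      The gotcha is easy to reproduce: std::stringstream::clear() only
      resets the stream's state flags, while str("") is what actually
      empties the buffer. A self-contained illustration:

      #include <iostream>
      #include <sstream>

      int main() {
        std::stringstream ss;
        ss << "record 1";
        ss.clear();                     // only resets the state flags
        ss << " record 2";
        std::cout << ss.str() << "\n";  // prints "record 1 record 2"

        ss.str("");                     // this discards the buffered contents
        ss << "record 3";
        std::cout << ss.str() << "\n";  // prints "record 3"
        return 0;
      }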
    • LDB can read WAL. · 30742e16
      Abhishek Kona committed
      Summary:
      Add option to read WAL and print a summary for each record.
      facebook task => #1885013
      
      E.g. output:
      ./ldb dump_wal --walfile=/tmp/leveldbtest-5907/dbbench/026122.log --header
      Sequence,Count,ByteSize
      49981,1,100033
      49981,1,100033
      49982,1,100033
      49981,1,100033
      49982,1,100033
      49983,1,100033
      49981,1,100033
      49982,1,100033
      49983,1,100033
      49984,1,100033
      49981,1,100033
      49982,1,100033
      
      Test Plan:
      Works. Run:
      ./ldb read_wal --wal-file=/tmp/leveldbtest-5907/dbbench/000078.log --header
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: dhruba
      
      CC: emayanke, leveldb, zshao
      
      Differential Revision: https://reviews.facebook.net/D6675
      30742e16
    • Fix LDB dumpwal to print the messages as in the file. · b648401a
      Abhishek Kona committed
      Summary:
      std::stringstream::clear() does not clear the stream's contents. It only
      sets some state flags. Who knew? With that fixed, the tool no longer
      prints the same messages again and again.
      
      Test Plan: ran it on a local db
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6795
      b648401a
  6. 17 Nov 2012, 1 commit
    • LDB can read WAL. · f5cdf931
      Abhishek Kona committed
      Summary:
      Add option to read WAL and print a summary for each record.
      facebook task => #1885013
      
      E.g. output:
      ./ldb dump_wal --walfile=/tmp/leveldbtest-5907/dbbench/026122.log --header
      Sequence,Count,ByteSize
      49981,1,100033
      49981,1,100033
      49982,1,100033
      49981,1,100033
      49982,1,100033
      49983,1,100033
      49981,1,100033
      49982,1,100033
      49983,1,100033
      49984,1,100033
      49981,1,100033
      49982,1,100033
      
      Test Plan:
      Works run
      ./ldb read_wal --wal-file=/tmp/leveldbtest-5907/dbbench/000078.log --header
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: dhruba
      
      CC: emayanke, leveldb, zshao
      
      Differential Revision: https://reviews.facebook.net/D6675
      f5cdf931
  7. 14 Nov 2012, 1 commit
  8. 13 Nov 2012, 1 commit
    • Fix test failure of reduce_num_levels · c64796fd
      heyongqiang committed
      Summary:
      I changed the reduce_num_levels logic to avoid the compactRange() call if the current number of levels in use (levels that contain files) is smaller than the new number of levels.
      That change broke the assert in reduce_levels_test.
      
      Test Plan: run reduce_levels_test
      
      Reviewers: dhruba, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: emayanke, sheki
      
      Differential Revision: https://reviews.facebook.net/D6651
      c64796fd
  9. 11 Nov 2012, 1 commit
    • Compilation error while compiling with OPT=-g · 9c6c232e
      Dhruba Borthakur committed
      Summary:
      make clean check OPT=-g fails
      leveldb::DBStatistics::getTickerCount(leveldb::Tickers)’:
      ./db/db_statistics.h:34: error: ‘MAX_NO_TICKERS’ was not declared in this scope
      util/ldb_cmd.cc:255: warning: left shift count >= width of type
      
      Test Plan:
      make clean check OPT=-g
      
      9c6c232e
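
      The shift warning is the classic symptom of shifting a 32-bit int by
      32 or more bits; the usual fix is to widen the left operand first. An
      illustration of the pattern (not the actual util/ldb_cmd.cc code):

      #include <cstdint>

      uint64_t mask_bad(int n) {
        return 1 << n;            // int is 32-bit: undefined for n >= 32
      }

      uint64_t mask_good(int n) {
        return uint64_t{1} << n;  // 64-bit operand, valid for n up to 63
      }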
  10. 10 Nov 2012, 1 commit
  11. 07 Nov 2012, 1 commit
  12. 06 Nov 2012, 2 commits
    • Ability to invoke application hook for every key during compaction. · 5273c814
      Dhruba Borthakur committed
      Summary:
      There are certain use-cases where the application intends to
      delete older keys after they have expired a certain time period.
      One option for those applications is to periodically scan the
      entire database and delete appropriate keys.
      
      A better way is to allow the application to hook into the
      compaction process. This patch allows the application to set
      a method callback for every key that is being compacted. If
      this method returns true, then the key is not preserved in
      the output of the compaction.
      
      Test Plan:
      This is mostly to preview the proposed new public api.
      Since it is a public api, please do due diligence on reviewing it.
      
      I will be writing test cases for this api in my next version of
      this patch.
      
      Reviewers: MarkCallaghan, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: sheki, adsharma
      
      Differential Revision: https://reviews.facebook.net/D6285
      5273c814
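
      A sketch of what such a hook might look like; the callback type, the
      value encoding, and NowMicros() are assumptions for illustration, not
      the api proposed in D6285:

      #include <chrono>
      #include <cstdint>
      #include <cstring>
      #include <string>

      // Hypothetical per-key hook: return true to drop the key from the
      // compaction output, false to preserve it.
      using CompactionKeyFilter = bool (*)(int level, const std::string& key,
                                           const std::string& value);

      static uint64_t NowMicros() {
        using namespace std::chrono;
        return duration_cast<microseconds>(
                   system_clock::now().time_since_epoch()).count();
      }

      // Example: drop keys whose value starts with an 8-byte expiry
      // timestamp (an illustrative encoding, not a leveldb convention).
      bool DropExpired(int /*level*/, const std::string& /*key*/,
                       const std::string& value) {
        if (value.size() < sizeof(uint64_t)) return false;
        uint64_t expiry_micros;
        memcpy(&expiry_micros, value.data(), sizeof(expiry_micros));
        return expiry_micros < NowMicros();
      }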
    • Add a tool to change number of levels · d55c2ba3
      heyongqiang committed
      Summary: as subject.
      
      Test Plan: manually tested it; will add a testcase
      
      Reviewers: dhruba, MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D6345
      d55c2ba3
  13. 03 Nov 2012, 1 commit
  14. 02 Nov 2012, 1 commit
  15. 30 Oct 2012, 2 commits
    • Allow having different compression algorithms on different levels. · 321dfdc3
      Dhruba Borthakur committed
      Summary:
      The leveldb API is enhanced to support different compression algorithms at
      different levels.
      
      This adds the option min_level_to_compress to db_bench that specifies
      the minimum level for which compression should be done when
      compression is enabled. This can be used to disable compression for levels
      0 and 1 which are likely to suffer from stalls because of the CPU load
      for memtable flushes and (L0,L1) compaction.  Level 0 is special as it
      gets frequent memtable flushes. Level 1 is special as it frequently
      gets all:all file compactions between it and level 0. But all other levels
      could be the same. For any level N where N > 1, the rate of sequential
      IO for that level should be the same. The last level is the
      exception because it might not be full and because files from it are
      not read to compact with the next larger level.
      
      The same amount of time will be spent doing compaction at any
      level N excluding N=0, 1 or the last level. By this standard all
      of those levels should use the same compression. The difference is that
      the loss (using more disk space) from a faster compression algorithm
      is less significant for N=2 than for N=3. So we might be willing to
      trade disk space for faster write rates with no compression
      for L0 and L1, snappy for L2, zlib for L3. Using a faster compression
      algorithm for the mid levels also allows us to reclaim some cpu
      without trading off much loss in disk space overhead.
      
      Also note that little is to be gained by compressing levels 0 and 1. For
      a 4-level tree they account for 10% of the data. For a 5-level tree they
      account for 1% of the data.
      
      With compression enabled:
      * memtable flush rate is ~18MB/second
      * (L0,L1) compaction rate is ~30MB/second
      
      With compression enabled but min_level_to_compress=2
      * memtable flush rate is ~320MB/second
      * (L0,L1) compaction rate is ~560MB/second
      
      This practically takes the same code from https://reviews.facebook.net/D6225
      but makes the leveldb api more general purpose with a few additional
      lines of code.
      
      Test Plan: make check
      
      Differential Revision: https://reviews.facebook.net/D6261
      321dfdc3
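
      As a sketch of the configuration style this enables (the option name
      and types are assumptions, not necessarily the exact api from D6261):
      no compression for L0 and L1, snappy for L2, zlib for everything deeper:

      #include <vector>

      enum CompressionType { kNoCompression, kSnappyCompression,
                             kZlibCompression };

      struct Options {            // illustrative subset of the real Options
        int num_levels = 7;
        std::vector<CompressionType> compression_per_level;
      };

      void ConfigureCompression(Options* opt) {
        opt->compression_per_level.assign(opt->num_levels, kZlibCompression);
        opt->compression_per_level[0] = kNoCompression;  // L0: flush-bound
        opt->compression_per_level[1] = kNoCompression;  // L1: all:all with L0
        opt->compression_per_level[2] = kSnappyCompression;  // fast, cheap
      }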
    • Adds DB::GetNextCompaction and then uses that for rate limiting db_bench · 70c42bf0
      Mark Callaghan committed
      Summary:
      Adds a method that returns the score for the next level that most
      needs compaction. That method is then used by db_bench to rate limit threads.
      Threads are put to sleep at the end of each stats interval until the score
      is less than the limit. The limit is set via the --rate_limit=$double option.
      The specified value must be > 1.0. Also adds the option --stats_per_interval
      to enable additional metrics reported every stats interval.
      
      Test Plan:
      run db_bench
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6243
      70c42bf0
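
      In sketch form, each db_bench thread would do something like the
      following at the end of a stats interval; GetNextCompactionScore()
      is a stand-in for the new method described above:

      #include <chrono>
      #include <thread>

      double GetNextCompactionScore();  // assumed wrapper around the new api

      // Sleep until the score of the level that most needs compaction
      // drops below the --rate_limit value (which must be > 1.0).
      void MaybeThrottle(double rate_limit) {
        while (GetNextCompactionScore() >= rate_limit) {
          std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
      }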
  16. 27 Oct 2012, 2 commits
  17. 20 Oct 2012, 2 commits
    • db_bench was not correctly initializing the value for the delete_obsolete_files_period_micros option. · cf5adc80
      Dhruba Borthakur committed
      Summary:
      The parameter delete_obsolete_files_period_micros controls the
      periodicity of deleting obsolete files. db_bench was reading
      this parameter into a local variable called 'l' but was incorrectly
      using another local variable called 'n' while setting it in the
      db.options data structure.
      This patch also logs the value of delete_obsolete_files_period_micros
      in the LOG file at db startup time.
      
      I am hoping that this will improve the overall write throughput drastically.
      
      Test Plan: run db_bench
      
      Reviewers: MarkCallaghan, heyongqiang
      
      Reviewed By: MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D6099
      cf5adc80
    • This is the mega-patch multi-threaded compaction · 1ca05843
      Dhruba Borthakur committed
      published in https://reviews.facebook.net/D5997.
      
      Summary:
      This patch allows compaction to occur in multiple background threads
      concurrently.
      
      If a manual compaction is issued, the system falls back to a
      single-compaction-thread model. This is done to ensure correctness
      and simplicity of the code. When the manual compaction is finished,
      the system resumes its concurrent-compaction mode automatically.
      
      The updates to the manifest are done via a group-commit approach.
      
      Test Plan: run db_bench
      1ca05843
  18. 17 Oct 2012, 1 commit
    • The deletion of obsolete files should not occur very frequently. · aa73538f
      Dhruba Borthakur committed
      Summary:
      The method DeleteObsoleteFiles is a very costly method, especially
      when the number of files in a system is large. It makes a list of
      all live files and then scans the directory to compute the diff.
      By default, this method is executed after every compaction run.
      
      This patch makes it such that DeleteObsoleteFiles is never
      invoked twice within a configured period.
      
      Test Plan: run all unit tests
      
      Reviewers: heyongqiang, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D6045
      aa73538f
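
      A sketch of the throttle described above; the member names are
      illustrative, not the actual DBImpl code:

      #include <cstdint>

      class DBImpl {
        uint64_t last_purge_micros_ = 0;
        uint64_t delete_obsolete_files_period_micros_ = 0;  // configured

        void DeleteObsoleteFiles();  // the costly list-and-diff scan

        void MaybeDeleteObsoleteFiles(uint64_t now_micros) {
          // Never run the expensive scan twice within the configured period.
          if (now_micros - last_purge_micros_ <
              delete_obsolete_files_period_micros_) {
            return;
          }
          last_purge_micros_ = now_micros;
          DeleteObsoleteFiles();
        }
      };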
  19. 04 Oct 2012, 2 commits
    • Implement RowLocks for assoc schema · f7975ac7
      Dhruba Borthakur committed
      Summary:
      Each assoc is identified by (id1, assocType). This is the rowkey.
      Each row has a read/write rowlock. There is a statically allocated array
      of 2000 read/write locks. A rowkey is murmur-hashed to one of the
      read/write locks.
      
      assocPut and assocDelete acquire the rowlock in write mode.
      The key updates are done within the rowlock with an atomic nosync
      batch write to leveldb. Then the rowlock is released and
      a write-with-sync is done to sync the leveldb transaction log.
      
      Test Plan: added unit test
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D5859
      f7975ac7
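
      A sketch of the striped rowlock scheme; MurmurHash() is assumed to
      exist, and the surrounding assoc code is elided:

      #include <cstddef>
      #include <cstdint>
      #include <mutex>
      #include <shared_mutex>

      uint32_t MurmurHash(const void* data, size_t len, uint32_t seed);

      constexpr size_t kNumRowLocks = 2000;  // statically allocated locks
      static std::shared_mutex row_locks[kNumRowLocks];

      // Hash the (id1, assocType) rowkey onto one of the locks.
      std::shared_mutex& LockForRow(int64_t id1, int32_t assoc_type) {
        uint32_t h = MurmurHash(&id1, sizeof(id1),
                                static_cast<uint32_t>(assoc_type));
        return row_locks[h % kNumRowLocks];
      }

      void AssocPut(int64_t id1, int32_t assoc_type /*, ... */) {
        std::unique_lock<std::shared_mutex> guard(LockForRow(id1, assoc_type));
        // ... apply the key updates as an atomic nosync batch write ...
        guard.unlock();
        // ... then issue a write-with-sync to flush the transaction log.
      }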
    • A configurable option to write data using write instead of mmap. · c1006d42
      Dhruba Borthakur committed
      Summary:
      We have seen that reading data via the pread call (instead of
      mmap) is much faster on Linux 2.6.x kernels. This patch adds
      an equivalent option to switch off mmap for the write path
      as well.
      
      db_bench --mmap_write=0 will use write() instead of mmap() to
      write data to a file.
      
      This change is backward compatible; the default
      option is to continue using mmap for writing to a file.
      
      Test Plan: "make check all"
      
      Differential Revision: https://reviews.facebook.net/D5781
      c1006d42
  20. 02 Oct 2012, 1 commit
  21. 30 Sep 2012, 1 commit
  22. 25 Sep 2012, 1 commit
    • The BackupAPI should also list the length of the manifest file. · ae36e509
      Dhruba Borthakur committed
      Summary:
      The GetLiveFiles() api lists the set of sst files and the current
      MANIFEST file. But the database continues to append new data to the
      MANIFEST file even when the application is backing it up to the
      backup location. This means that the database version that is
      stored in the MANIFEST file in the backup location
      does not correspond to the sst files returned by GetLiveFiles.
      
      This patch adds a new parameter to GetLiveFiles. The new parameter
      returns the current size of the MANIFEST file.
      
      Test Plan: Unit test attached.
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D5631
      ae36e509
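
      A hedged usage sketch of the extended call; CopyToBackup() is an
      assumed helper and the exact GetLiveFiles signature may differ from
      D5631:

      #include <cstdint>
      #include <string>
      #include <vector>

      #include "leveldb/db.h"

      // Assumed helper: copies `name` into the backup directory, truncating
      // to `max_bytes` when max_bytes > 0.
      void CopyToBackup(const std::string& name, uint64_t max_bytes);

      void BackupDatabase(leveldb::DB* db) {
        std::vector<std::string> live_files;
        uint64_t manifest_size = 0;
        leveldb::Status s = db->GetLiveFiles(live_files, &manifest_size);
        if (!s.ok()) return;
        for (const std::string& f : live_files) {
          // Copy only the captured prefix of the MANIFEST so the backup
          // stays consistent with the sst files in the list.
          bool is_manifest = f.find("MANIFEST") != std::string::npos;
          CopyToBackup(f, is_manifest ? manifest_size : 0);
        }
      }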
  23. 20 Sep 2012, 1 commit
    • Allow a configurable number of background threads. · 9e84834e
      Dhruba Borthakur committed
      Summary:
      The background threads are necessary for compaction.
      For slower storage, it might be necessary to have more than
      one compaction thread per DB. This patch allows creating
      a configurable number of worker threads.
      The default remains at 1 (to maintain backward compatibility).
      
      Test Plan:
      run all unit tests. Changes to db_bench coming in
      a separate patch.
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D5559
      9e84834e
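
      A usage sketch; the exact call shape added in D5559 may differ, so
      treat SetBackgroundThreads() here as an assumption based on the
      summary:

      #include "leveldb/env.h"
      #include "leveldb/options.h"

      // Allow up to 4 concurrent compaction workers; the default remains
      // a single background thread for backward compatibility.
      void ConfigureCompactionThreads(leveldb::Options* options) {
        options->env->SetBackgroundThreads(4);
      }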
  24. 18 Sep 2012, 1 commit
    • add an option to disable seek compaction · a8464ed8
      heyongqiang committed
      Summary:
      as subject. This diff should be good for benchmarking.
      
      will send another diff to make it better in the case seek compaction is enabled.
      In that coming diff, a seek will not be counted if the bloom filter filters it out.
      
      Test Plan: build
      
      Reviewers: dhruba, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D5481
      a8464ed8
  25. 17 Sep 2012, 1 commit
  26. 15 Sep 2012, 1 commit
    • Remove use of mmap for random reads · 33323f21
      Mark Callaghan committed
      Summary:
      Reads via mmap on concurrent workloads are much slower than pread.
      For example on a 24-core server with storage that can do 100k IOPS or more
      I can get no more than 10k IOPS with mmap reads and 32+ threads.
      
      Test Plan: db_bench benchmarks
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D5433
      33323f21
  27. 14 Sep 2012, 2 commits
  28. 13 Sep 2012, 1 commit
  29. 07 Sep 2012, 1 commit
    • put log in a separate dir · 0f43aa47
      heyongqiang committed
      Summary: added a new option db_log_dir, which points to the log dir. Inside that dir, in order to make log names unique, the log file name is prefixed with the absolute path of the leveldb data dir.
      
      Test Plan: db_test
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D5205
      0f43aa47
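
      A usage sketch of the new option (the field name follows the summary;
      the path is illustrative):

      #include "leveldb/options.h"

      leveldb::Options MakeOptions() {
        leveldb::Options options;
        // LOG files go under /var/log/leveldb, with each file name prefixed
        // by the absolute path of the data dir to keep names unique.
        options.db_log_dir = "/var/log/leveldb";
        return options;
      }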
  30. 30 Aug 2012, 2 commits
    • Clean up compiler warnings generated by -Wall option. · fe936316
      Dhruba Borthakur committed
      Summary:
      Clean up compiler warnings generated by -Wall option.
      make clean all OPT=-Wall
      
      This is a pre-requisite before making a new release.
      
      Test Plan: compile and run unit tests
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D5019
      fe936316
    • The sharding of the block cache is limited to 2**20 pieces. · e5fe80e4
      Dhruba Borthakur committed
      Summary:
      The number of shards that the block cache is divided into is
      configurable. However, if the user specifies that he/she wants
      the block cache to be divided into more than 2**20 pieces, then
      the system will try to allocate a huge array of that size, which
      could fail.
      
      It is better to limit the sharding of the block cache to an
      upper bound. The default sharding is 16 shards (i.e. 2**4)
      and the maximum is now about one million shards (i.e. 2**20).
      
      Also, fixed a bug in the LRUCache: numShardBits
      should be a private member of the LRUCache object rather than
      a static variable.
      
      Test Plan:
      run db_bench with --cache_numshardbits=64.
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D5013
      e5fe80e4
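
      A sketch of the clamping described above; names are illustrative:

      #include <cstdint>

      constexpr int kDefaultNumShardBits = 4;   // 16 shards by default
      constexpr int kMaxNumShardBits = 20;      // at most 2**20 shards

      class ShardedLRUCache {
        int num_shard_bits_;  // per-instance, not static (the bug fixed here)

       public:
        explicit ShardedLRUCache(int num_shard_bits) {
          if (num_shard_bits < 0) num_shard_bits = kDefaultNumShardBits;
          if (num_shard_bits > kMaxNumShardBits) {
            num_shard_bits = kMaxNumShardBits;  // bound the shard array size
          }
          num_shard_bits_ = num_shard_bits;
        }

        // A key's shard is taken from the top bits of its hash.
        uint32_t Shard(uint32_t hash) const {
          return num_shard_bits_ > 0 ? hash >> (32 - num_shard_bits_) : 0;
        }
      };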