1. 02 Nov, 2012 (1 commit)
  2. 30 Oct, 2012 (4 commits)
    • M
      Use timer to measure sleep rather than assume it is 1000 usecs · 3e7e2692
      Authored by Mark Callaghan
      Summary:
      This makes the stall timers in MakeRoomForWrite more accurate by timing
      the sleeps. From looking at the logs the real sleep times are usually
      about 2000 usecs each when SleepForMicros(1000) is called. The modified LOG messages are:
      2012/10/29-12:06:33.271984 2b3cc872f700 delaying write 13 usecs for level0_slowdown_writes_trigger
      2012/10/29-12:06:34.688939 2b3cc872f700 delaying write 1728 usecs for rate limits with max score 3.83
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench, look at DB/LOG
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6297
      3e7e2692
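The change above can be sketched as follows; the function name is hypothetical, but the technique is exactly what the commit describes: time the sleep with a real clock instead of assuming it lasted the requested 1000 usecs (observed sleeps were often ~2000 usecs).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Sketch with illustrative names: measure the actual sleep duration and
// charge the stall timer with that, not with the requested interval.
uint64_t DelayWriteMicros(uint64_t requested_usecs) {
  auto start = std::chrono::steady_clock::now();
  std::this_thread::sleep_for(std::chrono::microseconds(requested_usecs));
  auto end = std::chrono::steady_clock::now();
  // Real sleeps commonly overshoot; return the measured time.
  return static_cast<uint64_t>(
      std::chrono::duration_cast<std::chrono::microseconds>(end - start)
          .count());
}
```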
    • D
      Allow having different compression algorithms on different levels. · 321dfdc3
      Authored by Dhruba Borthakur
      Summary:
      The leveldb API is enhanced to support different compression algorithms at
      different levels.
      
      This adds the option min_level_to_compress to db_bench that specifies
      the minimum level for which compression should be done when
      compression is enabled. This can be used to disable compression for levels
      0 and 1 which are likely to suffer from stalls because of the CPU load
      for memtable flushes and (L0,L1) compaction.  Level 0 is special as it
      gets frequent memtable flushes. Level 1 is special as it frequently
      gets all:all file compactions between it and level 0. But all other levels
      could be the same. For any level N where N > 1, the rate of sequential
      IO for that level should be the same. The last level is the
      exception because it might not be full and because files from it are
      not read to compact with the next larger level.
      
      The same amount of time will be spent doing compaction at any
      level N excluding N=0, 1 or the last level. By this standard all
      of those levels should use the same compression. The difference is that
      the loss (using more disk space) from a faster compression algorithm
      is less significant for N=2 than for N=3. So we might be willing to
      trade disk space for faster write rates with no compression
      for L0 and L1, snappy for L2, zlib for L3. Using a faster compression
      algorithm for the mid levels also allows us to reclaim some cpu
      without trading off much loss in disk space overhead.
      
      Also note that little is to be gained by compressing levels 0 and 1. For
      a 4-level tree they account for 10% of the data. For a 5-level tree they
      account for 1% of the data.
      
      With compression enabled:
      * memtable flush rate is ~18MB/second
      * (L0,L1) compaction rate is ~30MB/second
      
      With compression enabled but min_level_to_compress=2
      * memtable flush rate is ~320MB/second
      * (L0,L1) compaction rate is ~560MB/second
      
      This practically takes the same code from https://reviews.facebook.net/D6225
      but makes the leveldb API more general purpose with a few additional
      lines of code.
      
      Test Plan: make check
      
      Differential Revision: https://reviews.facebook.net/D6261
      321dfdc3
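A minimal sketch of the per-level policy described above; the enum and function names are illustrative, not the actual leveldb API. Levels below min_level_to_compress skip compression to spare CPU during memtable flushes and (L0,L1) compactions.

```cpp
#include <cassert>

// Illustrative compression selection: no compression below the
// configured minimum level, the configured algorithm at or above it.
enum CompressionType { kNoCompression, kSnappyCompression, kZlibCompression };

CompressionType CompressionForLevel(int level, int min_level_to_compress,
                                    CompressionType configured) {
  if (level < min_level_to_compress) {
    return kNoCompression;  // L0/L1: avoid CPU cost on the hot write path
  }
  return configured;
}
```

With min_level_to_compress=2 this yields the arrangement discussed in the summary: no compression for L0 and L1, the configured algorithm for L2 and beyond.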
    • M
      Add more rates to db_bench output · acc8567b
      Authored by Mark Callaghan
      Summary:
      Adds the "MB/sec in" and "MB/sec out" to this line:
      Amplification: 1.7 rate, 0.01 GB in, 0.02 GB out, 8.24 MB/sec in, 13.75 MB/sec out
      
      Changes all values to be reported per interval and since test start for this line:
      ... thread 0: (10000,60000) ops and (19155.6,27307.5) ops/second in (0.522041,2.197198) seconds
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6291
      acc8567b
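The new "MB/sec in" and "MB/sec out" fields are just the amplification byte counts divided by elapsed time; a sketch with hypothetical names:

```cpp
#include <cassert>

// Illustrative computation of the added rate columns: GB counts are
// converted to MB and divided by the elapsed wall-clock seconds.
struct Rates {
  double mb_in_per_sec;
  double mb_out_per_sec;
};

Rates AmplificationRates(double gb_in, double gb_out, double elapsed_secs) {
  Rates r;
  r.mb_in_per_sec = gb_in * 1024.0 / elapsed_secs;
  r.mb_out_per_sec = gb_out * 1024.0 / elapsed_secs;
  return r;
}
```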
    • M
      Adds DB::GetNextCompaction and then uses that for rate limiting db_bench · 70c42bf0
      Authored by Mark Callaghan
      Summary:
      Adds a method that returns the score for the next level that most
      needs compaction. That method is then used by db_bench to rate limit threads.
      Threads are put to sleep at the end of each stats interval until the score
      is less than the limit. The limit is set via the --rate_limit=$double option.
      The specified value must be > 1.0. Also adds the option --stats_per_interval
      to enable additional metrics reported every stats interval.
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6243
      70c42bf0
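A hedged sketch of the rate-limiting loop described above: at the end of each stats interval the thread sleeps until the next-compaction score drops below --rate_limit. get_score and sleep_once stand in for the real calls.

```cpp
#include <cassert>
#include <functional>

// Illustrative loop: sleep repeatedly while the compaction score is at
// or above the limit; returns how many sleeps were taken.
int RateLimitLoop(double rate_limit,
                  const std::function<double()>& get_score,
                  const std::function<void()>& sleep_once) {
  int sleeps = 0;
  while (get_score() >= rate_limit) {
    sleep_once();
    ++sleeps;
  }
  return sleeps;
}
```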
  3. 27 10月, 2012 2 次提交
  4. 25 10月, 2012 1 次提交
    • M
      Improve statistics · e7206f43
      Authored by Mark Callaghan
      Summary:
      This adds more statistics to be reported by GetProperty("leveldb.stats").
      The new stats include time spent waiting on stalls in MakeRoomForWrite.
      This also includes the total amplification rate where that is:
          (#bytes of sequential IO during compaction) / (#bytes from Put)
      This also includes a lot more data for the per-level compaction report.
      * Rn(MB) - MB read from level N during compaction between levels N and N+1
      * Rnp1(MB) - MB read from level N+1 during compaction between levels N and N+1
      * Wnew(MB) - new data written to the level during compaction
      * Amplify - ( Write(MB) + Rnp1(MB) ) / Rn(MB)
      * Rn - files read from level N during compaction between levels N and N+1
      * Rnp1 - files read from level N+1 during compaction between levels N and N+1
      * Wnp1 - files written to level N+1 during compaction between levels N and N+1
      * NewW - new files written to level N+1 during compaction
      * Count - number of compactions done for this level
      
      This is the new output from DB::GetProperty("leveldb.stats"). The old output stopped at the Write(MB) column.
      
                                     Compactions
      Level  Files Size(MB) Time(sec) Read(MB) Write(MB)  Rn(MB) Rnp1(MB) Wnew(MB) Amplify Read(MB/s) Write(MB/s)   Rn Rnp1 Wnp1 NewW Count
      -------------------------------------------------------------------------------------------------------------------------------------
        0        3        6        33        0       576       0        0      576    -1.0       0.0         1.3     0    0    0    0   290
        1      127      242       351     5316      5314     570     4747      567    17.0      12.1        12.1   287 2399 2685  286    32
        2      161      328        54      822       824     326      496      328     4.0       1.9         1.9   160  251  411  160   161
      Amplification: 22.3 rate, 0.56 GB in, 12.55 GB out
      Uptime(secs): 439.8
      Stalls(secs): 206.938 level0_slowdown, 0.000 level0_numfiles, 24.129 memtable_compaction
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      (cherry picked from commit ecdeead38f86cc02e754d0032600742c4f02fec8)
      
      Reviewers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6153
      e7206f43
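The Amplify column defined above can be written directly as a formula: Amplify = (Write(MB) + Rnp1(MB)) / Rn(MB), with -1 when the level read nothing from below (as in the level-0 row of the sample output).

```cpp
#include <cassert>

// Per-level write amplification, per the definition in this commit.
double AmplifyRatio(double write_mb, double rnp1_mb, double rn_mb) {
  if (rn_mb <= 0.0) return -1.0;  // nothing read from level N (e.g. L0)
  return (write_mb + rnp1_mb) / rn_mb;
}
```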
  5. 23 Oct, 2012 (1 commit)
  6. 17 Oct, 2012 (1 commit)
    • D
      The deletion of obsolete files should not occur very frequently. · aa73538f
      Authored by Dhruba Borthakur
      Summary:
      The method DeleteObsoleteFiles is very costly, especially
      when the number of files in the system is large. It makes a list of
      all live files and then scans the directory to compute the diff.
      By default, this method is executed after every compaction run.
      
      This patch ensures that DeleteObsoleteFiles is never
      invoked twice within a configured period.
      
      Test Plan: run all unit tests
      
      Reviewers: heyongqiang, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D6045
      aa73538f
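A hypothetical sketch of the throttle: the costly scan runs at most once per configured period, and calls that land inside the quiet window are skipped. The struct and method names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative throttle for a costly periodic operation.
struct ObsoleteFileThrottle {
  uint64_t period_micros;
  uint64_t last_run_micros;

  explicit ObsoleteFileThrottle(uint64_t period)
      : period_micros(period), last_run_micros(0) {}

  // Returns true when the scan should actually run now.
  bool ShouldRun(uint64_t now_micros) {
    if (last_run_micros != 0 &&
        now_micros - last_run_micros < period_micros) {
      return false;  // still inside the quiet period; skip the scan
    }
    last_run_micros = now_micros;
    return true;
  }
};
```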
  7. 26 Sep, 2012 (1 commit)
  8. 22 Sep, 2012 (1 commit)
    • D
      Segfault in DoCompactionWork caused by buffer overflow · bb2dcd24
      Authored by Dhruba Borthakur
      Summary:
      The code was allocating 200 bytes on the stack but it
      writes 256 bytes into the array.
      
          @     0x7f134bee7eb0 (unknown)
          @           0x8a8ea5 std::_Rb_tree<>::erase()
          @           0x8a35d6 leveldb::DBImpl::CleanupCompaction()
          @           0x8a7810 leveldb::DBImpl::BackgroundCompaction()
          @           0x8a804d leveldb::DBImpl::BackgroundCall()
          @           0x8c4eff leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper()
          @     0x7f134b3c010d start_thread
          @     0x7f134bf9f10d clone
      
      Test Plan: run db_bench with overwrite option
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D5595
      bb2dcd24
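The class of bug fixed here, sketched with illustrative names: formatting into a stack buffer smaller than the message overruns the stack. Using snprintf with the real buffer size bounds the write.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// snprintf never writes past bufsize, unlike sprintf into a buffer that
// is smaller than the formatted output (200 bytes vs 256 in the bug).
size_t FormatCompactionMsg(char* buf, size_t bufsize, int files) {
  int n = std::snprintf(buf, bufsize, "compacted %d files", files);
  return n < 0 ? 0 : static_cast<size_t>(n);
}
```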
  9. 19 Sep, 2012 (1 commit)
  10. 18 Sep, 2012 (1 commit)
  11. 13 Sep, 2012 (1 commit)
  12. 07 Sep, 2012 (1 commit)
    • H
      put log in a separate dir · 0f43aa47
      Authored by heyongqiang
      Summary: added a new option db_log_dir, which points to the directory for the info LOG. To keep log names unique within that directory, each log file name is prefixed with the absolute path of the leveldb data dir.
      
      Test Plan: db_test
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D5205
      0f43aa47
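A hedged sketch of the naming scheme described: the LOG file lives in db_log_dir, and its name is derived from the data dir's absolute path with the separators flattened, so logs from different databases never collide.

```cpp
#include <cassert>
#include <algorithm>
#include <string>

// Illustrative name construction (not the exact leveldb helper): turn
// the data dir path into a flat, unique prefix inside db_log_dir.
std::string InfoLogFileName(const std::string& db_log_dir,
                            const std::string& db_absolute_path) {
  std::string name = db_absolute_path;
  std::replace(name.begin(), name.end(), '/', '_');
  return db_log_dir + "/" + name + "_LOG";
}
```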
  13. 30 Aug, 2012 (1 commit)
  14. 29 Aug, 2012 (3 commits)
  15. 28 Aug, 2012 (1 commit)
    • D
      Introduce a new method Env->Fsync() that issues fsync (instead of fdatasync). · fc20273e
      Authored by Dhruba Borthakur
      Summary:
      Introduce a new method Env->Fsync() that issues fsync (instead of fdatasync).
      This is needed for data durability when running on ext3 filesystems.
      Added options to the benchmark db_bench to generate performance numbers
      with either fsync or fdatasync enabled.
      
      Cleaned up Makefile to build leveldb_shell only when building the thrift
      leveldb server.
      
      Test Plan: build and run benchmark
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D4911
      fc20273e
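The distinction behind the new entry point, sketched for POSIX/Linux: fdatasync flushes file data but may skip metadata such as the modification time, while fsync flushes both, which is what ext3 needs for durability.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Illustrative wrapper: choose between the two sync calls, as a
// separate Env::Fsync() vs the existing fdatasync-based sync would.
bool SyncFd(int fd, bool use_fsync) {
  return (use_fsync ? fsync(fd) : fdatasync(fd)) == 0;
}
```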
  16. 25 8月, 2012 1 次提交
    • H
      add more logs · 1c99b0a6
      Authored by heyongqiang
      Summary:
      
      as subject
      
      Test Plan: db_test
      
      Reviewers: dhruba
      1c99b0a6
  17. 23 Aug, 2012 (1 commit)
  18. 22 Aug, 2012 (3 commits)
  19. 18 Aug, 2012 (2 commits)
  20. 04 Aug, 2012 (1 commit)
  21. 07 Jul, 2012 (1 commit)
  22. 06 Jul, 2012 (1 commit)
  23. 28 Jun, 2012 (1 commit)
  24. 02 Jun, 2012 (1 commit)
  25. 17 Apr, 2012 (1 commit)
    • S
      Added bloom filter support. · 85584d49
      Authored by Sanjay Ghemawat
      In particular, we add a new FilterPolicy class.  An instance
      of this class can be supplied in Options when opening a
      database.  If supplied, the instance is used to generate
      summaries of keys (e.g., a bloom filter) which are placed in
      sstables.  These summaries are consulted by DB::Get() so we
      can avoid reading sstable blocks that are guaranteed to not
      contain the key we are looking for.
      
      This change provides one implementation of FilterPolicy
      based on bloom filters.
      
      Other changes:
      - Updated version number to 1.4.
      - Some build tweaks.
      - C binding for CompactRange.
      - A few more benchmarks: deleteseq, deleterandom, readmissing, seekrandom.
      - Minor .gitignore update.
      85584d49
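A minimal, self-contained bloom-filter toy illustrating the idea above: a compact key summary that can answer "definitely not present", letting DB::Get() skip an sstable block entirely. This uses two probes over 1024 bits; leveldb's real FilterPolicy implementation is considerably more careful.

```cpp
#include <cassert>
#include <bitset>
#include <functional>
#include <string>

// Toy bloom filter: false positives are possible, false negatives
// are not, so a "no" answer safely avoids a block read.
struct MiniBloom {
  std::bitset<1024> bits;

  static size_t Probe(const std::string& key, char seed) {
    return std::hash<std::string>{}(key + seed) % 1024;
  }
  void Add(const std::string& key) {
    bits.set(Probe(key, 1));
    bits.set(Probe(key, 2));
  }
  bool MayContain(const std::string& key) const {
    return bits.test(Probe(key, 1)) && bits.test(Probe(key, 2));
  }
};
```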
  26. 09 Mar, 2012 (2 commits)
  27. 26 Jan, 2012 (1 commit)
  28. 01 Nov, 2011 (1 commit)
    • H
      A number of fixes: · 36a5f8ed
      Authored by Hans Wennborg
      - Replace raw slice comparison with a call to user comparator.
        Added test for custom comparators.
      
      - Fix end of namespace comments.
      
      - Fixed bug in picking inputs for a level-0 compaction.
      
        When finding overlapping files, the covered range may expand
        as files are added to the input set.  We now correctly expand
        the range when this happens instead of continuing to use the
        old range.  For example, suppose L0 contains files with the
        following ranges:
      
            F1: a .. d
            F2:    c .. g
            F3:       f .. j
      
        and the initial compaction target is F3.  We used to search
        for range f..j which yielded {F2,F3}.  However we now expand
        the range as soon as another file is added.  In this case,
        when F2 is added, we expand the range to c..j and restart the
        search.  That picks up file F1 as well.
      
        This change fixes a bug related to deleted keys showing up
        incorrectly after a compaction as described in Issue 44.
      
      (Sync with upstream @25072954)
      36a5f8ed
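The level-0 fix above can be sketched as follows (illustrative names, single-character keys as in the F1/F2/F3 example): keep widening the key range and restarting the overlap search until no new file joins the input set.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct FileRange {
  std::string smallest, largest;
};

// Expand the input set until it stops growing: starting from f..j,
// F2 (c..g) joins and widens the range to c..j, which pulls in F1 (a..d).
std::vector<FileRange> ExpandL0Inputs(const std::vector<FileRange>& level0,
                                      std::string begin, std::string end) {
  std::vector<FileRange> inputs;
  bool grew = true;
  while (grew) {
    grew = false;
    inputs.clear();
    for (const FileRange& f : level0) {
      if (f.largest >= begin && f.smallest <= end) {
        inputs.push_back(f);
        if (f.smallest < begin) { begin = f.smallest; grew = true; }
        if (f.largest > end) { end = f.largest; grew = true; }
      }
    }
  }
  return inputs;
}
```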
  29. 06 Oct, 2011 (1 commit)
    • G
      A number of bugfixes: · 299ccedf
      Authored by Gabor Cselle
      - Added DB::CompactRange() method.
      
        Changed manual compaction code so it breaks up compactions of
        big ranges into smaller compactions.
      
        Changed the code that pushes the output of memtable compactions
        to higher levels to obey the grandparent constraint: i.e., we
        must never have a single file in level L that overlaps too
        much data in level L+1 (to avoid very expensive L-1 compactions).
      
        Added code to pretty-print internal keys.
      
      - Fixed bug where we would not detect overlap with files in
        level-0 because we were incorrectly using binary search
        on an array of files with overlapping ranges.
      
        Added "leveldb.sstables" property that can be used to dump
        all of the sstables and ranges that make up the db state.
      
      - Removing post_write_snapshot support.  Email to leveldb mailing
        list brought up no users, just confusion from one person about
        what it meant.
      
      - Fixing static_cast char to unsigned on BIG_ENDIAN platforms.
      
        Fixes Issue 35 and Issue 36.
      
      - Comment clarification to address leveldb Issue 37.
      
      - Change license in posix_logger.h to match other files.
      
      - A build problem where uint32 was used instead of uint32_t.
      
      Sync with upstream @24408625
      299ccedf
  30. 02 Sep, 2011 (1 commit)
    • G
      Bugfixes: for Get(), don't hold mutex while writing log. · 72630236
      Authored by gabor@google.com
      - Fix bug in Get: when it triggers a compaction, it could sometimes
        mark the compaction with the wrong level (if there was a gap
        in the set of levels examined for the Get).
      
      - Do not hold mutex while writing to the log file or to the
        MANIFEST file.
      
        Added a new benchmark that runs a writer thread concurrently with
        reader threads.
      
        Percentiles
        ------------------------------
        micros/op: avg  median 99   99.9  99.99  99.999 max
        ------------------------------------------------------
        before:    42   38     110  225   32000  42000  48000
        after:     24   20     55   65    130    1100   7000
      
      - Fixed race in optimized Get.  It should have been using the
        pinned memtables, not the current memtables.
      
      
      
      git-svn-id: https://leveldb.googlecode.com/svn/trunk@50 62dab493-f737-651d-591e-8d6aee1b9529
      72630236
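A hedged sketch of the locking change: the DB mutex is held on entry, is released around the slow log/MANIFEST write so readers and writers can make progress, and is reacquired before shared state is touched again. The string append stands in for the actual file I/O.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Illustrative pattern: drop the mutex for the duration of slow I/O.
// The caller holds mu on entry and still holds it on return.
void AppendLogUnlocked(std::mutex& mu, std::string& log,
                       const std::string& record) {
  mu.unlock();    // other threads may proceed during the write
  log += record;  // stands in for writing the log/MANIFEST file
  mu.lock();      // reacquire before returning to locked code
}
```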