Commits · d68880a1b99b92823b80e9bf922242088b6453ba · kvdb / rocksdb

07 Mar, 2013 1 commit

Do not allow Transaction Log Iterator to fall ahead when writer is writing the same file · d68880a1

Committed by Abhishek Kona on Mar 04, 2013

Summary:
Store the last flushed sequence number in db_impl. Check against it in
the transaction log iterator. Do not attempt to read ahead if we do not know
whether the data has been flushed completely.
Does not work if flush is disabled. Any ideas on fixing that?
* Minor change: iter->Next is now called automatically the first time.

Test Plan:
Existing tests pass.
More ideas on testing this?
Planning to run some stress test.

Reviewers: dhruba, heyongqiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9087

04 Mar, 2013 1 commit

Add rate_delay_limit_milliseconds · 993543d1

Committed by Mark Callaghan on Mar 02, 2013

Summary:
This adds the rate_delay_limit_milliseconds option to make the delay
configurable in MakeRoomForWrite when the max compaction score is too high.
This delay is called the Ln slowdown. This change also counts the Ln slowdown
per level to make it possible to see where the stalls occur.

From IO-bound performance testing, the Level N stalls occur:
* with compression -> at the largest uncompressed level. This makes sense
                      because compaction for compressed levels is much
                      slower. When Lx is uncompressed and Lx+1 is compressed
                      then files pile up at Lx because the (Lx,Lx+1)->Lx+1
                      compaction process is the first to be slowed by
                      compression.
* without compression -> at level 1

Task ID: #1832108

Blame Rev:

Test Plan:
run with real data, added test

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9045

23 Feb, 2013 1 commit

Measure compaction time. · 959337ed

Committed by Abhishek Kona on Feb 21, 2013

Summary: just record time consumed in compaction

Test Plan: compile

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8781

21 Feb, 2013 1 commit

Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value · b2c50f1c

Committed by amayank on Feb 15, 2013

Summary:
Changed the Get and Scan options with openForReadOnly mode to have access to the memtable.
Changed the visibility of NewInternalIterator in db_impl from private to protected so that
the derived class db_impl_read_only can call that in its NewIterator function for the
scan case. The previous approach which changed the default for flush_on_destroy_ from false to true
caused many problems in the unit tests due to empty sst files that it created. All
unit tests pass now.

Test Plan: make clean; make all check; ldb put and get and scans

Reviewers: dhruba, heyongqiang, sheki

Reviewed By: dhruba

CC: kosievdmerwe, zshao, dilipj, kailiu

Differential Revision: https://reviews.facebook.net/D8697

26 Jan, 2013 1 commit

Fix poor error on num_levels mismatch and a few other minor improvements · 0b83a831

Committed by Chip Turner on Jan 24, 2013

Summary:
Previously, if you opened a db with num_levels set lower than
the number of levels in the database, you received the unhelpful message "Corruption:
VersionEdit: new-file entry."  Now you get a more verbose message
describing the issue.

Also, fix handling of compression_levels (both the run-over-the-end
issue and the memory management of it).

Lastly, unique_ptr'ify a couple of minor calls.

Test Plan: make check

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8151

24 Jan, 2013 1 commit

Fix a number of object lifetime/ownership issues · 2fdf91a4

Committed by Chip Turner on Jan 20, 2013

Summary:
Replace manual memory management with std::unique_ptr in a
number of places; not exhaustive, but this fixes a few leaks with file
handles as well as clarifies semantics of the ownership of file handles
with log classes.

Test Plan: db_stress, make check

Reviewers: dhruba

Reviewed By: dhruba

CC: zshao, leveldb, heyongqiang

Differential Revision: https://reviews.facebook.net/D8043

17 Jan, 2013 1 commit

rollover manifest file. · 7d5a4383

Committed by Abhishek Kona on Jan 10, 2013

Summary:
Check in LogAndApply if the file size is more than the limit set in
Options.
Things to consider : will this be expensive?

Test Plan: make all check. Inputs on a new unit test?

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7701

20 Dec, 2012 1 commit

Enhance ReadOnly mode to process all committed transactions. · f4c2b7cf

Committed by Dhruba Borthakur on Dec 18, 2012

Summary:
Leveldb has an api OpenForReadOnly() that opens the database
in readonly mode. This call had an option to not process the
transaction log.  This patch removes this option and always
processes all transactions that had been committed. It has
been done in such a way that it does not create/write to
any new files in the process. The invariant of "no-writes"
to the leveldb data directory is still true.

This enhancement allows multiple threads to open the same database
in readonly mode and access all transactions that were committed right
up to the OpenForReadOnly call.

I changed the public API to match the new semantics because
there are no users who are currently using this api.

Test Plan: make clean check

Reviewers: sheki

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7479

11 Dec, 2012 2 commits

A public API to fetch the latest transaction id. · 24fc3792

Committed by Dhruba Borthakur on Dec 10, 2012

Summary:
Implement an interface to retrieve the most current transaction
id from the database.

Test Plan: Added unit test.

Reviewers: sheki

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7269

Refactor GetArchivalDirectoryName to filename.h · 1c6742e3

Committed by Abhishek Kona on Dec 07, 2012

Summary:
filename.h has functions to do similar things.
Moving code away from db_impl.cc

Test Plan: make check

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D7251

08 Dec, 2012 1 commit

GetUpdatesSince API to enable replication. · 80550089

Committed by Abhishek Kona on Nov 29, 2012

Summary:
How it works:
* GetUpdatesSince takes a SequenceNumber.
* The log file whose first SequenceNumber is nearest to, and less than, the requested SequenceNumber is found.
* Seek in the log file until the requested SequenceNumber is found.
* Return an iterator that contains the logic to return records one by one.
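
The file-selection step above can be sketched as a binary search over the first sequence number of each log file. This is only an illustration, not the code from the patch: `LogFileInfo` and `FindStartingLogFile` are hypothetical names, the files are assumed to be sorted by their first sequence number, and a file that starts exactly at the requested sequence number is treated as a match.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

using SequenceNumber = uint64_t;

// Hypothetical stand-in for a WAL file; only its first sequence number matters here.
struct LogFileInfo {
  uint64_t file_number;
  SequenceNumber first_sequence;
};

// Returns the index of the last file whose first sequence number is <= target,
// or -1 if every file starts after target.
// Precondition: files is sorted by first_sequence in ascending order.
int FindStartingLogFile(const std::vector<LogFileInfo>& files,
                        SequenceNumber target) {
  // upper_bound finds the first file that starts strictly after target.
  auto it = std::upper_bound(
      files.begin(), files.end(), target,
      [](SequenceNumber seq, const LogFileInfo& f) {
        return seq < f.first_sequence;
      });
  if (it == files.begin()) return -1;
  // The file just before it is the one that may contain target.
  return static_cast<int>(std::distance(files.begin(), it)) - 1;
}
```

With files starting at sequence numbers 1, 100 and 250, a request for 150 selects the second file; the caller would then scan forward inside that file until it reaches the requested sequence number.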

Test Plan:
* Test case included to check the good code path.
* Will update with more test-cases.
* Feedback required on test-cases.

Reviewers: dhruba, emayanke

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7119

29 Nov, 2012 3 commits

Move WAL files to archive directory, instead of deleting. · d4627e6d

Committed by sheki on Nov 26, 2012

Summary:
Create a directory "archive" in the DB directory.
During DeleteObsoleteFiles, move the WAL files (*.log) to the archive directory
instead of deleting them.

Test Plan: Created a DB using DB_Bench. Reopened it. Checked if files move.

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6975

Fix all the lint errors. · d29f1819

Committed by Abhishek Kona on Nov 28, 2012

Summary:
Scripted and removed all trailing spaces and converted all tabs to
spaces.

Also fixed other lint errors.
All lint errors from this point of time should be taken seriously.

Test Plan: make all check

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7059

Delete non-visible keys during a compaction even in the presence of snapshots. · 9a357847

Committed by Dhruba Borthakur on Nov 26, 2012

Summary:
LevelDB should delete almost-new keys when a long-open snapshot exists.
The previous behavior is to keep all versions that were created after the
oldest open snapshot. This can lead to database size bloat for
high-update workloads when there are long-open snapshots, and long-open
snapshots will be used for logical backup. By "almost new" I mean that the
key was updated more than once after the oldest snapshot.

If there were two snapshots with seq numbers s1 and s2 (s1 < s2), and if
we find two instances of the same key k1 that lie entirely within s1 and
s2 (i.e. s1 < k1 < s2), then the earlier version
of k1 can be safely deleted because that version is not visible in any snapshot.
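
As a rough illustration of the rule above (not the compaction code from this patch), the sketch below groups the versions of a single key by the earliest snapshot that can see them and keeps only the newest version in each group. `VersionsToKeep` is a hypothetical helper; it assumes a snapshot with sequence number s sees every version with sequence number <= s, and it ignores deletion markers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Given the sequence numbers of all versions of one key and the sequence
// numbers of all open snapshots, returns the versions that must be kept,
// newest first. Any other version is hidden from every snapshot and from
// current readers, so a compaction could drop it.
std::vector<uint64_t> VersionsToKeep(std::vector<uint64_t> versions,
                                     std::vector<uint64_t> snapshots) {
  std::sort(versions.begin(), versions.end(), std::greater<uint64_t>());
  std::sort(snapshots.begin(), snapshots.end());

  std::vector<uint64_t> kept;
  bool have_prev = false;
  std::size_t prev_bucket = 0;
  for (uint64_t seq : versions) {
    // Index of the earliest snapshot that can see this version;
    // snapshots.size() means only current (non-snapshot) readers see it.
    std::size_t bucket = static_cast<std::size_t>(
        std::lower_bound(snapshots.begin(), snapshots.end(), seq) -
        snapshots.begin());
    // Versions are visited newest first, so the first version seen in a
    // bucket is the one that bucket's readers observe.
    if (!have_prev || bucket != prev_bucket) kept.push_back(seq);
    have_prev = true;
    prev_bucket = bucket;
  }
  return kept;
}
```

For example, with snapshots at 10 and 20 and versions at 5, 12, 15, 25 and 30, versions 12 and 25 are shadowed by 15 and 30 within the same snapshot interval, so only 30, 15 and 5 need to be kept.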

Test Plan:
unit test attached
make clean check

Differential Revision: https://reviews.facebook.net/D6999

19 Nov, 2012 1 commit

enhance dbstress to simulate hard crash · 62e7583f

Committed by Dhruba Borthakur on Nov 16, 2012

Summary:
dbstress has an option to reopen the database. Make it such that the
previous handle is not closed before we reopen; this simulates a
situation similar to a process crash.

Added a new API to DBImpl to remove the lock file.

Test Plan: run db_stress

Reviewers: emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D6777

08 Nov, 2012 1 commit

Add a readonly db · 3fcf533e

Committed by heyongqiang on Nov 05, 2012

Summary: as subject

Test Plan: run db_bench readrandom

Reviewers: dhruba

Reviewed By: dhruba

CC: MarkCallaghan, emayanke, sheki

Differential Revision: https://reviews.facebook.net/D6495

07 Nov, 2012 1 commit

Flush Data at object destruction if disableWal is used. · 4e413df3

Committed by Abhishek Kona on Nov 06, 2012

Summary:
Added a conditional flush in ~DBImpl.
There is still a chance of writes not being persisted if there is a
crash (not a clean shutdown) before the DBImpl instance is destroyed.

Test Plan: modified db_test to meet the new expectations.

Reviewers: dhruba, heyongqiang

Differential Revision: https://reviews.facebook.net/D6519

30 Oct, 2012 1 commit

Adds DB::GetNextCompaction and then uses that for rate limiting db_bench · 70c42bf0

Committed by Mark Callaghan on Oct 26, 2012

Summary:
Adds a method that returns the score for the next level that most
needs compaction. That method is then used by db_bench to rate limit threads.
Threads are put to sleep at the end of each stats interval until the score
is less than the limit. The limit is set via the --rate_limit=$double option.
The specified value must be > 1.0. Also adds the option --stats_per_interval
to enable additional metrics reported every stats interval.

Task ID: #

Blame Rev:

Test Plan:
run db_bench

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6243

25 Oct, 2012 1 commit

Improve statistics · e7206f43

Committed by Mark Callaghan on Oct 23, 2012

Summary:
This adds more statistics to be reported by GetProperty("leveldb.stats").
The new stats include time spent waiting on stalls in MakeRoomForWrite.
This also includes the total amplification rate where that is:
    (#bytes of sequential IO during compaction) / (#bytes from Put)
This also includes a lot more data for the per-level compaction report.
* Rn(MB) - MB read from level N during compaction between levels N and N+1
* Rnp1(MB) - MB read from level N+1 during compaction between levels N and N+1
* Wnew(MB) - new data written to the level during compaction
* Amplify - ( Write(MB) + Rnp1(MB) ) / Rn(MB)
* Rn - files read from level N during compaction between levels N and N+1
* Rnp1 - files read from level N+1 during compaction between levels N and N+1
* Wnp1 - files written to level N+1 during compaction between levels N and N+1
* NewW - new files written to level N+1 during compaction
* Count - number of compactions done for this level

This is the new output from DB::GetProperty("leveldb.stats"). The old output stopped at Write(MB)

                               Compactions
Level  Files Size(MB) Time(sec) Read(MB) Write(MB)  Rn(MB) Rnp1(MB) Wnew(MB) Amplify Read(MB/s) Write(MB/s)   Rn Rnp1 Wnp1 NewW Count
-------------------------------------------------------------------------------------------------------------------------------------
  0        3        6        33        0       576       0        0      576    -1.0       0.0         1.3     0    0    0    0   290
  1      127      242       351     5316      5314     570     4747      567    17.0      12.1        12.1   287 2399 2685  286    32
  2      161      328        54      822       824     326      496      328     4.0       1.9         1.9   160  251  411  160   161
Amplification: 22.3 rate, 0.56 GB in, 12.55 GB out
Uptime(secs): 439.8
Stalls(secs): 206.938 level0_slowdown, 0.000 level0_numfiles, 24.129 memtable_compaction
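
The two ratios defined above can be written out as small functions. This is a sketch, not the code behind the stats report: the function names are hypothetical, and returning -1.0 when nothing was read from level N is an assumption suggested by the -1.0 shown for level 0 in the sample output.

```cpp
#include <cassert>

// Per-level amplification: ( Write(MB) + Rnp1(MB) ) / Rn(MB).
// Assumption: -1.0 is used as a sentinel when Rn(MB) is zero.
double Amplify(double write_mb, double rnp1_mb, double rn_mb) {
  if (rn_mb <= 0.0) return -1.0;
  return (write_mb + rnp1_mb) / rn_mb;
}

// Total amplification rate:
// (#bytes of sequential IO during compaction) / (#bytes from Put).
double TotalAmplification(double compaction_io_bytes, double put_bytes) {
  assert(put_bytes > 0.0);
  return compaction_io_bytes / put_bytes;
}
```

Plugging in the level 1 row (Write 5314, Rnp1 4747, Rn 570) gives roughly 17.65, close to the 17.0 printed in the table; the difference presumably comes from how the report rounds its values.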

Task ID: #

Blame Rev:

Test Plan:
run db_bench

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
(cherry picked from commit ecdeead38f86cc02e754d0032600742c4f02fec8)

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D6153

23 Oct, 2012 1 commit

Delete files outside the mutex. · 4c107587

Committed by Dhruba Borthakur on Oct 21, 2012

Summary:
The compaction process deletes a large number of files. This takes
quite a bit of time and is best done outside the mutex lock.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6123

21 Oct, 2012 2 commits

Delete files outside the mutex. · f95219fb

Committed by Dhruba Borthakur on Oct 21, 2012

Summary:
The compaction process deletes a large number of files. This takes
quite a bit of time and is best done outside the mutex lock.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6123

Delete files outside the mutex. · 64c4b9f0

Committed by Dhruba Borthakur on Oct 21, 2012

Summary:
The compaction process deletes a large number of files. This takes
quite a bit of time and is best done outside the mutex lock.

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:

20 Oct, 2012 1 commit

This is the mega-patch multi-threaded compaction · 1ca05843

Committed by Dhruba Borthakur on Oct 19, 2012

published in https://reviews.facebook.net/D5997.

Summary:
This patch allows compaction to occur in multiple background threads
concurrently.

If a manual compaction is issued, the system falls back to a
single-compaction-thread model. This is done to ensure correctness
and simplicity of code. When the manual compaction is finished,
the system resumes its concurrent-compaction mode automatically.

The updates to the manifest are done via a group-commit approach.

Test Plan: run db_bench

17 Oct, 2012 1 commit

The deletion of obsolete files should not occur very frequently. · aa73538f

Committed by Dhruba Borthakur on Oct 16, 2012

Summary:
The method DeleteObsoleteFiles is very costly, especially
when the number of files in a system is large. It makes a list of
all live files and then scans the directory to compute the diff.
By default, this method is executed after every compaction run.

This patch makes it such that DeleteObsoleteFiles is never
invoked twice within a configured period.
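
One way to picture the throttling described above is a small gate object that remembers when the expensive operation last ran. This is a sketch under stated assumptions, not the patch itself: `PeriodicGate` is a hypothetical name, time is passed in as microseconds from a monotonic clock, and the first call is always allowed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: permits an expensive operation at most once per period.
class PeriodicGate {
 public:
  explicit PeriodicGate(uint64_t period_micros)
      : period_micros_(period_micros) {}

  // Returns true if the caller should run the operation at now_micros,
  // and records that time as the most recent run.
  bool ShouldRun(uint64_t now_micros) {
    if (has_run_) {
      // Assumes a monotonic clock, so time never moves backwards.
      assert(now_micros >= last_run_micros_);
      if (now_micros - last_run_micros_ < period_micros_) return false;
    }
    has_run_ = true;
    last_run_micros_ = now_micros;
    return true;
  }

 private:
  uint64_t period_micros_;
  uint64_t last_run_micros_ = 0;
  bool has_run_ = false;
};
```

A caller would check `gate.ShouldRun(now)` after each compaction and skip the directory scan whenever it returns false.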

Test Plan: run all unit tests

Reviewers: heyongqiang, MarkCallaghan

Reviewed By: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6045

25 Sep, 2012 1 commit

The BackupAPI should also list the length of the manifest file. · ae36e509

Committed by Dhruba Borthakur on Sep 24, 2012

Summary:
The GetLiveFiles() api lists the set of sst files and the current
MANIFEST file. But the database continues to append new data to the
MANIFEST file even when the application is backing it up to the
backup location. This means that the database-version that is
stored in the MANIFEST FILE in the backup location
does not correspond to the sst files returned by GetLiveFiles.

This change adds a new parameter to GetLiveFiles. This new parameter
returns the current size of the MANIFEST file.

Test Plan: Unit test attached.

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5631

18 Sep, 2012 1 commit

Ability to take a file-level snapshot from leveldb. · ba55d77b

Committed by Dhruba Borthakur on Sep 14, 2012

Summary:
A set of apis that allows an application to backup data from the
leveldb database based on a set of files.

Test Plan: unit test attached. More coming soon.

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5439

17 Sep, 2012 1 commit

remove boost · dcbd6be3

Committed by heyongqiang on Sep 16, 2012

Summary: as subject

Test Plan: build

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D5469

07 Sep, 2012 1 commit

put log in a separate dir · 0f43aa47

Committed by heyongqiang on Sep 05, 2012

Summary: added a new option, db_log_dir, which points to the log dir. Inside that dir, in order to make log names unique, the log file name is prefixed with the absolute path of the leveldb data dir.

Test Plan: db_test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D5205

29 Aug, 2012 2 commits

Do not spin in a tight loop attempting compactions if there is a compaction error · 6fee5a74

Committed by heyongqiang on Aug 22, 2012

Summary: as subject. ported the change from google code leveldb 1.5

Test Plan: run db_test

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4839

fix db_test error with scribe logger turned on · d3759ca1

Committed by heyongqiang on Aug 27, 2012

Summary: as subject

Test Plan: db_test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D4929

22 Aug, 2012 2 commits

Fixed unit test c_test by initializing logger=NULL. · a098207c

Committed by Dhruba Borthakur on Aug 21, 2012

Summary:
Fixed unit test c_test by initializing logger=NULL.

Removed "atomic" from last_log_ts so that unit tests do not require C11 compiler.
Anyway, last_log_ts is mostly used for logging, so it is ok if it is loosely
accurate.

Test Plan: run c_test

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D4803

adding a scribe logger in leveldb to log leveldb deploy stats · 6ba1f177

Committed by heyongqiang on Aug 14, 2012

Summary:
as subject.

A new log is written to scribe via thrift client when a new db is opened and when there is
a compaction.

a new option var scribe_log_db_stats is added.

Test Plan: manually checked using command "ptail -time 0 leveldb_deploy_stats"

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4659

18 Aug, 2012 1 commit

use ts as suffix for LOG.old files · 20ee76bd

Committed by heyongqiang on Aug 17, 2012

Summary: as subject and only maintain 10 log files.

Test Plan: new test in db_test

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4731

07 Jul, 2012 1 commit

add flush interface to DB · 22ee777f

Committed by heyongqiang on Jul 06, 2012

Summary: as subject. The flush will flush everything in the db.

Test Plan: new test in db_test.cc

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D4029

28 Jun, 2012 1 commit

Make some variables configurable for each db instance · 4e4b6812

Committed by heyongqiang on Jun 22, 2012

Summary:
Make configurable 'targetFileSize', 'targetFileSizeMultiplier',
'maxBytesForLevelBase', 'maxBytesForLevelMultiplier',
'expandedCompactionFactor', 'maxGrandParentOverlapFactor'

Test Plan: N/A

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D3801

17 Apr, 2012 1 commit

Added bloom filter support. · 85584d49

Committed by Sanjay Ghemawat on Apr 17, 2012

In particular, we add a new FilterPolicy class.  An instance
of this class can be supplied in Options when opening a
database.  If supplied, the instance is used to generate
summaries of keys (e.g., a bloom filter) which are placed in
sstables.  These summaries are consulted by DB::Get() so we
can avoid reading sstable blocks that are guaranteed to not
contain the key we are looking for.

This change provides one implementation of FilterPolicy
based on bloom filters.
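
To make the idea of a key summary concrete, here is a minimal, self-contained bloom filter sketch. It is not the FilterPolicy interface or the implementation added by this change: `SimpleBloomFilter` is a hypothetical class, and the use of `std::hash` with a rotated delta for the extra probes is an illustrative choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical bloom filter: answers "definitely absent" or "possibly present".
class SimpleBloomFilter {
 public:
  SimpleBloomFilter(std::size_t num_bits, int num_probes)
      : bits_(num_bits, false), num_probes_(num_probes) {
    assert(num_bits > 0 && num_probes > 0);
  }

  // Sets num_probes_ bits derived from the key's hash.
  void Add(const std::string& key) {
    uint64_t h = Hash(key);
    const uint64_t delta = (h >> 17) | (h << 47);  // rotate to get a second hash
    for (int i = 0; i < num_probes_; ++i) {
      bits_[h % bits_.size()] = true;
      h += delta;
    }
  }

  // Returns false only if the key was never added; true may be a false positive.
  bool MayContain(const std::string& key) const {
    uint64_t h = Hash(key);
    const uint64_t delta = (h >> 17) | (h << 47);
    for (int i = 0; i < num_probes_; ++i) {
      if (!bits_[h % bits_.size()]) return false;
      h += delta;
    }
    return true;
  }

 private:
  static uint64_t Hash(const std::string& key) {
    return static_cast<uint64_t>(std::hash<std::string>{}(key));
  }

  std::vector<bool> bits_;
  int num_probes_;
};
```

A reader would consult `MayContain(key)` before touching a block and skip the read when it returns false, which is the saving the summary describes for DB::Get().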

Other changes:
- Updated version number to 1.4.
- Some build tweaks.
- C binding for CompactRange.
- A few more benchmarks: deleteseq, deleterandom, readmissing, seekrandom.
- Minor .gitignore update.

09 Mar, 2012 1 commit

added group commit; drastically speeds up multi-threaded synchronous write workloads · d79762e2

Committed by Sanjay Ghemawat on Mar 08, 2012
  
01 Nov, 2011 1 commit

A number of fixes: · 36a5f8ed

Committed by Hans Wennborg on Oct 31, 2011

- Replace raw slice comparison with a call to user comparator.
  Added test for custom comparators.

- Fix end of namespace comments.

- Fixed bug in picking inputs for a level-0 compaction.

  When finding overlapping files, the covered range may expand
  as files are added to the input set.  We now correctly expand
  the range when this happens instead of continuing to use the
  old range.  For example, suppose L0 contains files with the
  following ranges:

      F1: a .. d
      F2:    c .. g
      F3:       f .. j

  and the initial compaction target is F3.  We used to search
  for range f..j which yielded {F2,F3}.  However we now expand
  the range as soon as another file is added.  In this case,
  when F2 is added, we expand the range to c..j and restart the
  search.  That picks up file F1 as well.

  This change fixes a bug related to deleted keys showing up
  incorrectly after a compaction as described in Issue 44.
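
The range-expansion behaviour described above can be sketched as follows. This is an illustration rather than the code from the fix: `FileRange` and `GetOverlappingInputs` are hypothetical names here, and keys are compared as plain strings instead of through a user comparator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a level-0 file by its key range.
struct FileRange {
  std::string name;
  std::string smallest;
  std::string largest;
};

// Returns the indices of all files that overlap [begin, end]. Whenever an
// overlapping file extends past the current range, the range is widened and
// the search restarts, so files overlapping the wider range are found too.
std::vector<std::size_t> GetOverlappingInputs(
    const std::vector<FileRange>& files, std::string begin, std::string end) {
  std::vector<std::size_t> inputs;
  std::size_t i = 0;
  while (i < files.size()) {
    const FileRange& f = files[i];
    if (f.largest < begin || f.smallest > end) {
      ++i;  // no overlap with the current range
      continue;
    }
    bool expanded = false;
    if (f.smallest < begin) { begin = f.smallest; expanded = true; }
    if (f.largest > end) { end = f.largest; expanded = true; }
    if (expanded) {
      inputs.clear();  // restart with the wider range
      i = 0;
    } else {
      inputs.push_back(i);
      ++i;
    }
  }
  return inputs;
}
```

With the three files from the example and an initial range of f..j, the search first widens to c..j when it meets F2, then to a..j when it meets F1, and finally returns all three files.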

(Sync with upstream @25072954)

06 Oct, 2011 1 commit

A number of bugfixes: · 299ccedf

Committed by Gabor Cselle on Oct 05, 2011

- Added DB::CompactRange() method.

  Changed manual compaction code so it breaks up compactions of
  big ranges into smaller compactions.

  Changed the code that pushes the output of memtable compactions
  to higher levels to obey the grandparent constraint: i.e., we
  must never have a single file in level L that overlaps too
  much data in level L+1 (to avoid very expensive L-1 compactions).

  Added code to pretty-print internal keys.

- Fixed bug where we would not detect overlap with files in
  level-0 because we were incorrectly using binary search
  on an array of files with overlapping ranges.

  Added "leveldb.sstables" property that can be used to dump
  all of the sstables and ranges that make up the db state.

- Removing post_write_snapshot support.  Email to leveldb mailing
  list brought up no users, just confusion from one person about
  what it meant.

- Fixing static_cast char to unsigned on BIG_ENDIAN platforms.

  Fixes Issue 35 and Issue 36.

- Comment clarification to address leveldb Issue 37.

- Change license in posix_logger.h to match other files.

- A build problem where uint32 was used instead of uint32_t.

Sync with upstream @24408625

02 Sep, 2011 1 commit

Bugfixes: for Get(), don't hold mutex while writing log. · 72630236

Committed by gabor@google.com on Sep 01, 2011

- Fix bug in Get: when it triggers a compaction, it could sometimes
  mark the compaction with the wrong level (if there was a gap
  in the set of levels examined for the Get).

- Do not hold mutex while writing to the log file or to the
  MANIFEST file.

  Added a new benchmark that runs a writer thread concurrently with
  reader threads.

  Percentiles
  ------------------------------
  micros/op: avg  median 99   99.9  99.99  99.999 max
  ------------------------------------------------------
  before:    42   38     110  225   32000  42000  48000
  after:     24   20     55   65    130    1100   7000

- Fixed race in optimized Get.  It should have been using the
  pinned memtables, not the current memtables.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@50 62dab493-f737-651d-591e-8d6aee1b9529

kvdb / rocksdb synced successfully 11 months ago