Commits · 9f690ec62c16592dc6bc80ddf9d227632db51441 · kvdb / rocksdb

08 Jan, 2014 1 commit

Fix a deadlock in CompactRange() · 9f690ec6

Committed by Tomislav Novak on Dec 21, 2013

Summary:
The way DBImpl::TEST_CompactRange() throttles down the number of bg compactions
can cause it to deadlock when CompactRange() is called concurrently from
multiple threads. Imagine the following scenario with only two threads
(max_background_compactions is 10 and bg_compaction_scheduled_ is initially 0):

   1. Thread #1 increments bg_compaction_scheduled_ (to LargeNumber), sets
      bg_compaction_scheduled_ to 9 (newvalue), schedules the compaction
      (bg_compaction_scheduled_ is now 10) and waits for it to complete.
   2. Thread #2 calls TEST_CompactRange(), increments bg_compaction_scheduled_
      (now LargeNumber + 10) and waits on a cv for bg_compaction_scheduled_ to
      drop to LargeNumber.
   3. BG thread completes the first manual compaction, decrements
      bg_compaction_scheduled_ and wakes up all threads waiting on bg_cv_.
      Thread #1 runs, increments bg_compaction_scheduled_ by LargeNumber again
      (now 2*LargeNumber + 9). Since that's more than LargeNumber + newvalue,
      thread #2 also goes to sleep (waiting on bg_cv_), without resetting
      bg_compaction_scheduled_.

This diff attempts to address the problem by introducing a new counter
bg_manual_only_ (when positive, MaybeScheduleFlushOrCompaction() will only
schedule manual compactions).
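
A minimal sketch of the scheduling rule described above, assuming a simplified single-threaded model. `SchedulerSketch`, `MaybeSchedule()` and the two `*_pending` flags are hypothetical names for illustration; only `bg_manual_only_`, `bg_compaction_scheduled_` and `max_background_compactions` come from the summary, and this does not reproduce DBImpl.

```cpp
// Hypothetical, simplified model of the scheduling decision; not RocksDB's DBImpl.
struct SchedulerSketch {
  int max_background_compactions = 10;
  int bg_compaction_scheduled_ = 0;  // compactions currently scheduled or running
  int bg_manual_only_ = 0;           // > 0 while any thread runs a manual compaction
  bool manual_pending = false;       // a manual compaction is waiting to be scheduled
  bool automatic_pending = false;    // an automatic compaction is waiting

  // Returns true if a compaction was scheduled by this call.
  bool MaybeSchedule() {
    if (bg_compaction_scheduled_ >= max_background_compactions) return false;
    if (manual_pending) {
      manual_pending = false;
      ++bg_compaction_scheduled_;
      return true;
    }
    // While a manual compaction is in progress, automatic work is held back.
    if (bg_manual_only_ > 0 || !automatic_pending) return false;
    automatic_pending = false;
    ++bg_compaction_scheduled_;
    return true;
  }
};
```

In this model each caller only increments and later decrements `bg_manual_only_`, so concurrent callers no longer overwrite each other's adjustments to `bg_compaction_scheduled_`.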

Test Plan:
I could pretty much consistently reproduce the deadlock with a program that
calls CompactRange(nullptr, nullptr) immediately after Write() from multiple
threads. This no longer happens with this patch.

Tests (make check) pass.

Reviewers: dhruba, igor, sdong, haobo

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14799

9f690ec6

07 Jan, 2014 3 commits
- Revert change in 8f6e3195. · c370f559
  Committed by kailiu on Jan 06, 2014
  
  c370f559
- Merge pull request #56 from sepeth/refactor-detect-version · be271c33
  Committed by Kai Liu on Jan 06, 2014
```
Refactor build_tools/build_detect_version
```
  be271c33
- Fix issue #57 · 7e70ff63
  Committed by kailiu on Jan 06, 2014
  
  7e70ff63
06 Jan, 2014 1 commit
- Refactor build_tools/build_detect_version · d800dc56
  Committed by Doğan Çeçen on Jan 05, 2014
  
  d800dc56
05 Jan, 2014 1 commit
- Add a hack to build_detect_platform so it works in all types of fb-servers · 8f6e3195
  Committed by Kai Liu on Jan 04, 2014
  
  8f6e3195
03 Jan, 2014 3 commits

Add clang-format rules · 463086bc

Committed by Kai Liu on Jan 02, 2014

Summary:
The rule file is forked from that in Facebook's repo.

I'll add format file for now and team members can tune the rules later.

In this patch, I made only two changes in order to be consistent with the existing coding style:

`SpacesBeforeTrailingComments: 2`

`ColumnLimit:     80`

Test Plan: N/A

Reviewers: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15015

463086bc

Automate the preparation step for a new release · 46950597

Committed by Kai Liu on Dec 30, 2013

Summary: Added a script that prepares the repo for Facebook's new rocksdb release; it automatically does the work needed to make sure this repo is ready for a third-party release.

Test Plan:
Ran this script and observed:

* new version was created (both in local and remote repo) as a git tag.
* build_version.cc was updated
* build_detect_platform was changed so that it won't create any new change.

Reviewers: haobo, dhruba, sdong, igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15003

46950597

Hotfix the bug in table cache's GetSliceForFileNumber · 9281a826
Committed by kailiu on Jan 02, 2014
```
Forgot to fix this problem in master branch. Already fixed it in performance branch.
```
9281a826

02 Jan, 2014 4 commits

Support multi-threaded DisableFileDeletions() and EnableFileDeletions() · b60c14f6

Committed by Igor Canadi on Jan 02, 2014

Summary:
We don't want two threads to clash if they concurrently call DisableFileDeletions() and EnableFileDeletions(). I'm adding a counter that will enable file deletions only after all DisableFileDeletions() calls have been negated with EnableFileDeletions().

However, we also don't want to break the old behavior, so I added a parameter force to EnableFileDeletions(). If force is true, we will still enable file deletions after every call to EnableFileDeletions(), which is what is happening now.
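
The counting behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the DBImpl code: `FileDeletionGate`, `disable_count_` and the boolean return value are hypothetical, and only the method names and the `force` parameter come from the summary.

```cpp
#include <mutex>

// Hypothetical sketch of a nesting counter for disabling file deletions.
class FileDeletionGate {
 public:
  // Every call adds one outstanding "disable" request.
  void DisableFileDeletions() {
    std::lock_guard<std::mutex> lock(mu_);
    ++disable_count_;
  }

  // Cancels one outstanding request, or all of them when force is true.
  // Returns true if deletions are enabled after the call.
  bool EnableFileDeletions(bool force) {
    std::lock_guard<std::mutex> lock(mu_);
    if (force) {
      disable_count_ = 0;  // old behavior: one call always re-enables
    } else if (disable_count_ > 0) {
      --disable_count_;
    }
    return disable_count_ == 0;
  }

 private:
  std::mutex mu_;
  int disable_count_ = 0;  // number of DisableFileDeletions() calls not yet negated
};
```

With two threads that each disable deletions, the first non-forced `EnableFileDeletions(false)` leaves deletions disabled and only the second one re-enables them.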

Test Plan: make check

Reviewers: dhruba, haobo, sanketh

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14781

b60c14f6

moving autovector_test after db_test · 345fb94d
Committed by Igor Canadi on Jan 02, 2014

345fb94d

Add -DROCKSDB_FALLOCATE_PRESENT to fbcode build · 52ea1be9
Committed by Igor Canadi on Jan 02, 2014

52ea1be9

Merge pull request #48 from dyu/master · 2b3aab3e
Committed by Igor Canadi on Jan 02, 2014
```
fix build bug
```
2b3aab3e

31 Dec, 2013 3 commits
- update the latest version in README.fb to 2.7 · fe030bd1
  Committed by Kai Liu on Dec 30, 2013
  
  fe030bd1
- Simplify build_tools/build_detect_version · 5a20744a
  Committed by Kai Liu on Dec 30, 2013
  
  5a20744a
- Update README.fb · 1795397b
  Committed by Kai Liu on Dec 30, 2013
```
Update the latest version number.
```
  1795397b
30 Dec, 2013 2 commits
- docs for shared library builds · e842b99f
  Committed by dyu on Dec 30, 2013
  
  e842b99f
- tweak build bug fix · a6b476a2
  Committed by dyu on Dec 30, 2013
  
  a6b476a2
27 Dec, 2013 8 commits

fix build bug from recent... · 9d4dc0da

Committed by dyu on Dec 27, 2013

fix build bug from recent commit: https://github.com/facebook/rocksdb/commit/43c386b72ee834c88a1a22500ce1fc36a8208277

9d4dc0da

TableCache.FindTable() to avoid the mem copy of file number · a094f3b3

Committed by Siying Dong on Dec 26, 2013

Summary: I'm not sure what the purpose is of encoding the file number into a new buffer to look up the table cache; it seems unnecessary to me. With this patch, we point the lookup key to the address of the int64 that holds the file number.
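
A rough sketch of the idea, assuming a non-owning byte view similar to a Slice. `SliceSketch`, `KeyForFileNumber` and `FileNumberFromKey` are hypothetical names for illustration, not the TableCache API, and the choice of `std::uint64_t` is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a Slice: a non-owning view over bytes.
struct SliceSketch {
  const char* data;
  std::size_t size;
};

// Builds the lookup key directly over the integer's storage: no copy, no encoding.
// The view is only valid while *file_number is alive, and the bytes are in
// host byte order.
inline SliceSketch KeyForFileNumber(const std::uint64_t* file_number) {
  return SliceSketch{reinterpret_cast<const char*>(file_number),
                     sizeof(*file_number)};
}

// Reads the number back out of a key produced by KeyForFileNumber.
inline std::uint64_t FileNumberFromKey(const SliceSketch& key) {
  std::uint64_t n = 0;
  std::memcpy(&n, key.data, sizeof(n));
  return n;
}
```

Since the key never leaves the process, host byte order is sufficient here; a key that is persisted or sent elsewhere would still need a fixed encoding.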

Test Plan: make all check

Reviewers: dhruba, haobo, igor, kailiu

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14811

a094f3b3

Avoid malloc in NotFound key status if no message is given. · 18df47b7

Committed by Siying Dong on Dec 26, 2013

Summary:
In some places a NotFound status is created with an empty message, but a malloc still happens. With this patch, the malloc is avoided in that case.

The motivation is that in the db_bench readrandom test, when none of the keys exist, I found that about 4% of the total running time is spent on malloc of Status, plus a similar amount of CPU spent on freeing them, which is not necessary.
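
One way to picture the change, as a sketch under assumptions: `StatusSketch` is a hypothetical class, not the real Status, and it simply skips the heap allocation when the message is empty.

```cpp
#include <cstring>
#include <string>

// Hypothetical status type: allocates only when there is a message to store.
class StatusSketch {
 public:
  static StatusSketch NotFound(const std::string& msg = std::string()) {
    return StatusSketch(kNotFound, msg);
  }

  StatusSketch(StatusSketch&& other) noexcept
      : code_(other.code_), state_(other.state_) {
    other.state_ = nullptr;  // ownership of the message moves to *this
  }
  StatusSketch(const StatusSketch&) = delete;
  StatusSketch& operator=(const StatusSketch&) = delete;
  ~StatusSketch() { delete[] state_; }

  bool IsNotFound() const { return code_ == kNotFound; }
  bool HasHeapMessage() const { return state_ != nullptr; }

  std::string ToString() const {
    std::string out = (code_ == kOk) ? "OK" : "NotFound: ";
    if (state_ != nullptr) out += state_;
    return out;
  }

 private:
  enum Code { kOk, kNotFound };

  StatusSketch(Code code, const std::string& msg)
      : code_(code), state_(nullptr) {
    if (!msg.empty()) {  // empty message: no allocation at all
      state_ = new char[msg.size() + 1];
      std::memcpy(state_, msg.c_str(), msg.size() + 1);
    }
  }

  Code code_;
  char* state_;  // null when there is no message
};
```

A caller that only checks `IsNotFound()` then pays for neither an allocation nor a free on the miss path.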

Test Plan: make all check

Reviewers: dhruba, haobo, igor

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14691

18df47b7

Fix all the comparison issue in fb dev servers · b40c052b
Committed by Kai Liu on Dec 26, 2013

b40c052b

Fix [-Werror=sign-compare] in autovector_test · 113a08c9
Committed by kailiu on Dec 26, 2013

113a08c9

Fix the unused variable warning message in mac os · 079a21ba
Committed by kailiu on Dec 26, 2013

079a21ba

Implement autovector · c01676e4

Committed by kailiu on Dec 12, 2013

Summary:
A vector that leverages a pre-allocated stack-based array to achieve better
performance for arrays with a small number of items.
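
A minimal sketch of the small-buffer idea, not the actual autovector implementation. `AutoVectorSketch` and `only_in_stack()` are hypothetical names, the default capacity of 8 is an assumption, and `T` is assumed to be default-constructible and copy-assignable.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical small-buffer vector: the first kSize items live in an inline
// array (on the stack when the object itself is), the rest spill to the heap.
template <typename T, std::size_t kSize = 8>
class AutoVectorSketch {
 public:
  void push_back(const T& value) {
    if (num_inline_ < kSize) {
      inline_[num_inline_++] = value;  // no heap allocation on this path
    } else {
      overflow_.push_back(value);      // heap is touched only past kSize items
    }
  }

  std::size_t size() const { return num_inline_ + overflow_.size(); }

  // No bounds checking, like std::vector::operator[].
  T& operator[](std::size_t i) {
    return i < kSize ? inline_[i] : overflow_[i - kSize];
  }

  // True while no item has spilled to the heap-backed vector.
  bool only_in_stack() const { return overflow_.empty(); }

 private:
  std::size_t num_inline_ = 0;
  T inline_[kSize];
  std::vector<T> overflow_;
};
```

The trade-off is a larger object and an extra branch on every access in exchange for avoiding allocation in the common small case.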

Test Plan:
Added tests for both correctness and performance

Here is the performance benchmark between vector and autovector

Please note that in the test "Creation and Insertion Test", the test cases were designed with the motivation described below:

* no element inserted: the internal array of std::vector may not really get
  initialized.
* one element inserted: the internal array of std::vector must have been
  initialized.
* kSize elements inserted: this shows the most time we'll spend if we
  keep everything in stack.
* 2 * kSize elements inserted: the internal vector of
  autovector must have been initialized.

Note: kSize is the capacity of autovector

  =====================================================
  Creation and Insertion Test
  =====================================================
  created 100000 vectors:
  	each was inserted with 0 elements
  	total time elapsed: 128000 (ns)
  created 100000 autovectors:
  	each was inserted with 0 elements
  	total time elapsed: 3641000 (ns)
  created 100000 VectorWithReserveSizes:
  	each was inserted with 0 elements
  	total time elapsed: 9896000 (ns)
  -----------------------------------
  created 100000 vectors:
  	each was inserted with 1 elements
  	total time elapsed: 11089000 (ns)
  created 100000 autovectors:
  	each was inserted with 1 elements
  	total time elapsed: 5008000 (ns)
  created 100000 VectorWithReserveSizes:
  	each was inserted with 1 elements
  	total time elapsed: 24271000 (ns)
  -----------------------------------
  created 100000 vectors:
  	each was inserted with 4 elements
  	total time elapsed: 39369000 (ns)
  created 100000 autovectors:
  	each was inserted with 4 elements
  	total time elapsed: 10121000 (ns)
  created 100000 VectorWithReserveSizes:
  	each was inserted with 4 elements
  	total time elapsed: 28473000 (ns)
  -----------------------------------
  created 100000 vectors:
  	each was inserted with 8 elements
  	total time elapsed: 75013000 (ns)
  created 100000 autovectors:
  	each was inserted with 8 elements
  	total time elapsed: 18237000 (ns)
  created 100000 VectorWithReserveSizes:
  	each was inserted with 8 elements
  	total time elapsed: 42464000 (ns)
  -----------------------------------
  created 100000 vectors:
  	each was inserted with 16 elements
  	total time elapsed: 102319000 (ns)
  created 100000 autovectors:
  	each was inserted with 16 elements
  	total time elapsed: 76724000 (ns)
  created 100000 VectorWithReserveSizes:
  	each was inserted with 16 elements
  	total time elapsed: 68285000 (ns)
  -----------------------------------
  =====================================================
  Sequence Access Test
  =====================================================
  performed 100000 sequence access against vector
  	size: 4
  	total time elapsed: 198000 (ns)
  performed 100000 sequence access against autovector
  	size: 4
  	total time elapsed: 306000 (ns)
  -----------------------------------
  performed 100000 sequence access against vector
  	size: 8
  	total time elapsed: 565000 (ns)
  performed 100000 sequence access against autovector
  	size: 8
  	total time elapsed: 512000 (ns)
  -----------------------------------
  performed 100000 sequence access against vector
  	size: 16
  	total time elapsed: 1076000 (ns)
  performed 100000 sequence access against autovector
  	size: 16
  	total time elapsed: 1070000 (ns)
  -----------------------------------

Reviewers: dhruba, haobo, sdong, chip

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14655

c01676e4

Merge pull request #32 from jamesgolick/master · 5643ae1a
Committed by Kai Liu on Dec 26, 2013
```
Only try to use fallocate if it's actually present on the system.
```
5643ae1a

24 Dec, 2013 1 commit

Add a pointer to the engineering design discussion forum. · 71ddb117

Committed by Dhruba Borthakur on Dec 23, 2013

Summary:
Add a pointer to the engineering design discussion forum.

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:

71ddb117

21 Dec, 2013 2 commits

Initialize sequence number in BatchResult - issue #39 · b26dc956
Committed by Igor Canadi on Dec 20, 2013

b26dc956

[RocksDB] Optimize locking for Get · 1fdb3f7d

Committed by Igor Canadi on Dec 20, 2013

Summary:
Instead of locking and saving a DB state, we can cache a DB state and update it only when it changes. This change reduces lock contention and speeds up read operations on the DB.
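
The general pattern can be sketched as below. This is an illustration only and not the mechanism used in the patch: `StateCache` and `DbStateSketch` are hypothetical, and the sketch uses a `std::shared_ptr` swapped under a short lock, which is a swapped-in technique chosen for brevity.

```cpp
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical snapshot of whatever a read needs to see consistently.
struct DbStateSketch {
  int version = 0;
  std::vector<int> memtable_ids;
};

class StateCache {
 public:
  StateCache() : current_(std::make_shared<DbStateSketch>()) {}

  // Readers hold the lock only long enough to copy one pointer.
  std::shared_ptr<const DbStateSketch> Acquire() const {
    std::lock_guard<std::mutex> lock(mu_);
    return current_;
  }

  // Writers build the new state outside the lock and publish it with a swap.
  void Install(DbStateSketch next) {
    std::shared_ptr<const DbStateSketch> fresh =
        std::make_shared<DbStateSketch>(std::move(next));
    std::lock_guard<std::mutex> lock(mu_);
    current_.swap(fresh);
    // The previous state is released after the lock, once no reader holds it.
  }

 private:
  mutable std::mutex mu_;
  std::shared_ptr<const DbStateSketch> current_;
};
```

A reader that acquired the old state keeps a consistent view until it drops its pointer, even if a newer state is installed in the meantime.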

Performance improvements are substantial, although there is some cost in no-read workloads. I ran the regression tests on my devserver and here are the numbers:

  overwrite                    56345  ->   63001
  fillseq                      193730 ->  185296
  readrandom                   771301 -> 1219803 (58% improvement!)
  readrandom_smallblockcache   677609 ->  862850
  readrandom_memtable_sst      710440 -> 1109223
  readrandom_fillunique_random 221589 ->  247869
  memtablefillrandom           105286 ->   92643
  memtablereadrandom           763033 -> 1288862

Test Plan:
make asan_check
I am also running db_stress

Reviewers: dhruba, haobo, sdong, kailiu

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14679

1fdb3f7d

20 Dec, 2013 1 commit
- Merge pull request #28 from bartman/master · 540a2894
  Committed by Igor Canadi on Dec 19, 2013
```
fix missing gflags library
```
  540a2894
19 Dec, 2013 4 commits

Add 'readtocache' test · ca92068b

Committed by Mark Callaghan on Dec 18, 2013

Summary:
For some tests I want to cache the database prior to running other tests on the same invocation
of db_bench. The readtocache test ignores --threads and --reads so those can be used by other tests
and it will still do a full read of --num rows with one thread. It might be invoked like:
  db_bench --benchmarks=readtocache,readrandom --reads 100 --num 10000 --threads 8

Task ID: #

Blame Rev:

Test Plan:
run db_bench

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14739

ca92068b

Reorder tests · e914b649

Committed by Igor Canadi on Dec 18, 2013

Summary:
db_test should be the first to execute because it finds the most bugs.

Also, when third parties report issues, we don't want ldb error message, we prefer to have db_test error message. For example, see thread: https://github.com/facebook/rocksdb/issues/25

Test Plan: make check

Reviewers: dhruba, haobo, kailiu

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14715

e914b649

Merge pull request #35 from zizkovrb/rm-ds_store · cbb8da6f
Committed by Igor Canadi on Dec 18, 2013
```
Remove utilities/.DS_Store file.
```
cbb8da6f

Merge pull request #37 from mlin/more-c-bindings · 3b50b621
Committed by Igor Canadi on Dec 18, 2013
```
C bindings: add a bunch of the newer options
```
3b50b621

18 Dec, 2013 2 commits

Move level0 sorting logic from Version::SaveTo() to Version::Finalize() · 14995a8f

Committed by Siying Dong on Dec 11, 2013

Summary: I realized that "D14409 Avoid sorting in Version::Get() by presorting them in VersionSet::Builder::SaveTo()" is not done in an optimized place. SaveTo() is usually inside mutex. Move it to Finalize(), which is called out of mutex.

Test Plan: make all check

Reviewers: dhruba, haobo, kailiu

Reviewed By: dhruba

CC: igor, leveldb

Differential Revision: https://reviews.facebook.net/D14607

14995a8f

Get() Does Not Reserve space for to_delete memtables · a8b8b11d

Committed by Siying Dong on Dec 17, 2013

Summary: This looks like a tradeoff in the current code: we do a malloc on every Get() to save one malloc for a flush inside the mutex. It takes about 5% of CPU time in readrandom tests. We might consider making the tradeoff the other way around.

Test Plan: make all check

Reviewers: dhruba, haobo, igor

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14697

a8b8b11d

17 Dec, 2013 1 commit
- Remove .DS_Store files. · 8c34189f
  Committed by Josef Šimánek on Dec 14, 2013
  
  8c34189f
16 Dec, 2013 1 commit
- C bindings: add a bunch of the newer options · 2a2506b6
  Committed by Mike Lin on Dec 13, 2013
  
  2a2506b6
13 Dec, 2013 2 commits

[backupable db] Delete db_dir children when restoring backup · 417b453f

Committed by Igor Canadi on Dec 12, 2013

Summary:
I realized that the manifest will get deleted by PurgeObsoleteFiles in DBImpl, but it is still cleaner to delete
files before we restore the backup.

Test Plan: backupable_db_test

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14619

417b453f

Add monitoring for universal compaction and add counters for compaction IO · e9e6b00d

Committed by Mark Callaghan on Dec 09, 2013

Summary:
Adds these counters
{ WAL_FILE_SYNCED, "rocksdb.wal.synced" }
  number of writes that request a WAL sync
{ WAL_FILE_BYTES, "rocksdb.wal.bytes" },
  number of bytes written to the WAL
{ WRITE_DONE_BY_SELF, "rocksdb.write.self" },
  number of writes processed by the calling thread
{ WRITE_DONE_BY_OTHER, "rocksdb.write.other" },
  number of writes not processed by the calling thread. Instead these were
  processed by the current holder of the write lock
{ WRITE_WITH_WAL, "rocksdb.write.wal" },
  number of writes that request WAL logging
{ COMPACT_READ_BYTES, "rocksdb.compact.read.bytes" },
  number of bytes read during compaction
{ COMPACT_WRITE_BYTES, "rocksdb.compact.write.bytes" },
  number of bytes written during compaction

Per-interval stats output was updated with WAL stats and correct stats for universal compaction
including a correct value for write-amplification. It now looks like:
                               Compactions
Level  Files Size(MB) Score Time(sec)  Read(MB) Write(MB)    Rn(MB)  Rnp1(MB)  Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s)      Rn     Rnp1     Wnp1     NewW    Count  Ln-stall Stall-cnt
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0        7      464  46.4       281      3411      3875      3411         0      3875        2.1      12.1        13.8      621        0      240      240      628       0.0         0
Uptime(secs): 310.8 total, 2.0 interval
Writes cumulative: 9999999 total, 9999999 batches, 1.0 per batch, 1.22 ingest GB
WAL cumulative: 9999999 WAL writes, 9999999 WAL syncs, 1.00 writes per sync, 1.22 GB written
Compaction IO cumulative (GB): 1.22 new, 3.33 read, 3.78 write, 7.12 read+write
Compaction IO cumulative (MB/sec): 4.0 new, 11.0 read, 12.5 write, 23.4 read+write
Amplification cumulative: 4.1 write, 6.8 compaction
Writes interval: 100000 total, 100000 batches, 1.0 per batch, 12.5 ingest MB
WAL interval: 100000 WAL writes, 100000 WAL syncs, 1.00 writes per sync, 0.01 MB written
Compaction IO interval (MB): 12.49 new, 14.98 read, 21.50 write, 36.48 read+write
Compaction IO interval (MB/sec): 6.4 new, 7.6 read, 11.0 write, 18.6 read+write
Amplification interval: 101.7 write, 102.9 compaction
Stalls(secs): 142.924 level0_slowdown, 0.000 level0_numfiles, 0.805 memtable_compaction, 0.000 leveln_slowdown
Stalls(count): 132461 level0_slowdown, 0 level0_numfiles, 3 memtable_compaction, 0 leveln_slowdown

Task ID: #3329644, #3301695

Blame Rev:

Test Plan:
Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14583

e9e6b00d

kvdb / rocksdb synced successfully 11 months ago