1. Jun 12, 2015 (3 commits)
    • Slow down writes by bytes written · 7842920b
      Committed by sdong
      Summary:
      With this patch, we slow down writes into the database to the rate given by options.delayed_write_rate (a new option).
      
      The thread synchronization approach I take is to keep the write controller synchronized by the DB mutex, with GetDelay() called inside the DB mutex, while minimizing how often GetDelay() has to read the current time. I verified the behavior through db_bench and it seems to work.
      
      hard_rate_limit is deprecated.
      
      options.delayed_write_rate is not yet dynamically changeable; that needs to be addressed in a follow-up.
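      
      As a rough illustration, here is a minimal sketch (not from this diff) of configuring the new option, assuming the released name options.delayed_write_rate; the path and value are arbitrary:
      
      ```
      #include <rocksdb/db.h>
      #include <rocksdb/options.h>
      
      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        // Bytes per second that writes are let into the DB once the
        // slowdown condition triggers (the value here is arbitrary).
        options.delayed_write_rate = 2 * 1024 * 1024;  // 2 MB/s
      
        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/delayed_write_demo", &db);
        if (s.ok()) {
          delete db;
        }
        return 0;
      }
      ```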
      
      Test Plan: Add new unit tests in db_test
      
      Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor
      
      Reviewed By: igor
      
      Subscribers: ikabiljo, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D36351
    • Don't let two L0->L1 compactions run in parallel · a84df655
      Committed by Igor Canadi
      Summary: With the experimental feature SuggestCompactRange(), we did not prevent two L0->L1 compactions from running in parallel. This diff fixes that.
      
      Test Plan: Added a unit test to reproduce the failure; fixed the unit test.
      
      Reviewers: yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39981
    • Handling edge cases for ReFitLevel · 73faa3d4
      Committed by Islam AbdelRahman
      Summary:
      Right now the level we pass to ReFitLevel is the maximum level with files (before compaction). There are multiple cases where this maximum level has changed after the compaction:
      - all files were in L0 (now the maximum level is L1)
      - using kCompactionStyleUniversal (now the maximum level is the last level)
      - level_compaction_dynamic_level_bytes ??
      
      We can handle each of these cases individually, but I felt it's safer to recalculate max_level_with_files again if we want to do a ReFitLevel; a sketch of the idea follows.
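      
      A minimal sketch of the recalculation idea, with an illustrative stand-in for RocksDB's internal file metadata (this is not the actual ReFitLevel code):
      
      ```
      #include <vector>
      
      // Recompute the highest level that still contains files *after* the
      // compaction finished, instead of trusting a value captured before it
      // ran. files_per_level[i] stands in for the SST file count at level i.
      int MaxLevelWithFiles(const std::vector<int>& files_per_level) {
        int max_level = 0;
        for (int level = 0; level < static_cast<int>(files_per_level.size()); ++level) {
          if (files_per_level[level] > 0) {
            max_level = level;
          }
        }
        return max_level;
      }
      ```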
      
      Test Plan:
      Added some tests.
      make -j64 check
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: ott, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39663
  2. Jun 10, 2015 (2 commits)
    • Make "make all" work for CYGWIN · e409d3d7
      Committed by sdong
      Summary: Some test and benchmark code doesn't build for CYGWIN. Fix it.
      
      Test Plan: Run "make all" with TARGET_OS=Cygwin on Cygwin and make sure it passes.
      
      Reviewers: rven, yhchiang, anthony, igor, kradhakrishnan
      
      Reviewed By: igor, kradhakrishnan
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39711
    • Fix hang when closing a DB after doing loads with WAL disabled. · 406a5682
      Committed by Venkatesh Radhakrishnan
      Summary:
      There is a hang during DB close in the following scenario:
      a) a load with WAL disabled was done,
      b) CancelAllBackgroundWork was called,
      c) DB Close was called
      This happened because DB close waits for a flush, but no background
      flush can run once CancelAllBackgroundWork has marked the DB as
      shutting down.
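      
      A minimal sketch of the scenario, assuming the public WriteOptions::disableWAL flag and CancelAllBackgroundWork() from rocksdb/convenience.h; keys and setup are illustrative:
      
      ```
      #include <rocksdb/convenience.h>  // CancelAllBackgroundWork()
      #include <rocksdb/db.h>
      
      // Walks through the three steps above; db is an already-open database.
      void LoadThenClose(rocksdb::DB* db) {
        rocksdb::WriteOptions wo;
        wo.disableWAL = true;                                 // (a) load with WAL disabled
        db->Put(wo, "key", "value");
        rocksdb::CancelAllBackgroundWork(db, /*wait=*/true);  // (b)
        delete db;                                            // (c) used to hang here
      }
      ```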
      
      Test Plan: Added DBTest FlushOnDestroy
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: yoshinorim, hermanlee4, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39747
  3. Jun 6, 2015 (3 commits)
  4. Jun 5, 2015 (1 commit)
    • Allowing L0 -> L1 trivial move on sorted data · 3ce3bb3d
      Committed by Islam AbdelRahman
      Summary:
      This diff updates the trivial-move logic: a trivial move can now run on any number of files in the input level, as long as their key ranges do not overlap (see the sketch after the condition lists below).
      
      The conditions for trivial move have been updated
      
      Introduced conditions:
        - Trivial move cannot happen if we have a compaction filter (except if the compaction is not manual)
        - Input level files cannot be overlapping
      
      Removed conditions:
        - Trivial move only ran when the compaction was not manual
        - The input level could contain only one file
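      
      A sketch of the introduced non-overlap condition, with illustrative types standing in for RocksDB's file metadata (the real check also consults the user comparator):
      
      ```
      #include <algorithm>
      #include <string>
      #include <vector>
      
      // FileRange is an illustrative stand-in for a file's user-key boundaries.
      struct FileRange {
        std::string smallest;
        std::string largest;
      };
      
      // Files qualify for a trivial move only if, once sorted by smallest key,
      // no file's range starts before the previous file's range ends.
      bool RangesAreDisjoint(std::vector<FileRange> files) {
        std::sort(files.begin(), files.end(),
                  [](const FileRange& a, const FileRange& b) {
                    return a.smallest < b.smallest;
                  });
        for (size_t i = 1; i < files.size(); ++i) {
          if (files[i].smallest <= files[i - 1].largest) {
            return false;  // overlap: fall back to a real compaction
          }
        }
        return true;
      }
      ```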
      
      More context on which tests failed because of trivial move:
      ```
      DBTest.CompactionsGenerateMultipleFiles
      This test expects compaction of a file in L0 to generate multiple files in L1; it fails with trivial move because we end up with a single file in L1.
      ```
      
      ```
      DBTest.NoSpaceCompactRange
      This test expects compaction to fail when we force the environment to report that it has run out of space. That does not hold in the trivial-move case,
      because a trivial move needs no extra space; the test did not check for that.
      ```
      
      ```
      DBTest.DropWrites
      Similar to DBTest.NoSpaceCompactRange
      ```
      
      ```
      DBTest.DeleteObsoleteFilesPendingOutputs
      This test expects a file in L2 to be deleted after it is moved to L3. That is not valid with trivial move: although the file was moved, it is now in use by L3.
      ```
      
      ```
      CuckooTableDBTest.CompactionIntoMultipleFiles
      Same as DBTest.CompactionsGenerateMultipleFiles
      ```
      
      This diff is based on work by @sdong: https://reviews.facebook.net/D34149
      
      Test Plan: make -j64 check
      
      Reviewers: rven, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: yhchiang, ott, march, dhruba, sdong
      
      Differential Revision: https://reviews.facebook.net/D34797
  5. Jun 4, 2015 (2 commits)
    • Add EventListener::OnTableFileDeletion() · 0b3172d0
      Committed by Yueh-Hsuan Chiang
      Summary:
      Add EventListener::OnTableFileDeletion(), which will be
      called when a table file is deleted.
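      
      A minimal listener sketch, assuming the TableFileDeletionInfo fields (file_path, job_id) as they appear in released RocksDB:
      
      ```
      #include <iostream>
      #include <memory>
      #include <rocksdb/listener.h>
      #include <rocksdb/options.h>
      
      // Logs every table file deletion reported through the new callback.
      class DeletionLogger : public rocksdb::EventListener {
       public:
        void OnTableFileDeletion(const rocksdb::TableFileDeletionInfo& info) override {
          std::cout << "deleted " << info.file_path
                    << " (job " << info.job_id << ")" << std::endl;
        }
      };
      
      // Usage: register before opening the DB.
      void RegisterListener(rocksdb::Options* options) {
        options->listeners.emplace_back(std::make_shared<DeletionLogger>());
      }
      ```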
      
      Test Plan: Extend three existing tests in db_test to verify the deleted files.
      
      Reviewers: rven, anthony, kradhakrishnan, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38931
    • Fix DBTest.MigrateToDynamicLevelMaxBytesBase slowness with valgrind · 3af668ed
      Committed by sdong
      Summary:
      DBTest.MigrateToDynamicLevelMaxBytesBase is extremely slow under
      valgrind. Work around it by not having both threads run everything
      non-stop.
      
      Test Plan: Run the test under valgrind, where it used to take too long to finish, and see it finish in a reasonable time.
      
      Reviewers: yhchiang, anthony, rven, kradhakrishnan, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39477
  6. Jun 3, 2015 (1 commit)
  7. Jun 2, 2015 (1 commit)
  8. May 30, 2015 (1 commit)
    • Optimistic Transactions · dc9d70de
      Committed by agiardullo
      Summary: Optimistic transactions supporting begin/commit/rollback semantics.  Currently relies on checking the memtable to determine if there are any collisions at commit time.  Not yet implemented is a way of ensuring the memtable has some minimum amount of history, so that we won't fail to commit when the memtable is empty.  You should probably start with transaction.h to get an overview of what is currently supported.
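      
      A minimal usage sketch, assuming the OptimisticTransactionDB API as it appears in released RocksDB headers (names may differ slightly from this diff):
      
      ```
      #include <rocksdb/utilities/optimistic_transaction_db.h>
      
      void OptimisticTxnExample() {
        rocksdb::Options options;
        options.create_if_missing = true;
        rocksdb::OptimisticTransactionDB* txn_db = nullptr;
        rocksdb::Status s = rocksdb::OptimisticTransactionDB::Open(
            options, "/tmp/optimistic_txn_demo", &txn_db);
        if (!s.ok()) return;
      
        rocksdb::Transaction* txn = txn_db->BeginTransaction(rocksdb::WriteOptions());
        txn->Put("key", "value");
        // Commit fails if a conflicting write landed after the transaction began.
        s = txn->Commit();
        delete txn;
        delete txn_db;
      }
      ```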
      
      Test Plan: Added a new test, but still need to look into stress testing.
      
      Reviewers: yhchiang, igor, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: adamretter, MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D33435
  9. May 29, 2015 (1 commit)
    • Support saving history in memtable_list · c8153510
      Committed by agiardullo
      Summary:
      For transactions, we are using the memtables to validate that there are no write conflicts.  But after flushing, we don't have any memtables, and transactions could fail to commit.  So we want to keep around some extra history to use for conflict checking.  In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
      
      After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure).  It seems like the best place for this is abstracted inside the memtable_list.  I decided to create a separate list in MemtableListVersion, as using the same list complicated the flush/install-flush-results logic too much.
      
      This diff adds a new parameter to control how much memtable history to keep around after flushing.  However, it sounds like people aren't too fond of adding new parameters.  So I am making the default size of flushed plus not-yet-flushed memtables equal to max_write_buffers.  This should not change the maximum amount of memory used, but makes it more likely we're running closer to the limit.  (We now postpone deleting flushed memtables until the max_write_buffer limit is reached.)  So while we might use more memory on average, we still obey the configured limit (and you could argue it's better to go ahead and use the memory now instead of waiting for a write stall to reveal the limit).
      
      However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter to be set in order to use transactions.
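      
      A sketch of configuring the history size, assuming the option added here is the max_write_buffer_number_to_maintain found in released RocksDB:
      
      ```
      #include <rocksdb/options.h>
      
      // Keep as many flushed memtables around as we allow unflushed ones, so
      // conflict checking still has history right after a flush.
      rocksdb::Options MakeTransactionFriendlyOptions() {
        rocksdb::Options options;
        options.max_write_buffer_number = 4;
        options.max_write_buffer_number_to_maintain = 4;  // assumed option name
        return options;
      }
      ```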
      
      Test Plan: Added an xfunc test that plays with different values of this parameter across all tests.  Added testing in memtablelist_test, and plan to add more testing there.
      
      Reviewers: sdong, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D37443
  10. May 22, 2015 (1 commit)
  11. May 19, 2015 (3 commits)
  12. May 15, 2015 (1 commit)
  13. May 14, 2015 (1 commit)
  14. May 12, 2015 (1 commit)
    • API to fetch from both a WriteBatchWithIndex and the db · 711465cc
      Committed by agiardullo
      Summary:
      Added a couple of functions to WriteBatchWithIndex to make it easier to query the value of a key, including reading pending writes from a batch.  (This is needed for transactions.)
      
      I created write_batch_with_index_internal.h to store an internal-only helper function, since there wasn't a good place in the existing class hierarchy for it (and it didn't seem right to stick it inside WriteBatchInternal::Rep).
      
      Since I needed access to the WriteBatchEntryComparator, I moved some helper classes from write_batch_with_index.cc into write_batch_with_index_internal.h/.cc.  WriteBatchIndexEntry, ReadableWriteBatch, and WriteBatchEntryComparator are all unchanged (just moved to different files).
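      
      A minimal sketch of the two lookups, assuming the released names GetFromBatch() and GetFromBatchAndDB() for the added functions:
      
      ```
      #include <string>
      #include <rocksdb/utilities/write_batch_with_index.h>
      
      void ReadThroughBatch(rocksdb::DB* db) {
        rocksdb::WriteBatchWithIndex batch;
        batch.Put("key", "pending-value");
      
        std::string value;
        // Sees only the batch's own pending writes.
        rocksdb::Status s = batch.GetFromBatch(rocksdb::DBOptions(), "key", &value);
        // Checks the batch first, then falls back to the DB for other keys.
        s = batch.GetFromBatchAndDB(db, rocksdb::ReadOptions(), "key", &value);
      }
      ```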
      
      Test Plan: Added new unit tests.
      
      Reviewers: rven, yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38037
  15. May 10, 2015 (1 commit)
  16. May 9, 2015 (1 commit)
  17. May 7, 2015 (1 commit)
    • Allow GetThreadList() to report basic compaction operation properties. · 77a5a543
      Committed by Yueh-Hsuan Chiang
      Summary:
      Now we're able to show more details about a compaction in
      GetThreadList() :)
      
      This patch allows GetThreadList() to report basic compaction
      operation properties.  Basic compaction properties include:
          1. job id
          2. compaction input / output level
          3. compaction property flags (is_manual, is_deletion, etc.)
          4. total input bytes
          5. the number of bytes read so far
          6. the number of bytes written so far
      
      Flush operation properties will be done in a separate diff.
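      
      A minimal polling sketch, assuming Env::GetThreadList() and the ThreadStatus fields as they appear in released RocksDB:
      
      ```
      #include <vector>
      #include <rocksdb/env.h>
      #include <rocksdb/thread_status.h>
      
      // Polls all background threads and picks out compaction operations.
      void DumpCompactionThreads() {
        std::vector<rocksdb::ThreadStatus> statuses;
        rocksdb::Status s = rocksdb::Env::Default()->GetThreadList(&statuses);
        if (!s.ok()) return;
        for (const auto& ts : statuses) {
          if (ts.operation_type == rocksdb::ThreadStatus::OP_COMPACTION) {
            // ts.operation_properties carries the per-compaction counters
            // listed above (levels, bytes read/written, flags, job id).
          }
        }
      }
      ```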
      
      Test Plan:
      ./db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=1
      
      Sample output tracking the same job:
      
      ```
                ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
         140664171987072    Low Pri      default           Compaction     31.357 ms     CompactionJob::FinishCompactionOutputFile              BaseInputLevel 1 | BytesRead 2264663 | BytesWritten 1934241 | IsDeletion 0 | IsManual 0 | IsTrivialMove 0 | JobID 277 | OutputLevel 2 | TotalInputBytes 3964158 |
      
                ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
         140664171987072    Low Pri      default           Compaction     59.440 ms     CompactionJob::FinishCompactionOutputFile              BaseInputLevel 1 | BytesRead 2264663 | BytesWritten 1934241 | IsDeletion 0 | IsManual 0 | IsTrivialMove 0 | JobID 277 | OutputLevel 2 | TotalInputBytes 3964158 |
      
                ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
         140664171987072    Low Pri      default           Compaction    226.375 ms                        CompactionJob::Install              BaseInputLevel 1 | BytesRead 3958013 | BytesWritten 3621940 | IsDeletion 0 | IsManual 0 | IsTrivialMove 0 | JobID 277 | OutputLevel 2 | TotalInputBytes 3964158 |
      ```
      
      Reviewers: sdong, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D37653
  18. May 2, 2015 (1 commit)
  19. Apr 30, 2015 (2 commits)
  20. Apr 25, 2015 (1 commit)
  21. Apr 24, 2015 (3 commits)
    • Fix CompactRange for universal compaction with num_levels > 1 · d01bbb53
      Committed by sdong
      Summary:
      CompactRange for universal compaction with num_levels > 1 has a bug, and the unit test has a bug of its own, so it doesn't capture the problem.
      Fix both. Revert the compact-range behavior to the logic equivalent to num_levels=1: always compact all files together.
      
      It should also fix DBTest.IncreaseUniversalCompactionNumLevels. The issue was that the small test options (such as options.write_buffer_size = 100 << 10) were not applied in the later test scenarios, so the default write_buffer_size of 4MB was used and the compaction trigger condition was no longer the one the test expected.
      
      Test Plan: Run the new test and all test suites
      
      Reviewers: yhchiang, rven, kradhakrishnan, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D37551
    • Implement DB::PromoteL0 method · 2dc421df
      Committed by Giuseppe Ottaviano
      Summary:
      This diff implements a new `DB` method `PromoteL0` which moves all files in L0
      to a given level skipping compaction, provided that the files have disjoint
      ranges and all levels up to the target level are empty.
      
      This method provides finer-grain control for trivial compactions, and it is
      useful for bulk-loading pre-sorted keys. Compared to D34797, it does not change
      the semantics of an existing operation, which can impact existing code.
      
      PromoteL0 is designed to work well in combination with the proposed
      `GetSstFileWriter`/`AddFile` interface, making it possible to "design" the
      level structure by populating one level at a time. Such fine-grained
      control can be very useful for static or mostly-static databases.
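      
      A minimal call sketch, assuming the released signature that takes a column family handle and a target level:
      
      ```
      #include <rocksdb/db.h>
      
      // Precondition: L0 files have disjoint key ranges and all levels up to
      // target_level are empty; otherwise the call returns a non-OK status.
      rocksdb::Status PromoteAfterBulkLoad(rocksdb::DB* db, int target_level) {
        return db->PromoteL0(db->DefaultColumnFamily(), target_level);
      }
      ```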
      
      Test Plan: `make check`
      
      Reviewers: IslamAbdelRahman, philipp, MarkCallaghan, yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D37107
    • options.paranoid_file_checks to read all rows after writing to a file. · 397b6588
      Committed by sdong
      Summary: To help distinguish whether corruption was caused by the storage media or by in-memory state at write time, add a paranoid check that iterates over all the rows after the file is written.
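      
      A minimal sketch of enabling the check, assuming the released name options.paranoid_file_checks:
      
      ```
      #include <rocksdb/options.h>
      
      rocksdb::Options MakeParanoidOptions() {
        rocksdb::Options options;
        // After a file is written, iterate over all of its rows and verify
        // them, trading write throughput for earlier corruption detection.
        options.paranoid_file_checks = true;
        return options;
      }
      ```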
      
      Test Plan: Add a new unit test for it
      
      Reviewers: rven, igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D37335
  22. Apr 23, 2015 (1 commit)
    • Making PreShutdown tests more reliable. · 618d07b0
      Committed by Venkatesh Radhakrishnan
      Summary:
      A couple of times on Travis, the thread status has said that no compactions were done, and since we assert on that, the test failed.
      We now fix this by waiting until compaction has started.
      
      Test Plan:
      run DBTest.*PreShutdown*
      
      d=/tmp/j; rm -rf $d; seq 200 | parallel --gnu --eta 'd=/tmp/j/d-{}; mkdir -p $d; TEST_TMPDIR=$d ./db_test --gtest_filter=DBTest.PreShutdown* >& '$d'/log-{}'
      
      Reviewers: sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D37545
  23. Apr 18, 2015 (1 commit)
    • Add experimental API MarkForCompaction() · 6059bdf8
      Committed by Igor Canadi
      Summary:
      Some Mongo+Rocks datasets in Parse's environment are not doing compactions very frequently. During the quiet period (with no IO), we'd like to schedule compactions so that our reads become faster. Also, aggressively compacting during quiet periods helps when write bursts happen. In addition, we also want to compact files that are containing deleted key ranges (like old oplog keys).
      
      All of this is currently not possible with CompactRange() because it's single-threaded and blocks all other compactions from happening. Running CompactRange() risks blocking writes, because we generate too many Level 0 files before the compaction is over. Stopping writes is very dangerous because they hold transaction locks. We tried running manual compaction once on Mongo+Rocks and everything fell apart.
      
      MarkForCompaction() solves all of those problems. It is a very lightweight form of manual compaction, with lower priority than automatic compactions, which means it shouldn't interfere with the background process keeping the LSM tree clean. However, if no automatic compactions need to run (or we have extra background threads available), we will start compacting files that are marked for compaction.
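      
      A minimal sketch of driving this from user code via the related experimental entry point SuggestCompactRange() (mentioned in the L0->L1 fix above); the header rocksdb/experimental.h is assumed from released RocksDB:
      
      ```
      #include <rocksdb/db.h>
      #include <rocksdb/experimental.h>
      
      // Marks the files overlapping [begin, end] for low-priority compaction;
      // pass nullptr bounds to cover the whole key space.
      rocksdb::Status NudgeCompaction(rocksdb::DB* db,
                                      const rocksdb::Slice* begin,
                                      const rocksdb::Slice* end) {
        return rocksdb::experimental::SuggestCompactRange(db, begin, end);
      }
      ```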
      
      Test Plan: added a new unit test
      
      Reviewers: yhchiang, rven, MarkCallaghan, sdong
      
      Reviewed By: sdong
      
      Subscribers: yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D37083
  24. Apr 15, 2015 (4 commits)
    • Bug of trivial move of dynamic level · debaf85e
      Committed by sdong
      Summary: D36669 introduced a bug: trivially moved data does not go to the specified level but to the next one, which is incorrectly level 1 for an L0 compaction when the base level is not level 1. Fix it by respecting the compaction's output level.
      
      Test Plan: Run all tests
      
      Reviewers: MarkCallaghan, rven, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D37119
    • Fix and Improve DBTest.DynamicLevelCompressionPerLevel2 · 12d7d3d2
      Committed by sdong
      Summary:
      A recent change to DBTest.DynamicLevelCompressionPerLevel2 has a bug: the second sync point is not enabled. Fix it, and add an assert for it.
      Also, flush compression was not tracked in the test. Add it.
      
      Test Plan: Build everything
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D37101
    • Fix build break introduced by new SyncPoint interface change · a1271c6c
      Committed by sdong
      Summary: When committing the sync point interface change, a new occurrence of the old interface was not resolved during rebase. Fix it.
      
      Test Plan: Build and see it pass
      
      Reviewers: igor, yhchiang, rven, anthony, kradhakrishnan
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D37095
    • SyncPoint to allow a callback with an argument and use it to get DBTest.DynamicLevelCompressionPerLevel2 more straightforward · fcb206b6
      Committed by sdong
      
      Summary:
      Allow users to register a callback function that takes an argument at a sync point, so more complicated verification can be done in tests.
      Use it in DBTest.DynamicLevelCompressionPerLevel2 so that failures are easier to debug.
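      
      A minimal test-side sketch, assuming the SetCallBack() signature with a void* argument; the sync point name and header path are illustrative and vary by version:
      
      ```
      #include <cassert>
      
      #include "util/sync_point.h"  // header path varies across RocksDB versions
      
      // Register a callback that receives the argument the traced code passes
      // in, so tests can assert on internal state. The point name is made up.
      void InstallCheck() {
        rocksdb::SyncPoint::GetInstance()->SetCallBack(
            "SomeComponent:some_point", [](void* arg) {
              // The concrete type of arg is defined by the traced code.
              assert(arg != nullptr);
            });
        rocksdb::SyncPoint::GetInstance()->EnableProcessing();
      }
      ```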
      
      Test Plan: Run all tests. Run DBTest.DynamicLevelCompressionPerLevel2 with valgrind check.
      
      Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D36999
  25. Apr 14, 2015 (2 commits)