1. 05 June 2015 (2 commits)
• Allowing L0 -> L1 trivial move on sorted data · 3ce3bb3d
  Islam AbdelRahman committed
      Summary:
This diff updates the logic of how we do trivial move: trivial move can now run on any number of files in the input level as long as they are not overlapping.
      
The conditions for trivial move have been updated; a sketch of the non-overlap check follows the lists below.
      
      Introduced conditions:
        - Trivial move cannot happen if we have a compaction filter (except if the compaction is not manual)
        - Input level files cannot be overlapping
      
      Removed conditions:
  - Trivial move only runs when the compaction is not manual
  - Input level can contain only 1 file
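
A minimal sketch of the non-overlap check, assuming the input-level files are already sorted by their smallest key. The FileRange struct and function name below are hypothetical stand-ins, not the actual RocksDB internals:

```
// Hedged sketch: trivial move requires that no two adjacent files in the
// (sorted) input level overlap in key range.
#include <vector>
#include "rocksdb/comparator.h"
#include "rocksdb/slice.h"

struct FileRange {  // hypothetical stand-in for the file metadata
  rocksdb::Slice smallest;
  rocksdb::Slice largest;
};

bool InputFilesNonOverlapping(const std::vector<FileRange>& files,
                              const rocksdb::Comparator* ucmp) {
  for (size_t i = 1; i < files.size(); ++i) {
    // If a file's smallest key is <= the previous file's largest key,
    // the key ranges overlap and trivial move is not allowed.
    if (ucmp->Compare(files[i].smallest, files[i - 1].largest) <= 0) {
      return false;
    }
  }
  return true;
}
```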
      
More context on which tests failed because of trivial move:
      ```
      DBTest.CompactionsGenerateMultipleFiles
This test expects compaction on a file in L0 to generate multiple files in L1; it fails with trivial move because we end up with one file in L1
      ```
      
      ```
      DBTest.NoSpaceCompactRange
This test expects compaction to fail when we force the environment to report running out of space. This does not apply in a trivial-move situation, because trivial move does not need any extra space and does not check for it
      ```
      
      ```
      DBTest.DropWrites
      Similar to DBTest.NoSpaceCompactRange
      ```
      
      ```
      DBTest.DeleteObsoleteFilesPendingOutputs
This test expects that a file in L2 is deleted after it is moved to L3. This does not hold with trivial move because, although the file was moved, it is now used by L3
      ```
      
      ```
      CuckooTableDBTest.CompactionIntoMultipleFiles
      Same as DBTest.CompactionsGenerateMultipleFiles
      ```
      
This diff is based on work by @sdong: https://reviews.facebook.net/D34149
      
      Test Plan: make -j64 check
      
      Reviewers: rven, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: yhchiang, ott, march, dhruba, sdong
      
      Differential Revision: https://reviews.facebook.net/D34797
• Changed the CompactionJobStats::output_key_prefix type from char[] to string. · bb808ead
  Yueh-Hsuan Chiang committed
      Summary:
      Keys in RocksDB can be arbitrary byte strings.  However, in the current
      CompactionJobStats, smallest_output_key_prefix and largest_output_key_prefix
are of type char[] without an accompanying length, which is insufficient to handle
non-null-terminated strings.
      
This patch changes their type to std::string.
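
For illustration, a minimal sketch of why a bare char[] is insufficient: a key with an embedded null byte is truncated by C-string handling but preserved by a length-aware std::string (this example is not from the patch itself):

```
#include <cassert>
#include <cstring>
#include <string>

int main() {
  // RocksDB keys are arbitrary bytes; this one has an embedded '\0'.
  const char raw[] = {'k', '\0', 'x'};
  std::string key(raw, sizeof(raw));  // length-aware: keeps all 3 bytes
  assert(key.size() == 3);
  assert(std::strlen(raw) == 1);      // C-string handling stops at the '\0'
  return 0;
}
```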
      
      Test Plan: compaction_job_stats_test
      
      Reviewers: igor, rven, IslamAbdelRahman, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39537
2. 04 June 2015 (4 commits)
3. 03 June 2015 (7 commits)
4. 02 June 2015 (3 commits)
• more times in perf_context and iostats_context · ec7a9443
  Mike Kolupaev committed
      Summary:
      We occasionally get write stalls (>1s Write() calls) on HDD under read load. The following timers explain almost all of the stalls:
       - perf_context.db_mutex_lock_nanos
       - perf_context.db_condition_wait_nanos
       - iostats_context.open_time
       - iostats_context.allocate_time
       - iostats_context.write_time
       - iostats_context.range_sync_time
       - iostats_context.logger_time
      
In my experiments, each of these occasionally takes >1s on the write path under some workloads. There are rare cases when Write() takes long but none of these timers does.
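
A hedged sketch of how these timers can be read around a Write() call. The field and global names below follow the summary above and the public perf_context/iostats_context headers; the exact member names at this revision (for example, the `_nanos` suffixes) are assumptions:

```
#include <cstdint>
#include "rocksdb/db.h"
#include "rocksdb/iostats_context.h"
#include "rocksdb/perf_context.h"
#include "rocksdb/perf_level.h"
#include "rocksdb/write_batch.h"

void TimedWrite(rocksdb::DB* db, rocksdb::WriteBatch* batch) {
  rocksdb::SetPerfLevel(rocksdb::kEnableTime);  // enable timing counters
  rocksdb::perf_context.Reset();
  rocksdb::iostats_context.Reset();

  rocksdb::Status s = db->Write(rocksdb::WriteOptions(), batch);

  // Per this diff, these (plus the iostats open/allocate/write/range_sync/
  // logger timers) explain almost all of the observed write stalls.
  uint64_t mutex_ns = rocksdb::perf_context.db_mutex_lock_nanos;
  uint64_t cond_ns = rocksdb::perf_context.db_condition_wait_nanos;
  (void)s;
  (void)mutex_ns;
  (void)cond_ns;
}
```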
      
Test Plan: Added code to our application to write the listed timings to the log for slow writes. They usually add up to almost exactly the time the Write() call took.
      
      Reviewers: rven, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: march, dhruba, tnovak
      
      Differential Revision: https://reviews.facebook.net/D39177
• Allow users to migrate to options.level_compaction_dynamic_level_bytes=true using CompactRange() · 4266d4fd
  sdong committed
Summary: In DB::CompactRange(), change the parameter "reduce_level" to "change_level". Users can compact all data to the last level if needed. By doing so, users can migrate the DB to options.level_compaction_dynamic_level_bytes=true.
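
A hedged usage sketch of this migration path; the exact CompactRange() parameter list at this revision is an assumption, not verified against the header:

```
#include "rocksdb/db.h"

// Hedged sketch: push all existing data to the bottom level so the DB can be
// reopened with options.level_compaction_dynamic_level_bytes = true.
rocksdb::Status MigrateToBottomLevel(rocksdb::DB* db, int num_levels) {
  return db->CompactRange(nullptr /* begin */, nullptr /* end */,
                          true /* change_level */,
                          num_levels - 1 /* target_level */);
}
```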
      
      Test Plan: Add a unit test for it.
      
      Reviewers: yhchiang, anthony, kradhakrishnan, igor, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39099
• Removed DBImpl::notifying_events_ · d333820b
  Yueh-Hsuan Chiang committed
      Summary:
DBImpl::notifying_events_ is an internal counter in DBImpl which is used to prevent DB close while the DB is notifying events. However, as the current events all rely on either compaction or flush, which already have similar counters to prevent DB close, it is safe to remove notifying_events_.
      
      Test Plan:
      listener_test
      examples/compact_files_example
      
      Reviewers: igor, anthony, kradhakrishnan, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39315
5. 31 May 2015 (1 commit)
6. 30 May 2015 (3 commits)
• fix LITE build · bc7a7a40
  agiardullo committed
Summary: Broken by the optimistic transaction diff. (I only built 'release', not 'static_lib', when testing.)
      
      Test Plan: build
      
      Reviewers: yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39219
• Optimistic Transactions · dc9d70de
  agiardullo committed
Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented is a way of ensuring the memtable keeps some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported.
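
A hedged usage sketch of the begin/commit/rollback flow. It is written against the later stabilized transaction API, so the header and class names here (OptimisticTransactionDB, Transaction) are assumptions for this exact revision:

```
#include "rocksdb/options.h"
#include "rocksdb/utilities/optimistic_transaction_db.h"
#include "rocksdb/utilities/transaction.h"

void Example() {
  rocksdb::Options options;
  options.create_if_missing = true;

  rocksdb::OptimisticTransactionDB* txn_db = nullptr;
  rocksdb::Status s =
      rocksdb::OptimisticTransactionDB::Open(options, "/tmp/txn_db", &txn_db);
  if (!s.ok()) return;

  // Begin a transaction; writes are buffered until Commit().
  rocksdb::Transaction* txn =
      txn_db->BeginTransaction(rocksdb::WriteOptions());
  txn->Put("key", "value");

  // Commit checks the memtable for conflicting writes made since the
  // transaction started and fails if a collision is detected.
  s = txn->Commit();
  if (!s.ok()) {
    txn->Rollback();
  }
  delete txn;
  delete txn_db;
}
```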
      
      Test Plan: Added a new test, but still need to look into stress testing.
      
      Reviewers: yhchiang, igor, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: adamretter, MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D33435
• C: add support for WriteBatch SliceParts params · 21cd6b7a
  Reed Allman committed
7. 29 May 2015 (4 commits)
• WriteBatch.Merge w/ SliceParts support · a0635ba3
  Reed Allman committed
      also hooked up WriteBatchInternal
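
A hedged sketch of how the SliceParts overloads are used from C++; the key/value pieces are illustrative. WriteBatch::Put() already accepted SliceParts, and this change adds the Merge() counterpart:

```
#include "rocksdb/slice.h"
#include "rocksdb/write_batch.h"

void MergeWithSliceParts(rocksdb::WriteBatch* batch) {
  // A key or value can be supplied as several non-contiguous pieces;
  // SliceParts wraps an array of Slices plus a count.
  rocksdb::Slice key_pieces[2] = {rocksdb::Slice("user:"), rocksdb::Slice("42")};
  rocksdb::Slice val_pieces[2] = {rocksdb::Slice("+"), rocksdb::Slice("1")};
  rocksdb::SliceParts key(key_pieces, 2);
  rocksdb::SliceParts value(val_pieces, 2);

  // New in this change: Merge() accepts SliceParts, like Put() already did.
  batch->Merge(key, value);
}
```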
• Support saving history in memtable_list · c8153510
  agiardullo committed
      Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to somehow keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
      
After chatting with people, it seems like everyone prefers just using memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decided to create a separate list in MemtableListVersion, as using the same list complicated the flush/install-flush-results logic too much.
      
This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but it makes it more likely we're running closer to the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached.) So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
      
      However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
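
A hedged configuration sketch; the option name below, max_write_buffer_number_to_maintain, is taken from later RocksDB releases and is assumed to be the parameter this diff introduces:

```
#include "rocksdb/options.h"

rocksdb::Options MakeTransactionFriendlyOptions() {
  rocksdb::Options options;
  options.max_write_buffer_number = 4;
  // Keep some flushed memtables around as write history so optimistic
  // transactions can still check for conflicts after a flush.
  // (Assumed name; per the summary, it defaults to the write buffer count.)
  options.max_write_buffer_number_to_maintain = options.max_write_buffer_number;
  return options;
}
```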
      
Test Plan: Added an xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.
      
      Reviewers: sdong, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D37443
• Rename EventLoggerHelpers to EventHelpers · ec4ff4e9
  Yueh-Hsuan Chiang committed
      Summary:
Rename EventLoggerHelpers to EventHelpers, as it's going to include all event-related helper functions instead of only EventLogger-related ones.
      
      Test Plan: make
      
      Reviewers: sdong, rven, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39093
• [API Change] Move listeners from ColumnFamilyOptions to DBOptions · 672dda9b
  Yueh-Hsuan Chiang committed
      Summary: Move listeners from ColumnFamilyOptions to DBOptions
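
A minimal sketch of registering a listener after this change, assuming the public EventListener and DBOptions::listeners API; the listener class itself is hypothetical:

```
#include <memory>
#include "rocksdb/listener.h"
#include "rocksdb/options.h"

// Hypothetical listener; override the EventListener hooks you care about,
// e.g. OnFlushCompleted / OnCompactionCompleted.
class MyListener : public rocksdb::EventListener {};

rocksdb::Options MakeOptions() {
  rocksdb::Options options;
  // After this change, listeners live on DBOptions (which Options inherits),
  // not on ColumnFamilyOptions.
  options.listeners.emplace_back(std::make_shared<MyListener>());
  return options;
}
```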
      
      Test Plan:
      listener_test
      compact_files_test
      
      Reviewers: rven, anthony, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39087
8. 27 May 2015 (3 commits)
9. 23 May 2015 (6 commits)
10. 22 May 2015 (3 commits)
• Allow EventLogger to directly log from a JSONWriter. · 7fee8775
  Yueh-Hsuan Chiang committed
      Summary:
      Allow EventLogger to directly log from a JSONWriter.  This allows
      the JSONWriter to be shared by EventLogger and potentially EventListener,
      which is an important step to integrate EventLogger and EventListener.
      
      This patch also rewrites EventLoggerHelpers::LogTableFileCreation(),
which uses the new API to generate an identical log.
      
      Test Plan:
      Run db_bench in debug mode and make sure the log is correct and no
      assertions fail.
      
      Reviewers: sdong, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38709
• Don't artificially inflate L0 score · 7a357751
  Igor Canadi committed
      Summary:
      This turns out to be pretty bad because if we prioritize L0->L1 then L1 can grow artificially large, which makes L0->L1 more and more expensive. For example:
      256MB @ L0 + 256MB @ L1 --> 512MB @ L1
      256MB @ L0 + 512MB @ L1 --> 768MB @ L1
      256MB @ L0 + 768MB @ L1 --> 1GB @ L1
      
      ....
      
      256MB @ L0 + 10GB @ L1 --> 10.2GB @ L1
      
      At some point we need to start compacting L1->L2 to speed up L0->L1.
      
      Test Plan:
The performance improvement is massive for heavy write workloads. This is the benchmark I ran: https://phabricator.fb.com/P19842671. Before this change, the benchmark took 47 minutes to complete. After, the benchmark finished in 2 minutes. You can see the full results here: https://phabricator.fb.com/P19842674

Also, we ran this diff on MongoDB on RocksDB on one replica set. Before the change, our initial sync was so slow that it couldn't keep up with primary writes. After the change, the import finished without any issues.
      
      Reviewers: dynamike, MarkCallaghan, rven, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38637
• [Public API Change] Make DB::GetDbIdentity() a const function. · e2c1d4b5
  Yueh-Hsuan Chiang committed
Summary: Make DB::GetDbIdentity() a const function.
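
A one-line usage sketch (the wrapper function is illustrative): after this change, GetDbIdentity() can be called through a const DB pointer:

```
#include <string>
#include "rocksdb/db.h"

// Now callable on a const DB*, since GetDbIdentity() is a const member.
rocksdb::Status ReadIdentity(const rocksdb::DB* db, std::string* id) {
  return db->GetDbIdentity(*id);
}
```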
      
      Test Plan: make db_test
      
      Reviewers: igor, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38745
11. 20 May 2015 (3 commits)
• Dump db stats in WARN level · 812c461c
  Yueh-Hsuan Chiang committed
      Summary: Dump db stats in WARN level
      
      Test Plan: run db_bench and verify the LOG
      
      Reviewers: igor, MarkCallaghan
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38691
• Add --wal_bytes_per_sync for db_bench and more IO stats · 944043d6
  Mark Callaghan committed
      Summary:
      See https://gist.github.com/mdcallag/89ebb2b8cbd331854865 for the IO stats.
      I added "Cumulative compaction:" and "Interval compaction:" lines. The IO rates
      can be confusing. Rates fro per-level stats lines, Wr(MB/s) & Rd(MB/s), are computed
      using the duration of the compaction job. If the job reads 10MB, writes 9MB and the job
      (IO & merging) takes 1 second then the rates are 10MB/s for read and 9MB/s for writes.
      The IO rates in the Cumulative compaction line uses the total uptime. The IO rates in the
      Interval compaction line uses the interval uptime. So these Cumalative & Interval
      compaction IO rates cannot be compared to the per-level IO rates. But both forms of
      the rates are useful for debugging perf.
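
For reference, a hedged sketch of the DB option the new db_bench flag presumably maps onto (DBOptions::wal_bytes_per_sync; the mapping is an assumption):

```
#include "rocksdb/options.h"

rocksdb::Options MakeOptions() {
  rocksdb::Options options;
  // Presumably what --wal_bytes_per_sync feeds in db_bench: ask the OS to
  // sync the WAL incrementally every N bytes written, smoothing out large
  // bursts of dirty pages.
  options.wal_bytes_per_sync = 1 << 20;  // 1MB
  return options;
}
```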
      
      Test Plan:
      run db_bench
      
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D38667
• Fix comparison between signed and unsigned integers · 04feaeeb
  Igor Canadi committed
      Summary: Not sure why this fails on some compilers and doesn't on others.
      
      Test Plan: none
      
      Reviewers: meyering, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38673
12. 19 May 2015 (1 commit)