1. 02 Jun, 2015 (2 commits)
    • Allow users to migrate to options.level_compaction_dynamic_level_bytes=true using CompactRange() · 4266d4fd
      Committed by sdong
      Summary: In DB::CompactRange(), change the parameter "reduce_level" to "change_level". Users can now compact all data to the last level if needed. By doing so, users can migrate the DB to options.level_compaction_dynamic_level_bytes=true.
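
      A minimal migration sketch (hypothetical DB path; assumes the CompactRange(begin, end, change_level, target_level) signature this diff introduces, not the later CompactRangeOptions form):

          #include <cassert>
          #include "rocksdb/db.h"

          int main() {
            rocksdb::Options options;
            rocksdb::DB* db = nullptr;
            // Hypothetical path, for illustration only.
            rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/migrate_db", &db);
            assert(s.ok());
            // Compact the full key range and move the result to the last level.
            s = db->CompactRange(nullptr, nullptr, true /* change_level */,
                                 options.num_levels - 1 /* target_level */);
            assert(s.ok());
            delete db;
            // Reopen with dynamic level sizing enabled.
            options.level_compaction_dynamic_level_bytes = true;
            s = rocksdb::DB::Open(options, "/tmp/migrate_db", &db);
            assert(s.ok());
            delete db;
            return 0;
          }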
      
      Test Plan: Add a unit test for it.
      
      Reviewers: yhchiang, anthony, kradhakrishnan, igor, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39099
    • Removed DBImpl::notifying_events_ · d333820b
      Committed by Yueh-Hsuan Chiang
      Summary:
      DBImpl::notifying_events_ is an internal counter in DBImpl used to
      prevent DB close while the DB is notifying events. However, as all
      current events rely on either compaction or flush, which already
      have similar counters to prevent DB close, it is safe to remove
      notifying_events_.
      
      Test Plan:
      listener_test
      examples/compact_files_example
      
      Reviewers: igor, anthony, kradhakrishnan, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39315
  2. 31 May, 2015 (1 commit)
  3. 30 May, 2015 (2 commits)
    • fix LITE build · bc7a7a40
      Committed by agiardullo
      Summary: Broken by the optimistic transaction diff. (I only built 'release', not 'static_lib', when testing.)
      
      Test Plan: build
      
      Reviewers: yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39219
    • Optimistic Transactions · dc9d70de
      Committed by agiardullo
      Summary: Optimistic transactions supporting begin/commit/rollback semantics.  Currently relies on checking the memtable to determine whether there are any collisions at commit time.  Not yet implemented is a way of ensuring the memtable retains some minimum amount of history so that we won't fail to commit when the memtable is empty.  You should probably start with transaction.h to get an overview of what is currently supported.
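
      A minimal usage sketch (today's header and class names, which may differ slightly from the ones in this diff; path is hypothetical):

          #include <cassert>
          #include "rocksdb/utilities/optimistic_transaction_db.h"
          #include "rocksdb/utilities/transaction.h"

          int main() {
            rocksdb::Options options;
            options.create_if_missing = true;
            rocksdb::OptimisticTransactionDB* txn_db = nullptr;
            rocksdb::Status s = rocksdb::OptimisticTransactionDB::Open(
                options, "/tmp/txn_db", &txn_db);
            assert(s.ok());

            rocksdb::Transaction* txn =
                txn_db->BeginTransaction(rocksdb::WriteOptions());
            txn->Put("key", "value");
            // Commit checks the memtable for writes that conflict with this
            // transaction; it fails (e.g. Status::Busy) on collision.
            s = txn->Commit();
            if (!s.ok()) {
              // caller decides: retry the whole transaction, or give up
            }
            delete txn;
            delete txn_db;
            return 0;
          }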
      
      Test Plan: Added a new test, but still need to look into stress testing.
      
      Reviewers: yhchiang, igor, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: adamretter, MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D33435
  4. 29 May, 2015 (4 commits)
    • WriteBatch.Merge w/ SliceParts support · a0635ba3
      Committed by Reed Allman
      Also hooked up WriteBatchInternal.
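
      A sketch of the new overload (assuming it mirrors the existing SliceParts form of WriteBatch::Put):

          #include "rocksdb/slice.h"
          #include "rocksdb/write_batch.h"

          void MergeFragmented(rocksdb::WriteBatch* batch) {
            // Key and value are each assembled from fragments; no up-front
            // concatenation is needed on the caller side.
            rocksdb::Slice key_parts[2] = {"user:", "42"};
            rocksdb::Slice value_parts[2] = {"visits=", "1"};
            batch->Merge(rocksdb::SliceParts(key_parts, 2),
                         rocksdb::SliceParts(value_parts, 2));
          }
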
    • Support saving history in memtable_list · c8153510
      Committed by agiardullo
      Summary:
      For transactions, we are using the memtables to validate that there are no write conflicts.  But after flushing, we don't have any memtables, and transactions could fail to commit.  So we want to somehow keep around some extra history to use for conflict checking.  In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
      
      After chatting with people, it seems like everyone prefers just using memtables to store this history (instead of a separate history structure).  It seems like the best place for this is abstracted inside the memtable_list.  I decided to create a separate list in MemtableListVersion, as using the same list complicated the flush/install-flush-results logic too much.
      
      This diff adds a new parameter to control how much memtable history to keep around after flushing.  However, it sounds like people aren't too fond of adding new parameters, so I am making the default size of flushed + not-yet-flushed memtables equal to max_write_buffers.  This should not change the maximum amount of memory used, but makes it more likely we're using memory closer to the limit.  (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached.)  So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use the memory now instead of waiting for a write stall to test this limit).
      
      However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter to be set in order to use transactions.
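
      A sketch of opting in (option name assumed to be the max_write_buffer_number_to_maintain parameter this diff adds):

          #include "rocksdb/options.h"

          rocksdb::Options MakeTxnFriendlyOptions() {
            rocksdb::Options options;
            options.max_write_buffer_number = 4;
            // Keep up to 4 memtables (flushed + unflushed) in
            // MemtableListVersion so commit-time conflict checks have some
            // history to inspect; 0 would disable the history entirely.
            options.max_write_buffer_number_to_maintain = 4;
            return options;
          }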
      
      Test Plan: Added an xfunc test to play around with setting different values of this parameter in all tests.  Added testing in memtablelist_test and plan to add more testing here.
      
      Reviewers: sdong, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D37443
    • Rename EventLoggerHelpers to EventHelpers · ec4ff4e9
      Committed by Yueh-Hsuan Chiang
      Summary:
      Rename EventLoggerHelpers to EventHelpers, as it's going to include
      all event-related helper functions instead of only EventLogger-related ones.
      
      Test Plan: make
      
      Reviewers: sdong, rven, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39093
    • [API Change] Move listeners from ColumnFamilyOptions to DBOptions · 672dda9b
      Committed by Yueh-Hsuan Chiang
      Summary: Move listeners from ColumnFamilyOptions to DBOptions.
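
      A sketch of registering a listener after this change (the FlushJobInfo-based callback signature shown here is the current one and may postdate this diff):

          #include <memory>
          #include "rocksdb/db.h"
          #include "rocksdb/listener.h"
          #include "rocksdb/options.h"

          class FlushLogger : public rocksdb::EventListener {
           public:
            void OnFlushCompleted(rocksdb::DB* /*db*/,
                                  const rocksdb::FlushJobInfo& info) override {
              // e.g. log info.file_path
            }
          };

          rocksdb::Options MakeOptions() {
            rocksdb::Options options;  // Options inherits from DBOptions
            // listeners now lives on DBOptions, shared by all column families.
            options.listeners.push_back(std::make_shared<FlushLogger>());
            return options;
          }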
      
      Test Plan:
      listener_test
      compact_files_test
      
      Reviewers: rven, anthony, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39087
  5. 27 May, 2015 (3 commits)
  6. 23 May, 2015 (6 commits)
  7. 22 May, 2015 (3 commits)
    • Allow EventLogger to directly log from a JSONWriter. · 7fee8775
      Committed by Yueh-Hsuan Chiang
      Summary:
      Allow EventLogger to directly log from a JSONWriter.  This allows
      the JSONWriter to be shared by EventLogger and potentially EventListener,
      which is an important step toward integrating EventLogger and EventListener.
      
      This patch also rewrites EventLoggerHelpers::LogTableFileCreation(),
      which uses the new API to generate an identical log.
      
      Test Plan:
      Run db_bench in debug mode and make sure the log is correct and no
      assertions fail.
      
      Reviewers: sdong, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38709
    • Don't artificially inflate L0 score · 7a357751
      Committed by Igor Canadi
      Summary:
      This turns out to be pretty bad because if we prioritize L0->L1 then L1 can grow artificially large, which makes L0->L1 more and more expensive. For example:
      256MB @ L0 + 256MB @ L1 --> 512MB @ L1
      256MB @ L0 + 512MB @ L1 --> 768MB @ L1
      256MB @ L0 + 768MB @ L1 --> 1GB @ L1
      
      ....
      
      256MB @ L0 + 10GB @ L1 --> 10.2GB @ L1
      
      At some point we need to start compacting L1->L2 to speed up L0->L1.
      
      Test Plan:
      The performance improvement is massive for a heavy write workload. This is the benchmark I ran: https://phabricator.fb.com/P19842671. Before this change, the benchmark took 47 minutes to complete. After, it finished in 2 minutes. You can see full results here: https://phabricator.fb.com/P19842674
      
      Also, we ran this diff on MongoDB on RocksDB on one replica set. Before the change, our initial sync was so slow that it couldn't keep up with primary writes. After the change, the import finished without any issues.
      
      Reviewers: dynamike, MarkCallaghan, rven, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38637
    • [Public API Change] Make DB::GetDbIdentity() a const function. · e2c1d4b5
      Committed by Yueh-Hsuan Chiang
      Summary: Make DB::GetDbIdentity() a const function.
      
      Test Plan: make db_test
      
      Reviewers: igor, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38745
  8. 20 May, 2015 (3 commits)
    • Dump db stats in WARN level · 812c461c
      Committed by Yueh-Hsuan Chiang
      Summary: Dump db stats at the WARN log level.
      
      Test Plan: run db_bench and verify the LOG
      
      Reviewers: igor, MarkCallaghan
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38691
    • Add --wal_bytes_per_sync for db_bench and more IO stats · 944043d6
      Committed by Mark Callaghan
      Summary:
      See https://gist.github.com/mdcallag/89ebb2b8cbd331854865 for the IO stats.
      I added "Cumulative compaction:" and "Interval compaction:" lines. The IO rates
      can be confusing. Rates fro per-level stats lines, Wr(MB/s) & Rd(MB/s), are computed
      using the duration of the compaction job. If the job reads 10MB, writes 9MB and the job
      (IO & merging) takes 1 second then the rates are 10MB/s for read and 9MB/s for writes.
      The IO rates in the Cumulative compaction line uses the total uptime. The IO rates in the
      Interval compaction line uses the interval uptime. So these Cumalative & Interval
      compaction IO rates cannot be compared to the per-level IO rates. But both forms of
      the rates are useful for debugging perf.
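
      For reference, a sketch of the DBOptions field the new db_bench flag presumably maps onto:

          #include "rocksdb/options.h"

          rocksdb::Options MakeOptions() {
            rocksdb::Options options;
            // Ask for an incremental WAL range-sync roughly every 1MB written,
            // smoothing IO instead of syncing large chunks at once.
            options.wal_bytes_per_sync = 1024 * 1024;
            return options;
          }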
      
      Test Plan:
      run db_bench
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D38667
    • Fix comparison between signed and unsigned integers · 04feaeeb
      Committed by Igor Canadi
      Summary: Not sure why this fails on some compilers and doesn't on others.
      
      Test Plan: none
      
      Reviewers: meyering, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38673
  9. 19 May, 2015 (4 commits)
  10. 16 May, 2015 (2 commits)
    • Allow GetThreadList to report Flush properties. · 3f0867c0
      Committed by Yueh-Hsuan Chiang
      Summary:
      Allow GetThreadList to report Flush properties, which include:
      * job id
      * number of bytes written since the flush started
      * total size of the input memtables
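
      A sketch of reading these properties programmatically (assumes DBOptions::enable_thread_tracking is on):

          #include <vector>
          #include "rocksdb/env.h"
          #include "rocksdb/thread_status.h"

          void InspectFlushThreads() {
            std::vector<rocksdb::ThreadStatus> threads;
            rocksdb::Status s = rocksdb::Env::Default()->GetThreadList(&threads);
            if (!s.ok()) return;
            for (const auto& t : threads) {
              if (t.operation_type == rocksdb::ThreadStatus::OP_FLUSH) {
                // t.op_properties carries JobID / BytesMemtables / BytesWritten
              }
            }
          }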
      
      Test Plan:
      ./db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=100 --value_size=1000
      
      Sample output from db_bench tracking the same flush job:
      
                ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
         140213879898240   High Pri      default                Flush       5789 us                    FlushJob::WriteLevel0Table              BytesMemtables 4112835 | BytesWritten 577104 | JobID 8 |
      
                ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
         140213879898240   High Pri      default                Flush     30.634 ms                    FlushJob::WriteLevel0Table              BytesMemtables 4112835 | BytesWritten 1734865 | JobID 8 |
      
      Reviewers: rven, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38505
    • Take a chance on a random file when choosing compaction · 7413306d
      Committed by Igor Canadi
      Summary:
      When trying to compact the entire database with SuggestCompactRange(), we'll first try the left-most files. This is pretty bad, because:
      1) the left part of the LSM tree will be overly compacted, while the right part will not be touched
      2) the first compaction will pick up the left-most file, and the second compaction will try to pick up the next left-most, but this will often not be possible, because there's a big chance that the second file's range on level N+1 is already being compacted
      
      I observed both of these problems when running Mongo+RocksDB and trying to compact the DB to clean up tombstones. I was unable to clean them up :(
      
      This diff adds a bit of randomness into choosing a file: we choose a file at random and try to compact that one. This should solve both problems specified here.
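
      For context, a sketch of the caller-side API referenced above:

          #include "rocksdb/db.h"
          #include "rocksdb/experimental.h"

          // Mark the whole key space for compaction; the (now randomized)
          // picker works through the marked files over subsequent compactions.
          rocksdb::Status CompactEverything(rocksdb::DB* db) {
            return rocksdb::experimental::SuggestCompactRange(db, nullptr, nullptr);
          }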
      
      Test Plan: make check
      
      Reviewers: yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38379
  11. 15 May, 2015 (1 commit)
  12. 14 May, 2015 (2 commits)
  13. 13 May, 2015 (4 commits)
    • Fixed compile error in db/column_family.cc · df1f87a8
      Committed by Yueh-Hsuan Chiang
      Summary:
      Fixed the following compile error in db/column_family.cc:
          db/column_family.cc:633:33: error: ‘ASSERT_GT’ was not declared in this scope
          ASSERT_GT(listeners.size(), 0U);
      
      Test Plan: make db_test
      
      Reviewers: igor, sdong, rven
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38367
    • Fixed a bug in EventListener::OnCompactionCompleted(). · 14431e97
      Committed by Yueh-Hsuan Chiang
      Summary:
      Fixed a bug in EventListener::OnCompactionCompleted() that caused it
      to return an incorrect list of input / output file names.
      
      Test Plan: Extend existing test in listener_test.cc
      
      Reviewers: sdong, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38349
    • Add more table properties to EventLogger · dbd95b75
      Committed by Igor Canadi
      Summary:
      Example output:
      
          {"time_micros": 1431463794310521, "job": 353, "event": "table_file_creation", "file_number": 387, "file_size": 86937, "table_info": {"data_size": "81801", "index_size": "9751", "filter_size": "0", "raw_key_size": "23448", "raw_average_key_size": "24.000000", "raw_value_size": "990571", "raw_average_value_size": "1013.890481", "num_data_blocks": "245", "num_entries": "977", "filter_policy_name": "", "kDeletedKeys": "0"}}
      
      Also fixed a bug where BuildTable() in recovery was passing the Env::IOHigh argument into the paranoid_checks_file parameter.
      
      Test Plan: make check + check out the output in the log
      
      Reviewers: sdong, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38343
    • Reset parent_index and base_index when picking files marked for compaction · b5881762
      Committed by Igor Canadi
      Summary: This caused a crash of our MongoDB + RocksDB instance. PickCompactionBySize() sets its own parent_index, and we never reset this parent_index when picking files in PickFilesMarkedForCompactionExperimental(). So we might end up calling SetupOtherInputs() with a parent_index set by PickCompactionBySize(), even though we're using a compaction computed by PickFilesMarkedForCompactionExperimental().
      
      Test Plan: Added a unit test that fails with assertion on master.
      
      Reviewers: yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38337
  14. 12 May, 2015 (1 commit)
    • API to fetch from both a WriteBatchWithIndex and the db · 711465cc
      Committed by agiardullo
      Summary:
      Added a couple of functions to WriteBatchWithIndex to make it easier to query the value of a key, including reading pending writes from a batch.  (This is needed for transactions.)
      
      I created write_batch_with_index_internal.h to store an internal-only helper function, since there wasn't a good place in the existing class hierarchy to put it (and it didn't seem right to stick it inside WriteBatchInternal::Rep).
      
      Since I needed access to the WriteBatchEntryComparator, I moved some helper classes from write_batch_with_index.cc into write_batch_with_index_internal.h/.cc.  WriteBatchIndexEntry, ReadableWriteBatch, and WriteBatchEntryComparator are all unchanged (just moved to different files).
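
      A sketch of the new lookup path (function name taken from the current write_batch_with_index.h):

          #include <string>
          #include "rocksdb/db.h"
          #include "rocksdb/utilities/write_batch_with_index.h"

          void ReadThroughBatch(rocksdb::DB* db) {
            rocksdb::WriteBatchWithIndex batch;
            batch.Put("key", "pending-value");

            std::string value;
            // Consults the batch's pending writes first, then falls back
            // to the DB.
            rocksdb::Status s =
                batch.GetFromBatchAndDB(db, rocksdb::ReadOptions(), "key", &value);
            // s.ok() implies value == "pending-value", even before any commit.
          }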
      
      Test Plan: Added new unit tests.
      
      Reviewers: rven, yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38037
  15. 10 May, 2015 (1 commit)
  16. 09 May, 2015 (1 commit)