1. 19 Oct 2015 (1 commit)
  2. 18 Oct 2015 (1 commit)
  3. 17 Oct 2015 (1 commit)
    • Add more kill points · 277dea78
      Committed by sdong
      Summary:
      Add kill points at the following places (a sketch of the mechanism follows this entry):
      1. after creating a file
      2. before writing a manifest record
      3. before syncing manifest
      4. before creating a new current file
      5. after creating a new current file
      
      Test Plan: Run all current tests.
      
      Reviewers: yhchiang, igor, anthony, IslamAbdelRahman, rven, kradhakrishnan
      
      Reviewed By: kradhakrishnan
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48855
      277dea78
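      A hedged sketch of what such a kill point amounts to: a hook that, in crash
      tests, aborts the process with some probability at a named location.
      TEST_KILL_RANDOM and rocksdb_kill_odds are the names RocksDB's test harness
      uses; the body and the location string below are illustrative stand-ins,
      not the real implementation.
      ```
      #include <cstdlib>
      #include <random>

      int rocksdb_kill_odds = 0;  // 0 disables kill points; N gives a ~1/N chance

      inline void TestKillRandom(const char* /*location*/, int odds) {
        if (odds <= 0) return;
        static thread_local std::mt19937 rng{std::random_device{}()};
        std::uniform_int_distribution<int> one_in_n(0, odds - 1);
        if (one_in_n(rng) == 0) {
          std::abort();  // simulate a process crash at this named point
        }
      }

      // Usage at the points listed above, e.g. before syncing the manifest:
      //   TestKillRandom("VersionSet::LogAndApply:BeforeSyncManifest",
      //                  rocksdb_kill_odds);
      ```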
  4. 14 Oct 2015 (1 commit)
    • Separate InternalIterator from Iterator · 35ad531b
      Committed by sdong
      Summary:
      Separate a new class, InternalIterator, from class Iterator. InternalIterator
      is used when the lookup is done internally, which means it operates on keys
      that carry a sequence ID and a type.
      
      This change will enable potential future optimizations, but for now
      InternalIterator's functions are the same as Iterator's. At the same time,
      the cleanup logic moves into a separate class that both InternalIterator and
      Iterator inherit from (a sketch of the resulting layout follows this entry).
      
      Test Plan: Run all existing tests.
      
      Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48549
      35ad531b
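      A minimal sketch of the class layout this change introduces, with simplified
      signatures (the real interfaces carry more methods, and std::string stands
      in for RocksDB's Slice type here):
      ```
      #include <string>
      #include <vector>

      // Shared cleanup hook that both iterator types inherit from.
      class Cleanable {
       public:
        using CleanupFunction = void (*)(void* arg1, void* arg2);
        void RegisterCleanup(CleanupFunction f, void* a1, void* a2) {
          cleanups_.push_back({f, a1, a2});
        }
        virtual ~Cleanable() {
          for (auto& c : cleanups_) c.func(c.arg1, c.arg2);
        }
       private:
        struct Cleanup { CleanupFunction func; void* arg1; void* arg2; };
        std::vector<Cleanup> cleanups_;
      };

      // Internal-facing iterator: keys carry a sequence number and type.
      class InternalIterator : public Cleanable {
       public:
        virtual bool Valid() const = 0;
        virtual void SeekToFirst() = 0;
        virtual void Seek(const std::string& target) = 0;
        virtual void Next() = 0;
        virtual std::string key() const = 0;  // user key + sequence ID + type
        virtual std::string value() const = 0;
      };
      ```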
  5. 22 Sep 2015 (1 commit)
    • Add a mode to always pick the oldest file to compact for each level · f1b9f804
      Committed by sdong
      Summary:
      Add options.compaction_pri, which specifies the policy for choosing which file to compact first.
      kCompactionPriByLargestSeq will compact the oldest files first (usage sketch after this entry).
      Verified the behavior in db_bench, but unit tests are still to be written. It also still needs to be settable through the option string and dynamically changeable.
      
      Test Plan: Will write unit tests
      
      Reviewers: igor, rven, anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, MarkCallaghan
      
      Reviewed By: yhchiang
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D45951
      f1b9f804
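      A usage sketch, using the enum value named in this summary (later RocksDB
      releases renamed the CompactionPri values, and the exact namespace scoping
      here is an assumption, so treat the identifier as historical):
      ```
      #include "rocksdb/options.h"

      rocksdb::Options MakeOptions() {
        rocksdb::Options options;
        // Always pick the oldest file to compact first within each level,
        // per this commit's new policy.
        options.compaction_pri = rocksdb::kCompactionPriByLargestSeq;
        return options;
      }
      ```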
  6. 16 Sep 2015 (1 commit)
    • LogAndApply() should fail if the column family has been dropped · a7e80379
      Committed by Igor Canadi
      Summary:
      This patch finally fixes the ColumnFamilyTest.ReadDroppedColumnFamily test. The test has been failing very sporadically and it was hard to repro. However, I managed to write a new test that reproduces the failure deterministically.
      
      Here's what happens:
      1. We start the flush for the column family
      2. We check if the column family was dropped here: https://github.com/facebook/rocksdb/blob/a3fc49bfddcdb1ff29409aacd06c04df56c7a1d7/db/flush_job.cc#L149
      3. This check goes through, ends up in InstallMemtableFlushResults() and it goes into LogAndApply()
      4. At about this time, we start dropping the column family. The drop reaches LogAndApply() at about the same time as the LogAndApply() from the flush process
      5. Drop column family goes through LogAndApply() first, marking the column family as dropped.
      6. Flush process gets woken up and gets a chance to write to the MANIFEST. However, this is where it gets stuck: https://github.com/facebook/rocksdb/blob/a3fc49bfddcdb1ff29409aacd06c04df56c7a1d7/db/version_set.cc#L1975
      7. We see that the column family was dropped, so there is no need to write to the MANIFEST. We return OK.
      8. The flush gets OK back from LogAndApply() and deletes the memtable, thinking that the data is now safely persisted to an SST file.
      
      The fix is pretty simple. Instead of OK, we return ShutdownInProgress. This is not really true, but we have been using this status code to also mean "this operation was canceled because the column family has been dropped" (a simplified sketch follows this entry).
      
      The fix is only one LOC. All other code is related to tests. I added a new test that reproduces the failure. I also moved SleepingBackgroundTask to util/testutil.h (because I needed it in column_family_test for my new test). There are plenty of other places where we reimplement SleepingBackgroundTask, but I'll address that in a separate commit.
      
      Test Plan:
      1. new test
      2. make check
      3. Make sure the ColumnFamilyTest.ReadDroppedColumnFamily doesn't fail on Travis: https://travis-ci.org/facebook/rocksdb/jobs/79952386
      
      Reviewers: yhchiang, anthony, IslamAbdelRahman, kradhakrishnan, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D46773
      a7e80379
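      A simplified sketch of the one-line fix (the real check lives in
      VersionSet::LogAndApply in db/version_set.cc; this stand-alone function
      only illustrates the status change):
      ```
      #include "rocksdb/status.h"

      rocksdb::Status FinishLogAndApply(bool column_family_dropped) {
        if (column_family_dropped) {
          // Previously this returned Status::OK(), which let the flush delete
          // its memtable even though nothing was written to the MANIFEST.
          return rocksdb::Status::ShutdownInProgress(
              "Column family dropped during the operation");
        }
        // ... otherwise write and sync the MANIFEST record ...
        return rocksdb::Status::OK();
      }
      ```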
  7. 11 Sep 2015 (1 commit)
    • Determine boundaries of subcompactions · 3c37b3cc
      Committed by Ari Ekmekji
      Summary:
      Up to this point, the subcompactions that make up a compaction job have
      been divided based on the key range of the L1 files, and each
      subcompaction has handled the key range of only one file. However,
      DBOptions::max_subcompactions allows the user to designate at most how
      many subcompactions to perform. This patch updates
      CompactionJob::GetSubcompactionBoundaries() to determine these divisions
      based on that option and other input/system factors.
      
      The current approach orders the starting and/or ending keys of certain
      compaction input files and then generates a histogram to approximate the
      size covered by the key range between each consecutive pair of keys. It
      then groups these ranges so that their sizes are approximately equal (a
      simplified sketch follows this entry). The approach has also been adapted
      to work for universal compaction, not just level-based compaction as
      before.
      
      These subcompactions are then executed in parallel by locally spawning
      threads, one for each. The results are then aggregated and the compaction
      completed.
      
      Test Plan: make all && make check
      
      Reviewers: yhchiang, anthony, igor, noetzli, sdong
      
      Reviewed By: sdong
      
      Subscribers: MarkCallaghan, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43269
      3c37b3cc
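      A simplified sketch of the grouping step described above, assuming we
      already have candidate boundary keys in sorted order and a histogram of
      approximate sizes between consecutive boundaries (the real
      CompactionJob::GetSubcompactionBoundaries() also weighs other
      input/system factors):
      ```
      #include <algorithm>
      #include <cstdint>
      #include <string>
      #include <vector>

      // range_sizes[i] approximates the bytes covered between boundary i-1
      // (or the start of the range) and sorted_keys[i]; both vectors are
      // assumed to be the same length. Returns keys splitting the work into
      // at most max_subcompactions roughly equal-sized ranges.
      std::vector<std::string> PickSubcompactionBoundaries(
          const std::vector<std::string>& sorted_keys,
          const std::vector<uint64_t>& range_sizes,
          int max_subcompactions) {
        uint64_t total = 0;
        for (uint64_t s : range_sizes) total += s;
        const uint64_t target = total / std::max(1, max_subcompactions);
        std::vector<std::string> boundaries;
        uint64_t acc = 0;
        for (size_t i = 0; i < sorted_keys.size(); ++i) {
          acc += range_sizes[i];
          if (acc >= target &&
              boundaries.size() + 1 < static_cast<size_t>(max_subcompactions)) {
            boundaries.push_back(sorted_keys[i]);  // close one group here
            acc = 0;
          }
        }
        return boundaries;
      }
      ```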
  8. 09 Sep 2015 (1 commit)
    • Added Equal method to Comparator interface · 6bdc484f
      Committed by Andres Noetzli
      Summary:
      In some cases, equality comparisons can be done more efficiently than
      three-way comparisons. There are quite a few places in the code where we
      only care about equality. This patch adds an Equal() method that defaults
      to using the Compare() method (shown after this entry).
      
      Test Plan: make clean all check
      
      Reviewers: rven, anthony, yhchiang, igor, sdong
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D46233
      6bdc484f
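      A sketch of the interface change, with Comparator reduced to the two
      methods relevant here:
      ```
      #include "rocksdb/slice.h"

      class Comparator {
       public:
        virtual ~Comparator() {}
        virtual int Compare(const rocksdb::Slice& a,
                            const rocksdb::Slice& b) const = 0;
        // New method: defaults to the three-way comparison, but subclasses
        // can override it with a cheaper pure-equality check.
        virtual bool Equal(const rocksdb::Slice& a,
                           const rocksdb::Slice& b) const {
          return Compare(a, b) == 0;
        }
      };
      ```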
  9. 03 Sep 2015 (1 commit)
  10. 26 Aug 2015 (1 commit)
    • Expose per-level aggregated table properties via GetProperty() · 6996de87
      Committed by Yueh-Hsuan Chiang
      Summary:
      This patch adds "rocksdb.aggregated-table-properties"
      and "rocksdb.aggregated-table-properties-at-levelN", the former
      returns the aggreated table properties of a column family,
      while the later returns the aggregated table properties
      of the specified level N.
      
      Test Plan: Added tests in db_test
      
      Reviewers: igor, sdong, IslamAbdelRahman, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45087
      6996de87
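      A usage sketch with the property names given above (the level number is
      appended to the second name):
      ```
      #include <string>
      #include "rocksdb/db.h"

      void DumpAggregatedTableProperties(rocksdb::DB* db) {
        std::string all_levels, level2;
        // Aggregated over the whole column family:
        db->GetProperty("rocksdb.aggregated-table-properties", &all_levels);
        // Aggregated over level 2 only:
        db->GetProperty("rocksdb.aggregated-table-properties-at-level2", &level2);
      }
      ```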
  11. 21 Aug 2015 (2 commits)
    • Add a counter about estimated pending compaction bytes · 07d2d341
      Committed by sdong
      Summary:
      Add a counter of the estimated bytes the DB needs to compact for all compactions to finish, and expose it as a DB property (usage sketch after this entry).
      In the future, we can use a threshold on this counter to replace the soft and hard rate limits. A single threshold on estimated compaction debt in bytes will be easier for users to reason about, when deciding to slow down or stop writes, than the more abstract soft and hard rate limits.
      
      Test Plan: Add unit tests
      
      Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44205
      07d2d341
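      The summary does not spell out the property name; in current RocksDB it is
      exposed as "rocksdb.estimate-pending-compaction-bytes", so a hedged usage
      sketch looks like:
      ```
      #include <cstdint>
      #include "rocksdb/db.h"

      uint64_t EstimatedCompactionDebt(rocksdb::DB* db) {
        uint64_t bytes = 0;
        // Estimated bytes that compactions still need to rewrite before the
        // LSM tree reaches its steady shape.
        db->GetIntProperty("rocksdb.estimate-pending-compaction-bytes", &bytes);
        return bytes;
      }
      ```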
    • Total SST files size DB Property · 027ca5b2
      Committed by Islam AbdelRahman
      Summary: Add a new DB property that calculates the total size of SST files used by all RocksDB Versions (usage sketch after this entry)
      
      Test Plan: Unit tests for the new property
      
      Reviewers: igor, yhchiang, anthony, rven, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44799
      027ca5b2
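      A usage sketch, assuming the property name used in current RocksDB
      ("rocksdb.total-sst-files-size"; the summary itself does not name it):
      ```
      #include <cstdint>
      #include "rocksdb/db.h"

      uint64_t TotalSstFilesSize(rocksdb::DB* db) {
        uint64_t bytes = 0;
        // Total size of SST files referenced by any live Version, which can
        // exceed the live data size when old Versions pin obsolete files.
        db->GetIntProperty("rocksdb.total-sst-files-size", &bytes);
        return bytes;
      }
      ```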
  12. 15 Aug 2015 (1 commit)
    • Measure file read latency histogram per level · 72613657
      Committed by sdong
      Summary: In internal stats, remember the file read latency histogram per level when statistics are enabled. It can be retrieved from DB::GetProperty() with the "rocksdb.dbstats" property (usage sketch after this entry).
      
      Test Plan: Manually run db_bench, print out "rocksdb.dbstats" by hand, and make sure it prints out as expected
      
      Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44193
      72613657
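      A usage sketch for pulling the stats out (statistics must be enabled for
      the latency histograms to be populated):
      ```
      #include <cstdio>
      #include <string>
      #include "rocksdb/db.h"

      void PrintDbStats(rocksdb::DB* db) {
        std::string stats;
        if (db->GetProperty("rocksdb.dbstats", &stats)) {
          // Includes the per-level file read latency histograms when
          // statistics are enabled.
          std::printf("%s\n", stats.c_str());
        }
      }
      ```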
  13. 12 Aug 2015 (1 commit)
    • Parallelize LoadTableHandlers · cee1e8a0
      Committed by Islam AbdelRahman
      Summary: Add a new option that allows LoadTableHandlers to use multiple threads to load files on DB open and recovery (configuration sketch after this entry)
      
      Test Plan:
      make check -j64
      COMPILE_WITH_TSAN=1 make check -j64
      DISABLE_JEMALLOC=1 make all valgrind_check -j64 (still running)
      
      Reviewers: yhchiang, anthony, rven, kradhakrishnan, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D43755
      cee1e8a0
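      The summary does not name the option; in current RocksDB it is
      DBOptions::max_file_opening_threads, so a hedged configuration sketch is:
      ```
      #include "rocksdb/options.h"

      rocksdb::Options MakeOpenOptions() {
        rocksdb::Options options;
        // Use up to 16 threads to open table files during DB::Open() and
        // recovery instead of loading them one by one.
        options.max_file_opening_threads = 16;
        return options;
      }
      ```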
  14. 08 Aug 2015 (1 commit)
    • Better CompactionJob testing · 68f93435
      Committed by Andres Notzli
      Summary:
      Changed compaction_job_test to support better/more thorough
      tests and added two tests. Also changed MockFileContents
      to order using InternalKeyComparator.
      
      Test Plan: make compaction_job_test && ./compaction_job_test; make all && make check
      
      Reviewers: sdong, rven, igor, yhchiang, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42837
      68f93435
  15. 05 Aug 2015 (1 commit)
    • Add DBOptions::skip_stats_update_on_db_open · 14d0bfa4
      Committed by Yueh-Hsuan Chiang
      Summary:
      UpdateAccumulatedStats() is used to optimize compaction decisions,
      especially when the number of deletion entries is high, but this function
      can slow down DB open, especially in a disk-bound environment.
      
      This patch adds DBOptions::skip_stats_update_on_db_open, which skips
      UpdateAccumulatedStats() at DB::Open() time when it's set to true (usage
      sketch after this entry).
      
      Test Plan: Add DBCompactionTest.SkipStatsUpdateTest
      
      Reviewers: igor, anthony, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: tnovak, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42843
      14d0bfa4
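      A usage sketch (note the trade-off: skipping the scan speeds up DB::Open()
      but leaves compaction scoring without the deletion-entry statistics until
      the tables are read later):
      ```
      #include "rocksdb/options.h"

      rocksdb::Options MakeFastOpenOptions() {
        rocksdb::Options options;
        // Skip UpdateAccumulatedStats() at open time; useful in disk-bound
        // environments where the extra reads slow DB::Open().
        options.skip_stats_update_on_db_open = true;
        return options;
      }
      ```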
  16. 04 Aug 2015 (1 commit)
    • Parallelize L0-L1 Compaction: Restructure Compaction Job · 40c64434
      Committed by Ari Ekmekji
      Summary:
      As of now compactions involving files from Level 0 and Level 1 are single
      threaded because the files in L0, although sorted, are not range partitioned like
      the other levels. This means that during L0-L1 compaction each file from L1
      needs to be merged with potentially all the files from L0.
      
      This attempt to parallelize the L0-L1 compaction assigns a thread and a
      corresponding iterator to each L1 file that then considers only the key range
      found in that L1 file and only the L0 files that have those keys (and only the
      specific portion of those L0 files in which those keys are found). In this way
      the overlap is minimized and potentially eliminated between different iterators
      focusing on the same files.
      
      The first step is to restructure the compaction logic to break L0-L1 compactions
      into multiple, smaller, sequential compactions. Eventually each of these smaller
      jobs will be run simultaneously. Areas to pay extra attention to are
      
        # Correct aggregation of compaction job statistics across multiple threads
        # Proper opening/closing of output files (make sure each thread's is unique)
        # Keys that span multiple L1 files
        # Skewed distributions of keys within L0 files
      
      Test Plan: Make and run db_test (newer version has separate compaction tests) and compaction_job_stats_test
      
      Reviewers: igor, noetzli, anthony, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42699
      40c64434
  17. 22 Jul 2015 (1 commit)
    • Report live data size estimate · 06aebca5
      Committed by Andres Notzli
      Summary:
      Fixes T6548822. Added a new function for estimating the size of the live data
      as proposed in the task. The value can be accessed through the property
      rocksdb.estimate-live-data-size.
      
      Test Plan:
      There are two unit tests in version_set_test and a simple test in db_test.
      make version_set_test && ./version_set_test;
      make db_test && ./db_test --gtest_filter=GetProperty
      
      Reviewers: rven, igor, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41493
      06aebca5
  18. 18 Jul 2015 (3 commits)
    • Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env · 6e9fbeb2
      Committed by sdong
      
      Summary: We want to keep Env a thin layer for better portability. Less platform-dependent code should be moved out of Env. In this patch, I create wrappers for file readers and writers, and move rate limiting, write buffering, most perf context instrumentation, and random kill out of Env. This will make it easier to maintain multiple Envs in the future.
      
      Test Plan: Run all existing unit tests.
      
      Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D42321
      6e9fbeb2
    • Don't let flushes preempt compactions · 35ca5936
      Committed by Igor Canadi
      Summary:
      When we first started, max_background_flushes was 0 by default and the compaction thread executed flushes (since there was no flush thread). Then we switched the default max_background_flushes to 1. However, we still support the case where there is no flush thread and flushes are done by the compaction thread. This makes our code more complicated. By not supporting this use case we can simplify our code.
      
      We keep one special case: when max_background_flushes is set to 0, we
      schedule the flush to execute on the compaction thread.
      
      Test Plan: make check (there might be some unit tests that depend on this behavior)
      
      Reviewers: IslamAbdelRahman, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41931
      35ca5936
    • Added JSON manifest dump option to ldb command · 74c755c5
      Committed by Ari Ekmekji
      Summary:
      Added a new flag --json to the ldb manifest_dump command
      that prints out the version edits as JSON objects for easier
      reading and parsing of information.
      
      Test Plan:
      **Sample usage: **
      ```
      ./ldb manifest_dump --json --path=path/to/manifest/file
      ```
      
      **Sample output:**
      ```
      {"EditNumber": 0, "Comparator": "leveldb.BytewiseComparator", "ColumnFamily": 0}
      {"EditNumber": 1, "LogNumber": 0, "ColumnFamily": 0}
      {"EditNumber": 2, "LogNumber": 4, "PrevLogNumber": 0, "NextFileNumber": 7, "LastSeq": 35356, "AddedFiles": [{"Level": 0, "FileNumber": 5, "FileSize": 1949284, "SmallestIKey": "'", "LargestIKey": "'"}], "ColumnFamily": 0}
      ...
      {"EditNumber": 13, "PrevLogNumber": 0, "NextFileNumber": 36, "LastSeq": 290994, "DeletedFiles": [{"Level": 0, "FileNumber": 17}, {"Level": 0, "FileNumber": 20}, {"Level": 0, "FileNumber": 22}, {"Level": 0, "FileNumber": 24}, {"Level": 1, "FileNumber": 13}, {"Level": 1, "FileNumber": 14}, {"Level": 1, "FileNumber": 15}, {"Level": 1, "FileNumber": 18}], "AddedFiles": [{"Level": 1, "FileNumber": 25, "FileSize": 2114340, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 26, "FileSize": 2115213, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 27, "FileSize": 2114807, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 30, "FileSize": 2115271, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 31, "FileSize": 2115165, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 32, "FileSize": 2114683, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 35, "FileSize": 1757512, "SmallestIKey": "'", "LargestIKey": "'"}], "ColumnFamily": 0}
      ...
      ```
      
      Reviewers: sdong, anthony, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D41727
      74c755c5
  19. 03 Jul 2015 (1 commit)
  20. 01 Jul 2015 (1 commit)
  21. 12 Jun 2015 (1 commit)
    • Slow down writes by bytes written · 7842920b
      Committed by sdong
      Summary:
      We slow down writes into the database to the rate of
      options.delayed_write_rate (a new option) with this patch (configuration
      sketch after this entry).
      
      The thread synchronization approach I take is to keep the write controller
      synchronized by the DB mutex, with GetDelay() called inside the DB mutex,
      while minimizing how often GetDelay() reads the current time. I verified it
      through db_bench and it seems to work.
      
      hard_rate_limit is deprecated.
      
      options.delayed_write_rate is still not dynamically changeable; that needs
      to be done as a follow-up.
      
      Test Plan: Add new unit tests in db_test
      
      Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor
      
      Reviewed By: igor
      
      Subscribers: ikabiljo, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D36351
      7842920b
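      A configuration sketch (the value is in bytes per second; as noted above,
      it was not yet dynamically changeable when this landed):
      ```
      #include "rocksdb/options.h"

      rocksdb::Options MakeWriteLimitedOptions() {
        rocksdb::Options options;
        // When write stalls trigger, throttle incoming writes to ~2 MB/s
        // instead of relying on the deprecated hard_rate_limit.
        options.delayed_write_rate = 2 * 1024 * 1024;
        return options;
      }
      ```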
  22. 10 Jun 2015 (1 commit)
    • Print info message about files that need compaction, for debugging purposes · 75d7075a
      Committed by sdong
      Summary:
      When there are files marked for compaction after compactions, print extra messages to help debugging. Example:
      
      2015/06/08-23:12:55.212855 7ff5013ff700 [default] [JOB 121] Generated table #75: 54 keys, 4807 bytes (need compaction)
      
      2015/06/08-23:12:55.556194 7ff5013ff700 (Original Log Time 2015/06/08-23:12:55.556160) [default] compacted to: base level 1 max bytes base
      10240 files[0 1 9 32 12 0 0 0] max score 0.96 (2 files need compaction), MB/sec: 0.0 rd, 0.1 wr, level 2, files in(1, 3) out(5) MB in(0.0,
      0.0) out(0.0), read-write-amplify(11.3) write-amplify(5.7) OK, records in: 40, records dropped: 0
      
      Test Plan:
      Run test and see LOG files.
      
      valgrind test DBTest.TablePropertiesNeedCompactTest
      
      Reviewers: rven, yhchiang, kradhakrishnan, IslamAbdelRahman, igor
      
      Reviewed By: igor
      
      Subscribers: yoshinorim, maykov, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39771
      75d7075a
  23. 06 Jun 2015 (1 commit)
  24. 05 Jun 2015 (2 commits)
    • Fix compile · b2785472
      Committed by Igor Canadi
      Summary:
      This commit broke the compile: https://github.com/facebook/rocksdb/commit/3ce3bb3da2486c2c18a332128dda7c05a91abb85
      As evidenced here: https://evergreen.mongodb.com/task/mongodb_mongo_master_ubuntu1404_rocksdb_compile_ce2b1d11d42de93f7b375f7e6c41fb709f66e969_15_06_04_23_09_36
      
      This should fix it
      
      Test Plan: make check
      
      Reviewers: IslamAbdelRahman
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39627
      b2785472
    • Allowing L0 -> L1 trivial move on sorted data · 3ce3bb3d
      Committed by Islam AbdelRahman
      Summary:
      This diff updates the logic of how we do trivial move; now trivial move can run on any number of files in the input level, as long as they do not overlap
      
      The conditions for trivial move have been updated
      
      Introduced conditions:
        - Trivial move cannot happen if we have a compaction filter (except if the compaction is not manual)
        - Input level files cannot be overlapping
      
      Removed conditions:
        - Trivial move only runs when the compaction is not manual
        - The input level can contain only one file
      
      More context on what tests failed because of Trivial move
      ```
      DBTest.CompactionsGenerateMultipleFiles
      This test expects compaction of a file in L0 to generate multiple files in L1; it fails with trivial move because we end up with one file in L1
      ```
      
      ```
      DBTest.NoSpaceCompactRange
      This test expects compaction to fail when we force the environment to report running out of space. Of course, this is not valid in the trivial move situation,
      because a trivial move does not need any extra space and did not check for that condition
      ```
      
      ```
      DBTest.DropWrites
      Similar to DBTest.NoSpaceCompactRange
      ```
      
      ```
      DBTest.DeleteObsoleteFilesPendingOutputs
      This test expects a file in L2 to be deleted after it is moved to L3. This does not hold with trivial move, because although the file was moved, it is now in use by L3
      ```
      
      ```
      CuckooTableDBTest.CompactionIntoMultipleFiles
      Same as DBTest.CompactionsGenerateMultipleFiles
      ```
      
      This diff is based on work by @sdong: https://reviews.facebook.net/D34149
      
      Test Plan: make -j64 check
      
      Reviewers: rven, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: yhchiang, ott, march, dhruba, sdong
      
      Differential Revision: https://reviews.facebook.net/D34797
      3ce3bb3d
  25. 02 Jun 2015 (1 commit)
  26. 30 May 2015 (1 commit)
    • Optimistic Transactions · dc9d70de
      Committed by agiardullo
      Summary: Optimistic transactions supporting begin/commit/rollback semantics. This currently relies on checking the memtable to determine whether there are any collisions at commit time. Not yet implemented is a way of ensuring the memtable keeps some minimum amount of history, so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported (a usage sketch follows this entry).
      
      Test Plan: Added a new test, but still need to look into stress testing.
      
      Reviewers: yhchiang, igor, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: adamretter, MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D33435
      dc9d70de
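      A begin/commit usage sketch against the optimistic transaction API under
      utilities/transactions. Names follow the current RocksDB API, which may
      differ slightly from the first version of this patch; error handling is
      elided except for the commit status:
      ```
      #include "rocksdb/utilities/optimistic_transaction_db.h"

      void UpdateWithOptimisticTxn(rocksdb::OptimisticTransactionDB* txn_db) {
        rocksdb::WriteOptions write_options;
        rocksdb::Transaction* txn = txn_db->BeginTransaction(write_options);
        txn->Put("key", "value");
        // Commit checks the memtable for conflicting writes made after the
        // transaction began; a collision surfaces as a non-OK status here.
        rocksdb::Status s = txn->Commit();
        if (!s.ok()) {
          // Conflict (or other failure): the write batch was not applied.
        }
        delete txn;
      }
      ```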
  27. 27 May 2015 (1 commit)
    • Compaction now conditionally boosts the size of deletion entries · 3ab8ffd4
      Committed by Yueh-Hsuan Chiang
      Summary:
      Compaction now boosts the size of deletion entries of a file only when
      the number of deletion entries is greater than the number of non-deletion
      entries in the file.  The motivation here is that in a stable workload,
      the number of deletion entries should be roughly equal to the number of
      non-deletion entries.  If we compensate the size of deletion entries in a
      stable workload, the deletion compensation logic might introduce an
      unwanted effect that changes the shape of the LSM tree (the condition is
      sketched after this entry).
      
      Test Plan: db_test --gtest_filter="*Deletion*"
      
      Reviewers: sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38703
      3ab8ffd4
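      A sketch of the condition, with illustrative parameter names and an
      assumed compensation formula (the real computation lives in the
      compensated-size logic in the version/compaction-picking code):
      ```
      #include <cstdint>

      // Boost the effective file size only when deletions outnumber
      // non-deletions, i.e. num_deletions > num_entries - num_deletions.
      uint64_t CompensatedFileSize(uint64_t file_size, uint64_t num_entries,
                                   uint64_t num_deletions,
                                   uint64_t average_value_size) {
        if (num_deletions * 2 >= num_entries) {
          // Compensate only the deletions in excess of the non-deletions;
          // at equality the boost is zero.
          return file_size +
                 (num_deletions * 2 - num_entries) * average_value_size;
        }
        return file_size;
      }
      ```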
  28. 22 May 2015 (1 commit)
    • Don't artificially inflate L0 score · 7a357751
      Committed by Igor Canadi
      Summary:
      This turns out to be pretty bad because if we prioritize L0->L1 then L1 can grow artificially large, which makes L0->L1 more and more expensive. For example:
      256MB @ L0 + 256MB @ L1 --> 512MB @ L1
      256MB @ L0 + 512MB @ L1 --> 768MB @ L1
      256MB @ L0 + 768MB @ L1 --> 1GB @ L1
      
      ....
      
      256MB @ L0 + 10GB @ L1 --> 10.2GB @ L1
      
      At some point we need to start compacting L1->L2 to speed up L0->L1.
      
      Test Plan:
      The performance improvement is massive for heavy write workload. This is the benchmark I ran: https://phabricator.fb.com/P19842671. Before this change, the benchmark took 47 minutes to complete. After, the benchmark finished in 2minutes. You can see full results here: https://phabricator.fb.com/P19842674
      
      Also, we ran this diff on MongoDB on RocksDB on one replicaset. Before the change, our initial sync was so slow that it couldn't keep up with primary writes. After the change, the import finished without any issues
      
      Reviewers: dynamike, MarkCallaghan, rven, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38637
      7a357751
  29. 01 May 2015 (2 commits)
    • Fix clang build · dddceefe
      Committed by Igor Canadi
      Summary: fix build
      
      Test Plan: works
      
      Reviewers: kradhakrishnan
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D37911
      dddceefe
    • Optimize GetApproximateSizes() to use fewer CPU cycles · d4540654
      Committed by krad
      Summary:
      CPU profiling reveals GetApproximateSizes() as a performance bottleneck. The current implementation is sub-optimal: it scans every file in every level to compute the result.
      
      We can take advantage of the fact that all levels above 0 are sorted in increasing order of key ranges, and use binary search to locate the starting index (a sketch follows this entry). This reduces the number of comparisons required to compute the result.
      
      Test Plan: We have good test coverage. Run the tests.
      
      Reviewers: sdong, igor, rven, dynamike
      
      Subscribers: dynamike, maykov, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D37755
      d4540654
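      A sketch of the idea using std::lower_bound over a level's files sorted by
      largest key (the file metadata and string comparison are simplified
      stand-ins for RocksDB's FileMetaData and internal key comparator):
      ```
      #include <algorithm>
      #include <string>
      #include <vector>

      struct FileMeta {
        std::string smallest;
        std::string largest;
      };

      // Levels above 0 hold non-overlapping files sorted by key range, so we
      // can binary-search for the first file whose largest key is >= `start`
      // instead of scanning every file in the level.
      size_t FindStartingFile(const std::vector<FileMeta>& level_files,
                              const std::string& start) {
        auto it = std::lower_bound(
            level_files.begin(), level_files.end(), start,
            [](const FileMeta& f, const std::string& key) {
              return f.largest < key;
            });
        return static_cast<size_t>(it - level_files.begin());
      }
      ```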
  30. 30 Apr 2015 (1 commit)
  31. 25 Apr 2015 (1 commit)
  32. 24 Apr 2015 (2 commits)
  33. 18 Apr 2015 (1 commit)
    • Add experimental API MarkForCompaction() · 6059bdf8
      Committed by Igor Canadi
      Summary:
      Some Mongo+Rocks datasets in Parse's environment are not doing compactions very frequently. During the quiet period (with no IO), we'd like to schedule compactions so that our reads become faster. Also, aggressively compacting during quiet periods helps when write bursts happen. In addition, we also want to compact files that contain deleted key ranges (like old oplog keys).
      
      All of this is currently not possible with CompactRange() because it's single-threaded and blocks all other compactions from happening. Running CompactRange() risks blocking writes because we generate too many Level 0 files before the compaction is over. Stopping writes is very dangerous because they hold transaction locks. We tried running manual compaction once on Mongo+Rocks and everything fell apart.
      
      MarkForCompaction() solves all of those problems. This is very light-weight manual compaction. It is lower priority than automatic compactions, which means it shouldn't interfere with background process keeping the LSM tree clean. However, if no automatic compactions need to be run (or we have extra background threads available), we will start compacting files that are marked for compaction.
      
      Test Plan: added a new unit test
      
      Reviewers: yhchiang, rven, MarkCallaghan, sdong
      
      Reviewed By: sdong
      
      Subscribers: yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D37083
      6059bdf8
  34. 04 Apr 2015 (1 commit)
    • avoid returning a number-of-active-keys estimate of nearly 2^64 · d2a92c13
      Committed by Jim Meyering
      Summary:
      If accumulated_num_non_deletions_ were ever smaller than
      accumulated_num_deletions_, the computation of
      "accumulated_num_non_deletions_ - accumulated_num_deletions_"
      would result in a logically "negative" value, but since the two
      operands are unsigned (uint64_t), the result corresponding to e.g.
      -1 would be 2^64-1.
      
      Instead, return 0 in that case (sketched after this entry).
      
      Test Plan:
        - ensure "make check" still passes
        - temporarily add an "abort();" call in the new "if"-block, and
            observe that it fails in some test cases.  However, note that
            this case is triggered only when the two numbers are equal.
            Thus, no test case triggers the erroneous behavior this
            change is designed to avoid. If anyone can construct a
            scenario in which that bug would be triggered, I'll be
            happy to add a test case.
      
      Reviewers: ljin, igor, rven, igor.sugak, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D36489
      d2a92c13
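      The guard is a one-liner; a sketch with illustrative names:
      ```
      #include <cstdint>

      // Estimate the number of active keys without risking unsigned
      // wraparound when deletions happen to exceed non-deletions.
      uint64_t EstimateActiveKeys(uint64_t accumulated_num_non_deletions,
                                  uint64_t accumulated_num_deletions) {
        if (accumulated_num_non_deletions < accumulated_num_deletions) {
          return 0;  // would otherwise wrap to nearly 2^64
        }
        return accumulated_num_non_deletions - accumulated_num_deletions;
      }
      ```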