1. 28 Jan, 2014 1 commit
  2. 25 Jan, 2014 1 commit
    • Make VersionSet::ReduceNumberOfLevels() static · 677fee27
      Igor Canadi committed
      Summary:
      A lot of our code implicitly assumes number_levels to be static. ReduceNumberOfLevels() breaks that assumption. For example, after calling ReduceNumberOfLevels(), DBImpl::NumberLevels() will be different from VersionSet::NumberLevels(). This is dangerous. Thankfully, it's not in public headers and is only used from the LDB cmd tool. The LDB tool only uses it statically, i.e. it never calls it with a running DB instance. With this diff, we make it explicitly static. This way, we can assume number_levels to be immutable and not break an assumption that a lot of our code relies upon. The LDB tool can still use the method.
      
      Also, I removed the method from a separate file since it breaks filename completion. version_se<TAB> now completes to "version_set." instead of "version_set" (without the dot). I don't see a big reason that the function should be in a different file.
      
      Test Plan: reduce_levels_test
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15303
  3. 24 Jan, 2014 1 commit
    • ColumnFamilySet · 7c5e583a
      Igor Canadi committed
      Summary:
      I created a separate class ColumnFamilySet to keep track of column families. Previously we did this in VersionSet; I believe this approach is cleaner.
      
      Let me know if you have any comments. I will commit tomorrow.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15357
  4. 17 Jan, 2014 2 commits
    • Remove compaction pointers · 6d6fb709
      Igor Canadi committed
      Summary: The only thing we do with compaction pointers is set them to some values, we never actually read them. I don't know what we used them for, but it doesn't look like we use them anymore.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15225
    • CompactionPicker · c699c84a
      Igor Canadi committed
      Summary:
      This is a big one. This diff moves all the code related to picking compactions from VersionSet to new class CompactionPicker. Column families' compactions will be completely separate processes, so we need to have multiple CompactionPickers.
      
      To make this easier to review, most of the code change is just copy/paste. There is also a small change not to use VersionSet::current_, but rather to take `Version* version` as a parameter. Most of the other code is exactly the same.
      
      In future diffs, I will also make some improvements to CompactionPickers. I think the most important part will be encapsulating it better. Currently Version, VersionSet, Compaction and CompactionPicker are all friend classes, which makes it harder to change the implementation.
      
      This diff depends on D15171, D15183, D15189 and D15201
      
      Test Plan: `make check`
      
      Reviewers: kailiu, sdong, dhruba, haobo
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15207
  5. 16 Jan, 2014 4 commits
    • Move more functions from VersionSet to Version · 787f11bb
      Igor Canadi committed
      Summary:
      This moves functions:
      * VersionSet::Finalize() -> Version::UpdateCompactionStats()
      * VersionSet::UpdateFilesBySize() -> Version::UpdateFilesBySize()
      
      The diff depends on D15189, D15183 and D15171
      
      Test Plan: make check
      
      Reviewers: kailiu, sdong, haobo, dhruba
      
      Reviewed By: sdong
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15201
    • Moving Compaction class to separate header file · 615d1ea2
      Igor Canadi committed
      Summary:
      I'm sure we'll all agree that version_set.cc needs simplifying. This diff moves Compaction class to a separate file.
      
      The diff depends on D15171 and D15183
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15189
    • Move functions from VersionSet to Version · 2f4eda78
      Igor Canadi committed
      Summary:
      There were some functions in VersionSet that had no reason to be there instead of Version. Moving them to Version will make column families implementation easier.
      
      The functions moved are:
      * NumLevelBytes
      * LevelSummary
      * LevelFileSummary
      * MaxNextLevelOverlappingBytes
      * AddLiveFiles (previously AddLiveFilesCurrentVersion())
      * NeedSlowdownForNumLevel0Files
      
      The diff continues on (and depends on) D15171
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong, emayanke
      
      Reviewed By: sdong
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15183
    • Decrease reliance on VersionSet::NumberLevels() · 65a8a52b
      Igor Canadi committed
      Summary:
      With column families, VersionSet will not have a constant number of levels (each CF can have different options), so we'll need to eliminate calls to VersionSet::NumberLevels().

      This diff decreases the number of callsites, but we're not there yet. It associates the number of levels with Version (each version is associated with a single CF) instead of VersionSet.

      I have also slightly changed how VersionSet keeps track of manifest size.

      This diff also modifies the constructor of Compaction such that it takes input_version and automatically Ref()s it. Previously this was done outside of the constructor.

      In the next diffs I will continue to decrease the number of callsites of VersionSet::NumberLevels() and also references to current_.
      
      Test Plan: make check
      
      Reviewers: haobo, dhruba, kailiu, sdong
      
      Reviewed By: sdong
      
      Differential Revision: https://reviews.facebook.net/D15171
  6. 15 Jan, 2014 2 commits
    • Fix CompactRange to apply filter to every key · d9cd7a06
      Igor Canadi committed
      Summary:
      When doing CompactRange(), we should first flush the memtable and then calculate max_level_with_files. Also, we want to compact all the levels that have files, including level `max_level_with_files`.
      
      This patch fixed the unit test.
      
      Test Plan: Added a failing unit test and a fix, so it's not failing anymore.
      
      Reviewers: dhruba, haobo, sdong
      
      Reviewed By: haobo
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D14421
    • Pre-calculate whether to slow down for too many level 0 files · fbbf0d14
      Siying Dong committed
      Summary: Currently in DBImpl::MakeRoomForWrite(), we do  "versions_->NumLevelFiles(0) >= options_.level0_slowdown_writes_trigger" to check whether the writer thread needs to slow down. However, versions_->NumLevelFiles(0) is slightly more expensive than we expected. By caching the result of the comparison when installing a new version, we can avoid this function call every time.
      
      Test Plan:
      make all check
      Manually trigger this behavior by applying universal compaction style and make sure inserts are slowed down once there is a certain number of files.
      
      Reviewers: haobo, kailiu, igor
      
      Reviewed By: kailiu
      
      CC: nkg-, leveldb
      
      Differential Revision: https://reviews.facebook.net/D15141
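The caching idea in this commit can be sketched as follows. This is a minimal illustration, not the RocksDB code: `VersionSketch`, `InstallVersion` and `ShouldSlowDownWrites` are hypothetical names, and only `level0_slowdown_writes_trigger` comes from the commit message. The comparison is evaluated once when a version is installed, and the write path reads the stored flag.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a version: per-level file counts plus a flag
// that is computed once, when the version is installed.
struct VersionSketch {
  std::vector<int> files_per_level;
  bool need_slowdown_for_level0;
};

// Hypothetical install step: evaluate the comparison a single time and
// store the result alongside the version.
inline VersionSketch InstallVersion(std::vector<int> files_per_level,
                                    int level0_slowdown_writes_trigger) {
  VersionSketch v;
  v.files_per_level = std::move(files_per_level);
  const int level0_files =
      v.files_per_level.empty() ? 0 : v.files_per_level[0];
  v.need_slowdown_for_level0 =
      level0_files >= level0_slowdown_writes_trigger;
  return v;
}

// Write path: read the cached flag instead of recounting level-0 files.
inline bool ShouldSlowDownWrites(const VersionSketch& current) {
  return current.need_slowdown_for_level0;
}
```

The trade-off is that the flag is only as fresh as the last installed version, which is acceptable here because the level-0 file count changes only when a new version is installed.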
  7. 14 Jan, 2014 2 commits
  8. 08 Jan, 2014 1 commit
    • [column families] Implement DB::OpenWithColumnFamilies() · 72918eff
      Igor Canadi committed
      Summary:
      In addition to implementing OpenWithColumnFamilies, this diff also includes some minor changes:
      * Changed all column family names from Slice() to std::string. The performance of column family name handling is not critical, and it's more convenient and cleaner to have names as std::strings
      * Implemented ColumnFamilyOptions(const Options&) and DBOptions(const Options&)
      * Added ColumnFamilyOptions to VersionSet::ColumnFamilyData. ColumnFamilyOptions are specified on OpenWithColumnFamilies() and CreateColumnFamily()
      
      I will keep the diff in the Phabricator for a day or two and will push to the branch then. Feel free to comment even after the diff has been pushed.
      
      Test Plan: Added a simple unit test
      
      Reviewers: dhruba, haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15033
  9. 03 Jan, 2014 1 commit
  10. 02 Jan, 2014 1 commit
    • [RocksDB] Support for column families in manifest · 75354430
      Igor Canadi committed
      Summary:
      <This diff is for Column Family branch>
      
      Added fields in manifest file to support adding and deleting column families.
      
      Pretty simple change, each version edit record can be:
      1. add column family
      2. drop column family
      3. add and delete N files from a single column family (compactions and flushes will generate such records)
      
      Test Plan: make check works, the code is backward compatible
      
      Reviewers: dhruba, haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14733
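The three record kinds listed in this commit can be pictured with a small replay loop. This is a sketch under assumptions: `EditKind`, `EditRecordSketch`, `LiveFiles` and `Replay` are hypothetical names chosen for illustration, and the real manifest encoding is not shown in the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical tags for the three record kinds named in the summary.
enum class EditKind { kAddColumnFamily, kDropColumnFamily, kFileChanges };

// Hypothetical in-memory form of one version edit record.
struct EditRecordSketch {
  EditKind kind;
  uint32_t column_family_id;
  std::string column_family_name;       // meaningful for kAddColumnFamily
  std::vector<uint64_t> added_files;    // meaningful for kFileChanges
  std::vector<uint64_t> deleted_files;  // meaningful for kFileChanges
};

// Live file numbers per column family after replaying the records.
using LiveFiles = std::map<uint32_t, std::set<uint64_t>>;

inline LiveFiles Replay(const std::vector<EditRecordSketch>& records) {
  LiveFiles live;
  for (const EditRecordSketch& r : records) {
    switch (r.kind) {
      case EditKind::kAddColumnFamily:
        live.emplace(r.column_family_id, std::set<uint64_t>());
        break;
      case EditKind::kDropColumnFamily:
        live.erase(r.column_family_id);
        break;
      case EditKind::kFileChanges: {
        auto it = live.find(r.column_family_id);
        if (it == live.end()) break;  // unknown column family: ignore
        for (uint64_t f : r.deleted_files) it->second.erase(f);
        for (uint64_t f : r.added_files) it->second.insert(f);
        break;
      }
    }
  }
  return live;
}
```

Each file-change record touches exactly one column family, which matches the third record kind in the summary.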
  11. 21 Dec, 2013 1 commit
    • [RocksDB] Optimize locking for Get · 1fdb3f7d
      Igor Canadi committed
      Summary:
      Instead of locking and saving a DB state, we can cache a DB state and update it only when it changes. This change reduces lock contention and speeds up read operations on the DB.
      
      Performance improvements are substantial, although there is some cost in no-read workloads. I ran the regression tests on my devserver and here are the numbers:
      
        overwrite                    56345  ->   63001
        fillseq                      193730 ->  185296
        readrandom                   771301 -> 1219803 (58% improvement!)
        readrandom_smallblockcache   677609 ->  862850
        readrandom_memtable_sst      710440 -> 1109223
        readrandom_fillunique_random 221589 ->  247869
        memtablefillrandom           105286 ->   92643
        memtablereadrandom           763033 -> 1288862
      
      Test Plan:
      make asan_check
      I am also running db_stress
      
      Reviewers: dhruba, haobo, sdong, kailiu
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14679
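One way to picture "cache a DB state and update it only when it changes" is an immutable snapshot that readers pick up with a single pointer copy. The sketch below is an assumption about the general shape, not the code from this diff: `DbStateSketch`, `CachedStateDb`, `Install` and `Current` are hypothetical names, and a `std::map` stands in for the memtable and version data.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical immutable bundle of everything a read needs.
struct DbStateSketch {
  std::map<std::string, std::string> data;
};

class CachedStateDb {
 public:
  CachedStateDb() : state_(std::make_shared<DbStateSketch>()) {}

  // Called only when the state actually changes. The new snapshot is built
  // before taking the mutex, so the critical section is a pointer swap.
  void Install(std::map<std::string, std::string> data) {
    std::shared_ptr<const DbStateSketch> next =
        std::make_shared<DbStateSketch>(DbStateSketch{std::move(data)});
    std::lock_guard<std::mutex> guard(mutex_);
    state_ = std::move(next);
  }

  // Readers hold the mutex only long enough to copy one shared_ptr.
  std::shared_ptr<const DbStateSketch> Current() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return state_;
  }

  // The lookup itself runs on the snapshot, outside the mutex.
  bool Get(const std::string& key, std::string* value) const {
    std::shared_ptr<const DbStateSketch> snapshot = Current();
    auto it = snapshot->data.find(key);
    if (it == snapshot->data.end()) return false;
    *value = it->second;
    return true;
  }

 private:
  mutable std::mutex mutex_;
  std::shared_ptr<const DbStateSketch> state_;
};
```

A snapshot obtained before an `Install` call stays valid and unchanged for as long as a reader holds it, which is what lets the read proceed without the lock.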
  12. 12 Dec, 2013 1 commit
  13. 13 Nov, 2013 1 commit
    • Small changes in Deleting obsolete files · 9bc4a26f
      Igor Canadi committed
      Summary:
      @haobo's suggestions from https://reviews.facebook.net/D13827
      
      Renaming some variables, deprecating purge_log_after_flush, changing for loop into auto for loop.
      
      I have not implemented deleting objects outside of mutex yet because it would require a big code change - we would delete object in db_impl, which currently does not know anything about object because it's defined in version_edit.h (FileMetaData). We should do it at some point, though.
      
      Test Plan: Ran deletefile_test
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb, haobo
      
      Differential Revision: https://reviews.facebook.net/D14025
  14. 09 Nov, 2013 1 commit
    • Speed up FindObsoleteFiles · 1510339e
      Igor Canadi committed
      Summary:
      Here's one solution we discussed on speeding up FindObsoleteFiles. Keep a set of all files in DBImpl and update the set every time we create a file. I probably missed a few other spots where we create a file.
      
      It might speed things up a bit, but makes code uglier. I don't really like it.
      
      Much better approach would be to abstract all file handling to a separate class. Think of it as layer between DBImpl and Env. Having a separate class deal with file namings and deletion would benefit both code cleanliness (especially with huge DBImpl) and speed things up. It will take a huge effort to do this, though.
      
      Let's discuss offline today.
      
      Test Plan: Ran ./db_stress, verified that files are getting deleted
      
      Reviewers: dhruba, haobo, kailiu, emayanke
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D13827
  15. 01 Nov, 2013 1 commit
    • [RocksDB] Add OnCompactionStart to CompactionFilter class · 8cbe5bb5
      Haobo Xu committed
      Summary: This is to give application compaction filter a chance to access context information of a specific compaction run. For example, depending on whether a compaction goes through all data files, the application could do things differently.
      
      Test Plan: make check
      
      Reviewers: dhruba, kailiu, sdong
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13683
  16. 18 Oct, 2013 1 commit
    • Universal Compaction to Have a Size Percentage Threshold To Decide Whether to Compress · 9edda370
      Siying Dong committed
      Summary:
      This patch adds an option for universal compaction to allow us to only compress output files if the files compacted previously did not yet reach a specified ratio, to save CPU costs in some cases.
      
      Compression is always skipped for flushing. This is because the size information is not easy to evaluate for flushing case. We can improve it later.
      
      Test Plan:
      add test
      DBTest.UniversalCompactionCompressRatio1 and DBTest.UniversalCompactionCompressRatio12
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13467
  17. 17 Oct, 2013 2 commits
    • Add appropriate LICENSE and Copyright message. · 9cd22109
      Dhruba Borthakur committed
      Summary:
      Add appropriate LICENSE and Copyright message.
      
      Test Plan:
      make check
      
      Reviewers:
      
      CC:
      
      Task ID: #
      
      Blame Rev:
    • Enable background flush thread by default and fix issues related to it · 073cbfc8
      Siying Dong committed
      Summary:
      Enable background flush thread in this patch and fix unit tests with:
      (1) After background flush, schedule a background compaction if condition satisfied;
      (2) Fix a bug where, if universal compaction is enabled and the number of levels is set to 0, compaction will not be automatically triggered;
      (3) Fix unit tests to wait for compaction to finish instead of flush, before checking the compaction results.
      
      Test Plan: pass all unit tests
      
      Reviewers: haobo, xjin, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13461
  18. 15 Oct, 2013 1 commit
    • Add statistics to sst file · 86ef6c3f
      Kai Liu committed
      Summary:
      So far we only have key/value pairs as well as bloom filter stored in the
      sst file.  It will be great if we are able to store more metadata about
      this table itself, for example, the entry size, bloom filter name, etc.
      
      This diff is the first step of this effort. It allows table to keep the
      basic statistics mentioned in http://fburl.com/14995441, as well as
      allowing writing user-collected stats to stats block.
      
      After this diff, we will figure out an interface that lets users collect the statistics they are interested in.
      
      Test Plan:
      1. Added several unit tests.
      2. Ran `make check` to ensure it doesn't break other tests.
      
      Reviewers: dhruba, haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13419
  19. 06 Oct, 2013 1 commit
  20. 05 Oct, 2013 1 commit
  21. 14 Sep, 2013 1 commit
    • Added a parameter to limit the maximum space amplification for universal compaction. · 4012ca1c
      Dhruba Borthakur committed
      Summary:
      Added a new field called max_size_amplification_ratio in the
      CompactionOptionsUniversal structure. This determines the maximum
      percentage overhead of space amplification.
      
      The size amplification is defined to be the ratio of the sum of the
      sizes of all other files to the size of the oldest file. If the
      size amplification exceeds the specified value, then min_merge_width
      and max_merge_width are ignored and a full compaction of all files is done.
      A value of 10 means that a database that stores 100 bytes
      of user data could occupy 110 bytes of physical storage.
      
      Test Plan: Unit test DBTest.UniversalCompactionSpaceAmplification added.
      
      Reviewers: haobo, emayanke, xjin
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12825
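The threshold check can be worked through with a short function. This sketch reads the "10 means 100 bytes may occupy 110 bytes" example as: the combined size of all newer files, as a percentage of the oldest file, is compared against `max_size_amplification_ratio`. The function name `NeedsFullCompaction`, the newest-first ordering, and the choice of `>=` at the boundary are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when the newer files together reach the allowed percentage
// of the oldest file. Sizes are ordered newest first, so the last entry is
// the oldest file (an assumption of this sketch).
inline bool NeedsFullCompaction(
    const std::vector<uint64_t>& sizes_newest_first,
    uint64_t max_size_amplification_ratio) {
  if (sizes_newest_first.size() < 2) return false;  // nothing to compare
  const uint64_t oldest = sizes_newest_first.back();
  uint64_t others = 0;
  for (std::size_t i = 0; i + 1 < sizes_newest_first.size(); ++i) {
    others += sizes_newest_first[i];
  }
  // Integer form of: others / oldest >= ratio / 100.
  return others * 100 >= max_size_amplification_ratio * oldest;
}
```

With an oldest file of 100 bytes and a ratio of 10, newer files totalling 9 bytes stay under the limit, while 10 bytes or more would trigger a full compaction in this sketch.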
  22. 29 Aug, 2013 1 commit
    • Introduced a new flag non_blocking_io in ReadOptions. · fc0c399d
      Dhruba Borthakur committed
      Summary:
      If ReadOptions.non_blocking_io is set to true, then KeyMayExist
      and Iterators will return only data that is cached in RAM.
      If the Iterator needs to do IO from storage to serve the data,
      then Iterator.status() will return Status::IsRetry().
      
      Test Plan:
      Enhanced unit test DBTest.KeyMayExist to detect whether any IOs were
      issued from storage. Added DBTest.NonBlockingIteration to verify
      nonblocking Iterations.
      
      Reviewers: emayanke, haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Maniphest Tasks: T63
      
      Differential Revision: https://reviews.facebook.net/D12531
  23. 24 Aug, 2013 1 commit
  24. 23 Aug, 2013 3 commits
    • Revert "Prefix scan: db_bench and bug fixes" · 94cf2187
      Tyler Harter committed
      This reverts commit c2bd8f48.
    • Prefix scan: db_bench and bug fixes · c2bd8f48
      Tyler Harter committed
      Summary: If use_prefix_filters is set and read_range>1, then the random seeks will set the prefix filter to be the prefix of the key that was randomly selected as the target. Statistics still need to be added (perhaps in a separate diff).
      
      Test Plan: ./db_bench --benchmarks=fillseq,prefixscanrandom --num=10000000 --statistics=1 --use_prefix_blooms=1 --use_prefix_api=1 --bloom_bits=10
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb, haobo
      
      Differential Revision: https://reviews.facebook.net/D12273
    • Add APIs to query SST file metadata and to delete specific SST files · 60bf2b7d
      Simha Venkataramaiah committed
      Summary: An api to query the level, key ranges, size etc for each SST file and an api to delete a specific file from the db and all associated state in the bookkeeping datastructures.
      
      Notes: Editing the manifest version does not release the obsolete files right away. However deleting the file directly will mess up the iterator. We may need a more aggressive/timely file deletion api.
      
      I have used std::unique_ptr - will switch to boost:: since this is external. thoughts?
      
      Unit test is fragile right now as it expects the compaction at certain levels.
      
      Test Plan: unittest
      
      Reviewers: dhruba, vamsi, emayanke
      
      CC: zshao, leveldb, haobo
      
      Task ID: #
      
      Blame Rev:
  25. 10 Aug, 2013 1 commit
  26. 06 Aug, 2013 1 commit
    • [RocksDB] [MergeOperator] The new Merge Interface! Uses merge sequences. · c2d7826c
      Deon Nicholas committed
      Summary:
      Here are the major changes to the Merge Interface. It has been expanded
      to handle cases where the MergeOperator is not associative. It does so by stacking
      up merge operations while scanning through the key history (i.e.: during Get() or
      Compaction), until a valid Put/Delete/end-of-history is encountered; it then
      applies all of the merge operations in the correct sequence starting with the
      base/sentinel value.
      
      I have also introduced an "AssociativeMerge" function which allows the user to
      take advantage of associative merge operations (such as in the case of counters).
      The implementation will always attempt to merge the operations/operands themselves
      together when they are encountered, and will resort to the "stacking" method if
      and only if the "associative-merge" fails.
      
      This implementation is conjectured to allow MergeOperator to handle the general
      case, while still providing the user with the ability to take advantage of certain
      efficiencies in their own merge-operator / data-structure.
      
      NOTE: This is a preliminary diff. This must still go through a lot of review,
      revision, and testing. Feedback welcome!
      
      Test Plan:
        -This is a preliminary diff. I have only just begun testing/debugging it.
        -I will be testing this with the existing MergeOperator use-cases and unit-tests
      (counters, string-append, and redis-lists)
        -I will be "desk-checking" and walking through the code with the help of gdb.
        -I will find a way of stress-testing the new interface / implementation using
      db_bench, db_test, merge_test, and/or db_stress.
        -I will ensure that my tests cover all cases: Get-Memtable,
      Get-Immutable-Memtable, Get-from-Disk, Iterator-Range-Scan, Flush-Memtable-to-L0,
      Compaction-L0-L1, Compaction-Ln-L(n+1), Put/Delete found, Put/Delete not-found,
      end-of-history, end-of-file, etc.
        -A lot of feedback from the reviewers.
      
      Reviewers: haobo, dhruba, zshao, emayanke
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11499
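The "stacking" behaviour described in this commit can be sketched for a single key. This is an illustration under assumptions, not the MergeOperator interface itself: `Entry`, `FullMergeFn`, `GetWithMerge` and `AppendMerge` are hypothetical names, the key history is given newest first, and a comma-separated string append stands in for the user's operator.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

enum class EntryType { kPut, kDelete, kMerge };

struct Entry {
  EntryType type;
  std::string value;  // put value or merge operand; unused for kDelete
};

// Hypothetical full-merge callback: base is nullptr when there is no
// existing value; operands are passed oldest first.
using FullMergeFn = std::function<std::string(
    const std::string* base, const std::vector<std::string>& operands)>;

// Scans one key's history from newest to oldest, stacking merge operands
// until a Put, a Delete, or the end of history, then applies them in order.
// Returns false when the key has no value.
inline bool GetWithMerge(const std::vector<Entry>& history_newest_first,
                         const FullMergeFn& merge, std::string* result) {
  std::vector<std::string> stacked;  // collected newest first
  const std::string* base = nullptr;
  for (const Entry& e : history_newest_first) {
    if (e.type == EntryType::kMerge) {
      stacked.push_back(e.value);
      continue;
    }
    if (e.type == EntryType::kPut) base = &e.value;
    break;  // a Put or a Delete ends the scan
  }
  if (stacked.empty()) {
    if (base == nullptr) return false;  // deleted or never written
    *result = *base;
    return true;
  }
  std::reverse(stacked.begin(), stacked.end());  // now oldest first
  *result = merge(base, stacked);
  return true;
}

// Example operator: comma-separated string append.
inline std::string AppendMerge(const std::string* base,
                               const std::vector<std::string>& operands) {
  std::string out = base ? *base : std::string();
  for (const std::string& op : operands) {
    if (!out.empty()) out += ",";
    out += op;
  }
  return out;
}
```

Entries older than the first Put or Delete are never consulted, which is why the scan can stop there.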
  27. 02 Aug, 2013 1 commit
    • Expand KeyMayExist to return the proper value if it can be found in memory and... · 59d0b02f
      Mayank Agarwal committed
      Expand KeyMayExist to return the proper value if it can be found in memory and also check block_cache
      
      Summary: Removed KeyMayExistImpl because KeyMayExist now demands Get-like semantics. Removed no_io from memtable and imm because we need the proper value now and shouldn't just stop when we see a Merge in the memtable. Added checks to block_cache. Updated documentation and unit-test.
      
      Test Plan: make all check;db_stress for 1 hour
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11853
  28. 24 Jul, 2013 1 commit
    • Use KeyMayExist for WriteBatch-Deletes · bf66c10b
      Mayank Agarwal committed
      Summary:
      Introduced KeyMayExist checking during writebatch-delete and removed from Outer Delete API because it uses writebatch-delete.
      Added code to skip getting Table from disk if not already present in table_cache.
      Some renaming of variables.
      Introduced KeyMayExistImpl, which allows checking from a specified sequence number in GetImpl; this is useful for checking a partially written writebatch.
      Changed KeyMayExist to not be pure virtual and provided a default implementation.
      Expanded unit-tests in db_test to check appropriately.
      Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.
      
      Test Plan: db_stress;make check
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D11745
  29. 20 Jul, 2013 1 commit
  30. 18 Jul, 2013 1 commit
  31. 12 Jul, 2013 1 commit
    • Make rocksdb-deletes faster using bloom filter · 2a986919
      Mayank Agarwal committed
      Summary:
      Wrote a new function in db_impl.c, CheckKeyMayExist, that calls Get with a new parameter turned on, which makes Get return false only if bloom filters can guarantee that the key is not in the database. Delete calls this function, and if the option deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete will be dropped, saving:
      1. Put of delete type
      2. Space in the db, and
      3. Compaction time
      
      Test Plan:
      make all check;
      will run db_stress and db_bench and enhance unit-test once the basic design gets approved
      
      Reviewers: dhruba, haobo, vamsi
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11607
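The delete-dropping rule can be sketched with a toy bloom filter. Everything here is an assumption made for illustration: `TinyBloom` and `DbSketch` are hypothetical classes, the filter uses `std::hash` with three derived slots, and only the option name `deletes_use_filter` is taken from the commit message. A delete is dropped only when the filter can guarantee the key was never written.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy bloom filter: it can answer "definitely absent" but never
// "definitely present".
class TinyBloom {
 public:
  void Add(const std::string& key) {
    for (std::size_t seed = 0; seed < kProbes; ++seed) {
      bits_.set(Slot(key, seed));
    }
  }
  bool MayContain(const std::string& key) const {
    for (std::size_t seed = 0; seed < kProbes; ++seed) {
      if (!bits_.test(Slot(key, seed))) return false;
    }
    return true;
  }

 private:
  static const std::size_t kBits = 1024;
  static const std::size_t kProbes = 3;
  static std::size_t Slot(const std::string& key, std::size_t seed) {
    // Derive a different slot per probe by salting the key.
    return std::hash<std::string>()(key + static_cast<char>('A' + seed)) %
           kBits;
  }
  std::bitset<kBits> bits_;
};

class DbSketch {
 public:
  void Put(const std::string& key, const std::string& value) {
    filter_.Add(key);
    data_[key] = value;
  }
  // Returns true if a tombstone was written, false if the delete was
  // dropped because the filter guarantees the key is absent.
  bool Delete(const std::string& key, bool deletes_use_filter) {
    if (deletes_use_filter && !filter_.MayContain(key)) return false;
    tombstones_.push_back(key);
    data_.erase(key);
    return true;
  }
  std::size_t tombstone_count() const { return tombstones_.size(); }

 private:
  TinyBloom filter_;
  std::map<std::string, std::string> data_;
  std::vector<std::string> tombstones_;
};
```

Because a bloom filter can report false positives, some deletes of absent keys still write a tombstone; the optimization never drops a delete for a key that was written.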