1. 14 Oct 2015 · 1 commit
    • Separate InternalIterator from Iterator · 35ad531b
      Committed by sdong
      Summary:
      Split a new class, InternalIterator, out of class Iterator for look-ups that are done internally; such iterators operate on internal keys, i.e. a user key plus sequence ID and type.
      
      This change will enable potential future optimizations, but for now InternalIterator's functions are still the same as Iterator's.
      At the same time, move the cleanup machinery into a separate class and let both InternalIterator and Iterator inherit from it; a sketch follows.
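
      As a rough sketch of the resulting layout (the Cleanable name and all member details here are assumptions for illustration, not the exact RocksDB declarations):

          #include <vector>

          class Cleanable {
           public:
            using CleanupFunction = void (*)(void* arg1, void* arg2);
            void RegisterCleanup(CleanupFunction fn, void* arg1, void* arg2) {
              cleanups_.push_back(Cleanup{fn, arg1, arg2});
            }
            virtual ~Cleanable() {
              // Run every registered cleanup when the iterator is destroyed.
              for (const Cleanup& c : cleanups_) c.fn(c.arg1, c.arg2);
            }

           private:
            struct Cleanup {
              CleanupFunction fn;
              void* arg1;
              void* arg2;
            };
            std::vector<Cleanup> cleanups_;
          };

          // Public iterator: positioned on user keys.
          class Iterator : public Cleanable {
           public:
            virtual bool Valid() const = 0;
            virtual void Next() = 0;
          };

          // Internal iterator: positioned on internal keys (user key plus
          // sequence ID and type); same surface as Iterator for now.
          class InternalIterator : public Cleanable {
           public:
            virtual bool Valid() const = 0;
            virtual void Next() = 0;
          };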
      
      Test Plan: Run all existing tests.
      
      Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48549
      35ad531b
2. 10 Oct 2015 · 1 commit
    • Pass column family ID to table property collector · 776bd8d5
      Committed by sdong
      Summary: Pass the column family ID through TablePropertiesCollectorFactory::CreateTablePropertiesCollector() so that users can identify which column family a file is for and handle it differently.
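
      A hedged sketch of how a user-side factory might branch on the ID (the Context shape, the collector classes, and kSpecialCfId are assumptions for illustration):

          #include <cstdint>

          class TablePropertiesCollector {
           public:
            virtual ~TablePropertiesCollector() = default;
          };

          class PerCfCollector : public TablePropertiesCollector {};
          class DefaultCollector : public TablePropertiesCollector {};

          constexpr uint32_t kSpecialCfId = 7;  // hypothetical column family ID

          struct Context {
            uint32_t column_family_id;  // which column family the new file belongs to
          };

          TablePropertiesCollector* CreateTablePropertiesCollector(Context context) {
            if (context.column_family_id == kSpecialCfId) {
              return new PerCfCollector();  // handle this column family differently
            }
            return new DefaultCollector();
          }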
      
      Test Plan: Add unit test scenarios in tests related to table properties collectors to verify the information passed in is correct.
      
      Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: yoshinorim, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48411
      776bd8d5
3. 08 Oct 2015 · 1 commit
    • Compaction filter on merge operands · d80ce7f9
      Committed by Igor Canadi
      Summary:
      Since Andres' internship is over, I took over https://reviews.facebook.net/D42555 and rebased and simplified it a bit.
      
      The behavior in this diff is a bit simpler than in D42555:
      * only merge operands are passed through FilterMergeValue(); if the filter function returns true, the merge operand is ignored
      * the compaction filter is *not* called on: 1) results of merge operations, and 2) base values that are getting merged with merge operands (the second case was also true in the previous diff)
      
      Do we also need a compaction filter to get called on merge results?
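
      For illustration, a minimal sketch of a filter overriding the new hook (the base class here is a stand-in and std::string_view replaces Slice; only the FilterMergeValue name comes from the description above):

          #include <string_view>

          class CompactionFilterSketch {
           public:
            virtual ~CompactionFilterSketch() = default;
            // Called once per merge operand; returning true drops the operand
            // so the merge operator never sees it.
            virtual bool FilterMergeValue(int /*level*/, std::string_view /*key*/,
                                          std::string_view operand) const {
              return operand.empty();  // example policy: discard empty operands
            }
          };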
      
      Test Plan: make && make check
      
      Reviewers: lovro, tnovak, rven, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: noetzli, kolmike, leveldb, dhruba, sdong
      
      Differential Revision: https://reviews.facebook.net/D47847
      d80ce7f9
4. 26 Sep 2015 · 1 commit
5. 18 Sep 2015 · 1 commit
    • Support for SingleDelete() · 014fd55a
      Committed by Andres Noetzli
      Summary:
      This patch fixes #7460559. It introduces SingleDelete as a new database
      operation. This operation can be used to delete keys that were never
      overwritten (no put following another put of the same key). If an overwritten
      key is single-deleted, the behavior is undefined. Single deletion of a
      non-existent key has no effect, but multiple consecutive single deletions are
      not allowed (see limitations).
      
      In contrast to the conventional Delete() operation, the deletion entry is
      removed along with the value when the two are lined up in a compaction. Note:
      The semantics are similar to @igor's prototype that allowed to have this
      behavior on the granularity of a column family (
      https://reviews.facebook.net/D42093 ). This new patch, however, is more
      aggressive when it comes to removing tombstones: It removes the SingleDelete
      together with the value whenever there is no snapshot between them while the
      older patch only did this when the sequence number of the deletion was older
      than the earliest snapshot.
      
      Most of the complex additions are in the Compaction Iterator, all other changes
      should be relatively straightforward. The patch also includes basic support for
      single deletions in db_stress and db_bench.
      
      Limitations:
      - Not compatible with cuckoo hash tables
      - Single deletions cannot be used in combination with merges and normal
        deletions on the same key (other keys are not affected by this)
      - Consecutive single deletions are currently not allowed (an older version of
        this patch supported them, so this could be resurrected if needed)
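
      To make the contract concrete, a minimal usage sketch (the demo path is arbitrary; SingleDelete is the new API described above):

          #include <cassert>
          #include "rocksdb/db.h"

          int main() {
            rocksdb::DB* db;
            rocksdb::Options options;
            options.create_if_missing = true;
            rocksdb::Status s =
                rocksdb::DB::Open(options, "/tmp/single_delete_demo", &db);
            assert(s.ok());

            // Safe: the key was written exactly once before the SingleDelete,
            // so tombstone and value can cancel out during compaction.
            db->Put(rocksdb::WriteOptions(), "key1", "value1");
            db->SingleDelete(rocksdb::WriteOptions(), "key1");

            // Undefined behavior: single-deleting a key that was overwritten
            // (two Puts of the same key) must be avoided.
            delete db;
            return 0;
          }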
      
      Test Plan: make all check
      
      Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor
      
      Reviewed By: igor
      
      Subscribers: maykov, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43179
      014fd55a
6. 11 Sep 2015 · 1 commit
    • Refactored common code of Builder/CompactionJob out into a CompactionIterator · 8aa1f151
      Committed by Andres Noetzli
      Summary:
      Builder and CompactionJob share a lot of fairly complex code. This patch
      refactors this code into a separate class, the CompactionIterator. Because the
      shared code is fairly complex, this patch hopefully improves maintainability.
      While there is a lot of potential for further improvements, the patch
      intentionally stays pretty close to the original structure because the change
      is already complex enough.
      
      Test Plan: make clean all check && ./db_stress
      
      Reviewers: rven, anthony, yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D46197
      8aa1f151
7. 09 Sep 2015 · 1 commit
    • Added Equal method to Comparator interface · 6bdc484f
      Committed by Andres Noetzli
      Summary:
      In some cases, equality comparisons can be done more efficiently than three-way
      comparisons. There are quite a few places in the code where we only care about
      equality. This patch adds an Equal() method that defaults to using the
      Compare() method.
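
      The default can be sketched like this (std::string_view stands in for Slice, and the class is a stand-in rather than the real Comparator declaration):

          #include <string_view>

          class ComparatorSketch {
           public:
            virtual ~ComparatorSketch() = default;
            virtual int Compare(std::string_view a, std::string_view b) const = 0;
            // Default falls back to the three-way compare; implementations that
            // can test equality more cheaply (e.g. a length check followed by
            // memcmp) may override this.
            virtual bool Equal(std::string_view a, std::string_view b) const {
              return Compare(a, b) == 0;
            }
          };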
      
      Test Plan: make clean all check
      
      Reviewers: rven, anthony, yhchiang, igor, sdong
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D46233
      6bdc484f
8. 03 Sep 2015 · 1 commit
    • Unified maps with Comparator for sorting, other cleanup · 3c9cef1e
      Committed by Andres Noetzli
      Summary:
      This diff is a collection of cleanups that were initially part of D43179.
      Additionally it adds a unified way of defining key-value maps that use a
      Comparator for sorting (this was previously implemented in four different
      places).
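
      One common way to express such a unified map is a small adapter that turns the three-way comparator into the ordering std::map expects (a sketch under assumed names):

          #include <map>
          #include <string>

          // Stand-in three-way comparator interface.
          struct UserComparator {
            virtual ~UserComparator() = default;
            virtual int Compare(const std::string& a,
                                const std::string& b) const = 0;
          };

          // Adapter providing the strict weak ordering std::map requires.
          struct StlComparatorAdapter {
            const UserComparator* cmp;
            bool operator()(const std::string& a, const std::string& b) const {
              return cmp->Compare(a, b) < 0;
            }
          };

          using SortedKVMap =
              std::map<std::string, std::string, StlComparatorAdapter>;

      A SortedKVMap is then constructed with StlComparatorAdapter{&my_comparator}, so all the call sites can share one definition.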
      
      Test Plan: make clean check all
      
      Reviewers: rven, anthony, yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45993
      3c9cef1e
9. 25 Aug 2015 · 1 commit
    • Smarter purging during flush · 4ab26c5a
      Committed by Igor Canadi
      Summary:
      Currently, we only purge duplicate keys and deletions during flush if `earliest_seqno_in_memtable <= newest_snapshot`. This means that the newest snapshot happened before we first created the memtable. This is almost never true for MyRocks and MongoRocks.
      
      This patch makes purging during flush able to understand snapshots. The main logic is copied from compaction_job.cc, although the logic over there is much more complicated and extensive. However, we should try to merge the common functionality at some point.
      
      I need this patch to implement the no_overwrite_i_promise functionality for flush. We'll also need it to support SingleDelete() during Flush(). @yoshinorim requested the feature.
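
      The core visibility test can be sketched as follows (an illustrative predicate under assumed names, not the code copied from compaction_job.cc):

          #include <algorithm>
          #include <cstdint>
          #include <vector>

          // Returns true if the older entry for the same user key may be
          // purged: no snapshot falls between the two sequence numbers, so no
          // reader can observe the older entry once the newer one exists.
          bool CanPurgeOlderEntry(uint64_t newer_seq, uint64_t older_seq,
                                  const std::vector<uint64_t>& snapshots_sorted) {
            auto it = std::lower_bound(snapshots_sorted.begin(),
                                       snapshots_sorted.end(), older_seq);
            // A snapshot s with older_seq <= s < newer_seq still sees the
            // older entry, so it must be kept.
            return it == snapshots_sorted.end() || *it >= newer_seq;
          }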
      
      Test Plan:
      make check
      I had to adjust some unit tests to reflect this new behavior
      
      Reviewers: yhchiang, yoshinorim, anthony, sdong, noetzli
      
      Reviewed By: noetzli
      
      Subscribers: yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42087
      4ab26c5a
10. 18 Aug 2015 · 1 commit
    • Simplify querying of merge results · f32a5720
      Committed by Andres Noetzli
      Summary:
      While working on supporting mixing merge operators with
      single deletes ( https://reviews.facebook.net/D43179 ),
      I realized that returning and dealing with merge results
      can be made simpler. Submitting this as a separate diff
      because it is not directly related to single deletes.
      
      Before, callers of merge helper had to retrieve the merge
      result in one of two ways depending on whether the merge
      was successful or not (success = result of merge was single
      kTypeValue). For successful merges, the caller could query
      the resulting key/value pair and for unsuccessful merges,
      the result could be retrieved in the form of two deques of
      keys and values. However, with single deletes, a successful merge
      does not return a single key/value pair (if merge
      operands are merged with a single delete, we have to generate
      a value and keep the original single delete around to make
      sure that we are not accidentally producing a key overwrite).
      In addition, the two existing call sites of the merge
      helper were taking the same actions independently from whether
      the merge was successful or not, so this patch simplifies that.
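
      The unified shape can be sketched as parallel deques that callers always read the same way (names assumed; after a fully successful merge the deques hold exactly one pair):

          #include <deque>
          #include <string>

          class MergeResultSketch {
           public:
            // Uniform accessors: one key/value pair after a full merge, and
            // several pairs when the operands could not be collapsed into a
            // single kTypeValue.
            const std::deque<std::string>& keys() const { return keys_; }
            const std::deque<std::string>& values() const { return values_; }

           private:
            std::deque<std::string> keys_;
            std::deque<std::string> values_;  // parallel to keys_
          };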
      
      Test Plan: make clean all check
      
      Reviewers: rven, sdong, yhchiang, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43353
      f32a5720
11. 15 Aug 2015 · 1 commit
    • Measure file read latency histogram per level · 72613657
      Committed by sdong
      Summary: In internal stats, track a file read latency histogram per level when statistics are enabled. It can be retrieved from DB::GetProperty() with the "rocksdb.dbstats" property.
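
      Retrieval is a plain property read, roughly like this (a minimal sketch assuming an already-open rocksdb::DB* db):

          #include <cstdio>
          #include <string>
          #include "rocksdb/db.h"

          void DumpDbStats(rocksdb::DB* db) {
            std::string stats;
            // Includes the per-level read latency histograms when the
            // statistics object is enabled in the options.
            if (db->GetProperty("rocksdb.dbstats", &stats)) {
              std::printf("%s\n", stats.c_str());
            }
          }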
      
      Test Plan: Manually run db_bench, print out "rocksdb.dbstats" by hand, and make sure it prints as expected
      
      Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44193
      72613657
12. 18 Jul 2015 · 1 commit
    • Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env · 6e9fbeb2
      Committed by sdong
      
      Summary: We want to keep Env a thin layer for better portability; code that is not platform-dependent should move out of Env. In this patch, I create wrappers for file readers and writers and move rate limiting, write buffering, and most perf context instrumentation and random kill out of Env. This will make it easier to maintain multiple Envs in the future.
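
      The wrapper shape can be sketched like this (all names are illustrative; the point is that buffering, rate limiting, and instrumentation sit above the raw Env file):

          #include <cstddef>
          #include <string>

          // Stand-in for the raw file handle that Env provides.
          class RawWritableFile {
           public:
            virtual ~RawWritableFile() = default;
            virtual void Append(const std::string& data) = 0;
          };

          class BufferedFileWriter {
           public:
            explicit BufferedFileWriter(RawWritableFile* file) : file_(file) {}

            void Append(const std::string& data) {
              buffer_ += data;                      // write buffering lives here,
              if (buffer_.size() >= kBufferSize) {  // not inside Env
                Flush();
              }
            }

            void Flush() {
              // A rate limiter and perf-context timers would be charged here,
              // just before the raw platform write.
              file_->Append(buffer_);
              buffer_.clear();
            }

           private:
            static constexpr std::size_t kBufferSize = 64 * 1024;
            RawWritableFile* file_;
            std::string buffer_;
          };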
      
      Test Plan: Run all existing unit tests.
      
      Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D42321
      6e9fbeb2
13. 14 Jul 2015 · 1 commit
    • Deprecate purge_redundant_kvs_while_flush · a9c51095
      Committed by Igor Canadi
      Summary: This option guards a feature implemented two and a half years ago (D8991). The feature was enabled by default back then and has been running without issues. There is no reason for any client to turn it off, and I found no reference to it in fbcode.
      
      Test Plan: none
      
      Reviewers: sdong, yhchiang, anthony, dhruba
      
      Reviewed By: dhruba
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42063
      a9c51095
14. 06 Jun 2015 · 1 commit
15. 16 May 2015 · 1 commit
    • Allow GetThreadList to report Flush properties. · 3f0867c0
      Committed by Yueh-Hsuan Chiang
      Summary:
      Allow GetThreadList to report Flush properties, which include:
      * job id
      * number of bytes that have been written since the flush started
      * total size of input mem-tables
      
      Test Plan:
      ./db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=100 --value_size=1000
      
      Sample output from db_bench which tracks same flush job
      
                ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
         140213879898240   High Pri      default                Flush       5789 us                    FlushJob::WriteLevel0Table              BytesMemtables 4112835 | BytesWritten 577104 | JobID 8 |
      
                ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
         140213879898240   High Pri      default                Flush     30.634 ms                    FlushJob::WriteLevel0Table              BytesMemtables 4112835 | BytesWritten 1734865 | JobID 8 |
      
      Reviewers: rven, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38505
      3f0867c0
16. 13 May 2015 · 1 commit
    • Add more table properties to EventLogger · dbd95b75
      Committed by Igor Canadi
      Summary:
      Example output:
      
          {"time_micros": 1431463794310521, "job": 353, "event": "table_file_creation", "file_number": 387, "file_size": 86937, "table_info": {"data_size": "81801", "index_size": "9751", "filter_size": "0", "raw_key_size": "23448", "raw_average_key_size": "24.000000", "raw_value_size": "990571", "raw_average_value_size": "1013.890481", "num_data_blocks": "245", "num_entries": "977", "filter_policy_name": "", "kDeletedKeys": "0"}}
      
      Also fixed a bug where BuildTable() in recovery was passing the Env::IOHigh argument into the paranoid_checks_file parameter.
      
      Test Plan: make check + check out the output in the log
      
      Reviewers: sdong, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38343
      dbd95b75
17. 24 Apr 2015 · 1 commit
18. 07 Apr 2015 · 1 commit
    • A new callback in TablePropertiesCollector to let users know whether an entry is an add, delete or merge · 953a885e
      Committed by sdong
      Summary:
      Currently, users have no way to tell from the TablePropertiesCollector callback whether a key is an add, a delete, or a merge. Add a new function that exposes this.
      
      Also refactor the code so that
      (1) the table property collector and the internal table property collector are two separate data structures, with the latter now exposed, and
      (2) table builders only receive internal table properties; a sketch of a collector using the new callback follows.
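
      For illustration, a collector that uses the new entry-type information might look like this (the enum and callback shape are assumptions based on the description above):

          #include <cstdint>
          #include <string_view>

          enum class EntryType { kPut, kDelete, kMerge };

          class TypeCountingCollector {
           public:
            // Hypothetical shape of the new callback: the entry type is now
            // visible alongside the key and value.
            void AddUserKey(std::string_view /*key*/, std::string_view /*value*/,
                            EntryType type) {
              switch (type) {
                case EntryType::kPut:    ++puts_;    break;
                case EntryType::kDelete: ++deletes_; break;
                case EntryType::kMerge:  ++merges_;  break;
              }
            }

           private:
            uint64_t puts_ = 0;
            uint64_t deletes_ = 0;
            uint64_t merges_ = 0;
          };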
      
      Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector.
      
      Reviewers: yhchiang, igor.sugak, rven, igor
      
      Reviewed By: rven, igor
      
      Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D35373
      953a885e
19. 27 Feb 2015 · 1 commit
    • Add columnfamily option optimize_filters_for_hits to optimize for key hits only · e7c434c3
      Committed by Sameet Agarwal
Summary:
      Added a new option to ColumnFamilyOptions: optimize_filters_for_hits. This option can be used when most
      accesses to the store are key hits and we don't need to optimize performance for key misses.
      This is useful when you have a very large database and most of your lookups succeed. The option allows the store to
      not store and use filters in the last level (the largest level, which contains data). These filters can take a large amount of
      space for large databases (in memory and on-disk). For the last level, these filters are only useful for key misses, not
      for key hits. If we are not optimizing for key misses, we can choose not to store these filters for that level.

      This option is only provided for BlockBasedTable. We skip the filters when compacting. A minimal usage sketch follows.
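
      Turning it on is a one-line option change (the function name here is illustrative; optimize_filters_for_hits is the new option):

          #include "rocksdb/options.h"

          rocksdb::ColumnFamilyOptions MakeHitOptimizedOptions() {
            rocksdb::ColumnFamilyOptions cf_opts;
            // Do not build or store filters for the last level; they only
            // help key misses, which this workload assumes are rare.
            cf_opts.optimize_filters_for_hits = true;
            return cf_opts;
          }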
      
Test Plan:
      1. Modified db_test to also run tests with an additional option (skip_filters_on_last_level)
      2. Added another unit test to db_test which specifically tests that filters are being skipped
      
      Reviewers: rven, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D33717
      e7c434c3
20. 01 Nov 2014 · 1 commit
    • Turn on -Wshadow · 9f7fc3ac
      Committed by Igor Canadi
      Summary:
      ...and fix all the errors :)
      
      Jim suggested turning on -Wshadow because it helped him fix a number of critical bugs in fbcode. I think it's a good idea to be -Wshadow clean.
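
      The class of bug -Wshadow catches looks like this (a contrived example):

          int SumUpTo(int n) {
            int total = 0;
            for (int i = 0; i < n; ++i) {
              int total = i;  // -Wshadow: declaration shadows the outer 'total'
              (void)total;    // the outer accumulator is silently never updated
            }
            return total;     // always 0 -- exactly the bug the warning flags
          }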
      
      Test Plan: compiles
      
      Reviewers: yhchiang, rven, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D27711
      9f7fc3ac
21. 05 Sep 2014 · 1 commit
    • introduce ImmutableOptions · 5665e5e2
      Committed by Lei Jin
      Summary:
      As a preparation to support updating some options dynamically, I'd like
      to first introduce ImmutableOptions, which is a subset of Options that
      cannot be changed during the course of a DB lifetime without restart.
      
      ColumnFamily will keep both Options and ImmutableOptions. Any component
      below ColumnFamily should only take ImmutableOptions in their
      constructor. Other options should be taken from APIs, which will be
      allowed to adjust dynamically.
      
      I have yet to change memtable and other related classes to take
      ImmutableOptions in their constructors. That can be done in a separate diff,
      as this one is already pretty big.
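
      The intended split can be sketched as follows (all member names are illustrative):

          #include <cstddef>
          #include <string>
          #include <utility>

          // Options fixed for the lifetime of the DB; changing them requires a
          // restart, so they can be copied once and treated as constant.
          struct ImmutableOptionsSketch {
            std::string db_path;
            int num_levels = 7;
          };

          // Options that may later be adjusted dynamically through APIs.
          struct MutableOptionsSketch {
            std::size_t write_buffer_size = 64u << 20;
          };

          class ColumnFamilySketch {
           public:
            explicit ColumnFamilySketch(ImmutableOptionsSketch iopts)
                : iopts_(std::move(iopts)) {}
            // Components below the column family receive only iopts_.

           private:
            const ImmutableOptionsSketch iopts_;
          };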
      
      Test Plan: make all check
      
      Reviewers: yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D22545
      5665e5e2
22. 01 Aug 2014 · 1 commit
    • Remove a check for merge operator in builder.cc · 67dae255
      Committed by Yueh-Hsuan Chiang
      Summary:
      Previously, builder.cc had a check for the merge operator which prevented
      RocksDB from crashing when reopening a DB without properly specifying the
      merge operator. However, we recently observed a memory leak on failure
      during RocksDB recovery. This diff removes that check and lets RocksDB
      crash instead of leaking memory, until we have identified the real cause
      of the leak.
      
      Test Plan: make all check
      
      Reviewers: sdong
      
      Subscribers: ljin, igor
      
      Differential Revision: https://reviews.facebook.net/D20913
      67dae255
23. 31 Jul 2014 · 1 commit
24. 09 Jul 2014 · 1 commit
    • integrate rate limiter into rocksdb · 534357ca
      Committed by Lei Jin
      Summary:
      Add an option and plug a rate limiter into PosixWritableFile. The rate
      limiter only applies to flushes and compactions; WAL and MANIFEST writes
      are excluded from this enforcement.
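
      Wiring it up from the user side looks roughly like this (NewGenericRateLimiter is the public constructor; the 10 MB/s figure and function name are arbitrary):

          #include "rocksdb/options.h"
          #include "rocksdb/rate_limiter.h"

          rocksdb::Options MakeRateLimitedOptions() {
            rocksdb::Options options;
            // Throttle flush and compaction writes to ~10 MB/s; WAL and
            // MANIFEST writes are exempt, per the summary above.
            options.rate_limiter.reset(rocksdb::NewGenericRateLimiter(10 << 20));
            return options;
          }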
      
      Test Plan: db_test
      
      Reviewers: igor, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19425
      534357ca
25. 03 Jul 2014 · 1 commit
26. 17 Jun 2014 · 1 commit
    • Refactor: group metadata needed to open an SST file to a separate copyable struct · cadc1adf
      Committed by sdong
      Summary:
      We added multiple fields to FileMetaData recently and are planning to add more.
      This refactoring separates out the minimum information needed for accessing a file. The new object is copyable (FileMetaData is not, because of its ref counter). I hope this refactoring enables further improvements:

      (1) use it to design a more efficient data structure to speed up read queries;
      (2) in the future, when we add storage-level information, we can easily do the encoding there instead of enlarging this structure, which might expand the memory working set for file metadata.

      The definition is the same as the current EncodedFileMetaData used in the two-level iterator, so the logic there is now easier to understand.
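
      The shape of the copyable struct can be sketched like this (field names are assumptions):

          #include <cstdint>

          // Minimum information needed to open/access an SST file, split out
          // of FileMetaData; no reference counter, so it is trivially copyable.
          struct FileDescriptorSketch {
            uint64_t file_number = 0;
            uint64_t file_size = 0;
          };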
      
      Test Plan: make all check
      
      Reviewers: haobo, igor, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb, dhruba, yhchiang
      
      Differential Revision: https://reviews.facebook.net/D18933
      cadc1adf
27. 25 Mar 2014 · 1 commit
    • Enhance partial merge to support multiple arguments · cda4006e
      Committed by Yueh-Hsuan Chiang
      Summary:
      * PartialMerge API now takes a list of operands instead of two operands
        (see the sketch below).
      * Add min_partial_merge_operands to Options, indicating the minimum
        number of operands required to trigger a partial merge.
      * This diff is based on Schalk's previous diff (D14601), but it also
        includes necessary changes such as updating the pure C API for
        partial merge.
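
      A sketch of a multi-operand partial merge for a list-append style operator (the signature is simplified and std::string stands in for Slice):

          #include <deque>
          #include <string>

          // Collapse any number of adjacent operands into one replacement operand.
          bool PartialMergeMulti(const std::string& /*key*/,
                                 const std::deque<std::string>& operand_list,
                                 std::string* new_value) {
            new_value->clear();
            for (const std::string& op : operand_list) {
              if (!new_value->empty()) new_value->push_back(',');
              new_value->append(op);
            }
            return true;  // success: operands were merged into *new_value
          }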
      
      Test Plan:
      * make check all
      * develop tests for cases where partial merge takes more than two
        operands.
      
      TODOs (from Schalk):
      * Add test with min_partial_merge_operands > 2.
      * Perform benchmarks to measure the performance improvements (can probably
        use results of task #2837810.)
      * Add description of problem to doc/index.html.
      * Change wiki pages to reflect the interface changes.
      
      Reviewers: haobo, igor, vamsi
      
      Reviewed By: haobo
      
      CC: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D16815
      cda4006e
28. 04 Feb 2014 · 2 commits
29. 03 Feb 2014 · 1 commit
30. 11 Jan 2014 · 1 commit
    • [Performance Branch] If options.max_open_files is set to -1, cache table readers in FileMetaData for Get() and NewIterator() · aa0ef660
      Committed by Siying Dong
      
      Summary:
      In some use cases, table readers for all live files should always be cached. In that case, there is an opportunity to avoid the table-cache lookup during Get() and NewIterator().

      We define options.max_open_files = -1 as the mode in which table readers for live files are always kept. In that mode, table readers are cached in FileMetaData (with a reference count held by the table cache), so table_cache.Get() and table_cache.NewIterator() can bypass the LRU cache check to reduce latency. A minimal sketch of enabling this follows.
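
      Enabling the mode is a single option (the function name is illustrative):

          #include "rocksdb/options.h"

          rocksdb::Options MakePinnedTableReaderOptions() {
            rocksdb::Options options;
            // -1 keeps table readers for all live files cached in FileMetaData,
            // letting Get()/NewIterator() bypass the LRU table-cache lookup.
            options.max_open_files = -1;
            return options;
          }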
      
      Test Plan: add a test case in db_test
      
      Reviewers: haobo, kailiu
      
      Reviewed By: haobo
      
      CC: dhruba, igor, leveldb
      
      Differential Revision: https://reviews.facebook.net/D15039
      aa0ef660
31. 08 Jan 2014 · 1 commit
    • Don't always compress L0 files written by memtable flush · 50994bf6
      Committed by Mark Callaghan
      Summary:
      Code was always compressing L0 files written by a memtable flush when
      compression was enabled. Now this is done only when
      min_level_to_compress=0 for leveled compaction or when
      universal_compaction_size_percent=-1 for universal compaction.
      
      Task ID: #3416472
      
Test Plan:
ran db_bench with compression options
      
      Reviewers: dhruba, igor, sdong
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14757
      50994bf6
32. 26 Nov 2013 · 1 commit
33. 31 Oct 2013 · 1 commit
    • Follow-up Cleaning-up After D13521 · f03b2df0
      Committed by Siying Dong
      Summary:
      This patch addresses @haobo's comments on D13521:
      1. Rename Table to TableReader and its factory function to GetTableReader.
      2. Move the compression type selection logic out of TableBuilder and into the compaction logic.
      3. More accurate comments.
      4. Move stat name constants into the BlockBasedTable implementation.
      5. Remove some leftover code in simple_table_db_test.
      
      Test Plan: pass test suites.
      
      Reviewers: haobo, dhruba, kailiu
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13785
      f03b2df0
34. 29 Oct 2013 · 1 commit
    • Make "Table" pluggable · d4eec30e
      Committed by Siying Dong
      Summary: This patch makes Table and TableBuilder abstract classes and turns the current table implementation into BlockBasedTable and BlockBasedTableBuilder.
      
      Test Plan: Make db_test.cc work with the block-based table. Add a new test, simple_table_db_test.cc, where a different, simple table format is implemented.
      
      Reviewers: dhruba, haobo, kailiu, emayanke, vamsi
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13521
      d4eec30e
35. 18 Oct 2013 · 1 commit
    • Universal Compaction to Have a Size Percentage Threshold To Decide Whether to Compress · 9edda370
      Committed by Siying Dong
      Summary:
      This patch adds an option for universal compaction that lets us compress output files only if the files previously compacted have not yet reached a specified ratio, to save CPU in some cases.

      Compression is always skipped for flushes, because size information is not easy to evaluate in the flush case. We can improve this later.
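
      The decision can be sketched as a simple threshold check (an illustrative predicate following the summary's wording; the names are assumptions):

          #include <cstdint>

          // Compress the compaction output only while the data compacted so
          // far has not yet reached `size_percent` of the total database size;
          // -1 disables the threshold and always compresses.
          bool ShouldCompressOutput(uint64_t previously_compacted_bytes,
                                    uint64_t total_bytes, int size_percent) {
            if (size_percent < 0) return true;
            if (total_bytes == 0) return true;
            return previously_compacted_bytes * 100 <
                   static_cast<uint64_t>(size_percent) * total_bytes;
          }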
      
      Test Plan:
      add test
      DBTest.UniversalCompactionCompressRatio1 and DBTest.UniversalCompactionCompressRatio12
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13467
      9edda370
36. 17 Oct 2013 · 1 commit
37. 05 Oct 2013 · 1 commit
38. 24 Aug 2013 · 1 commit
39. 21 Aug 2013 · 1 commit