1. March 22, 2016 (1 commit)
  2. March 18, 2016 (1 commit)
    • Adding pin_l0_filter_and_index_blocks_in_cache feature. · 522de4f5
      Authored by Marton Trencseni
      Summary:
      When a block-based table file is opened, if prefetch_index_and_filter is true, the index and filter blocks are prefetched and put into the block cache.
      What this feature adds: when an L0 block-based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks are not released back to the block cache at the end of BlockBasedTableReader::Open(). Instead, the table reader takes ownership of them, pinning them, i.e. the LRU cache will never push them out. Meanwhile, further accesses in the table reader will not hit the block cache, thus avoiding lock contention (a configuration sketch follows at the end of this entry).
      When the table reader is destroyed, it releases the pinned blocks (if there were any). This has to happen before the cache is destroyed, so I had to introduce TableReader::Close() to guarantee the order of destruction.
      
      Test Plan:
      Added two unit tests for this. Existing unit tests run fine (default is pin_l0_filter_and_index_blocks_in_cache=false).
      
      DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32
        Mac: OK.
  Linux: OK with D55287 patched in.
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D54801
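
      A minimal configuration sketch for the pinning option above (a sketch, not part of the original commit; cache_index_and_filter_blocks is assumed here as the public switch for the prefetch behavior the summary calls prefetch_index_and_filter):

        #include "rocksdb/db.h"
        #include "rocksdb/options.h"
        #include "rocksdb/table.h"

        int main() {
          rocksdb::BlockBasedTableOptions table_options;
          // Load index/filter blocks into the block cache when a file is opened.
          table_options.cache_index_and_filter_blocks = true;
          // For L0 files, let the table reader keep a reference so the LRU
          // cache can never evict the index and filter blocks.
          table_options.pin_l0_filter_and_index_blocks_in_cache = true;

          rocksdb::Options options;
          options.create_if_missing = true;
          options.table_factory.reset(
              rocksdb::NewBlockBasedTableFactory(table_options));

          rocksdb::DB* db = nullptr;
          rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/pin_example", &db);
          delete db;
          return s.ok() ? 0 : 1;
        }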
  3. February 10, 2016 (1 commit)
  4. December 12, 2015 (1 commit)
    • Use SST files for Transaction conflict detection · 3bfd3d39
      Authored by agiardullo
      Summary:
      Currently, transactions can fail even if there is no actual write conflict. This is due to relying only on the memtables to check for write conflicts. Users have to tune memtable settings to try to avoid this, but it's hard to figure out exactly how to tune them.
      
      With this diff, TransactionDB will use both memtables and SST files to determine if there are any write conflicts.  This relies on the fact that BlockBasedTable stores sequence numbers for all writes that happen after any open snapshot.  Also, D50295 is needed to prevent SingleDelete from disappearing writes (the TODOs in this test code will be fixed once the other diff is approved and merged).
      
      Note that Optimistic transactions will still rely on tuning memtable settings as we do not want to read from SST while on the write thread.  Also, memtable settings can still be used to reduce how often TransactionDB needs to read SST files.
      
      Test Plan: unit tests, db bench
      
      Reviewers: rven, yhchiang, kradhakrishnan, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D50475
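
      A minimal usage sketch of the transaction write path this affects (assumptions: public TransactionDB API; conflict-handling details simplified, and the path /tmp/txn_example is illustrative):

        #include "rocksdb/utilities/transaction.h"
        #include "rocksdb/utilities/transaction_db.h"

        int main() {
          rocksdb::Options options;
          options.create_if_missing = true;
          rocksdb::TransactionDBOptions txn_db_options;
          rocksdb::TransactionDB* txn_db = nullptr;
          rocksdb::TransactionDB::Open(options, txn_db_options,
                                       "/tmp/txn_example", &txn_db);

          rocksdb::TransactionOptions txn_options;
          txn_options.set_snapshot = true;  // validate writes against a snapshot
          rocksdb::Transaction* txn =
              txn_db->BeginTransaction(rocksdb::WriteOptions(), txn_options);

          // With this diff, the conflict check behind Put() can consult SST
          // files (via their stored sequence numbers), not just the memtables.
          rocksdb::Status s = txn->Put("key", "value");
          if (s.ok()) s = txn->Commit();  // non-OK on a write conflict

          delete txn;
          delete txn_db;
          return 0;
        }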
  5. December 9, 2015 (2 commits)
  6. December 8, 2015 (1 commit)
    • Support marking snapshots for write-conflict checking · ec704aaf
      Authored by agiardullo
      Summary:
      D50475 enables using SST files for transaction write-conflict checking.  In order for this to work, we need to make sure not to compact out SingleDeletes when there is an earlier transaction snapshot(D50295).  If there is a long-held snapshot, this could reduce the benefit of the SingleDelete optimization.
      
      This diff allows Transactions to mark snapshots as being used for write-conflict checking.  Then, during compaction, we will be able to optimize SingleDeletes better in the future.
      
      This diff adds a flag to SnapshotImpl which is used by Transactions.  This diff also passes the earliest write-conflict snapshot's sequence number to CompactionIterator.  This diff does not actually change Compaction (after this diff is pushed, D50295 will be able to use this information).
      
      Test Plan: no behavior change, ran existing tests
      
      Reviewers: rven, kradhakrishnan, yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51183
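
      A hypothetical sketch of the compaction-side decision this enables (illustrative only; the names and the exact rule are not the actual RocksDB internals):

        #include <cstdint>

        // A value and the SingleDelete covering it may only be dropped together
        // if no write-conflict snapshot sits between their sequence numbers;
        // otherwise a transaction may still consult the value for conflict checks.
        bool CanDropSingleDeletePair(uint64_t value_seq, uint64_t single_delete_seq,
                                     uint64_t earliest_write_conflict_snapshot) {
          bool snapshot_between =
              value_seq <= earliest_write_conflict_snapshot &&
              earliest_write_conflict_snapshot < single_delete_seq;
          return !snapshot_between;
        }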
  7. October 31, 2015 (1 commit)
  8. October 30, 2015 (1 commit)
  9. October 17, 2015 (1 commit)
    • Add more kill points · 277dea78
      Authored by sdong
      Summary:
      Add kill points in:
      1. after creating a file
      2. before writing a manifest record
      3. before syncing manifest
      4. before creating a new current file
      5. after creating a new current file
      
      Test Plan: Run all current tests.
      
      Reviewers: yhchiang, igor, anthony, IslamAbdelRahman, rven, kradhakrishnan
      
      Reviewed By: kradhakrishnan
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48855
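
      A hypothetical sketch of how a kill point works (the real macro lives in RocksDB's crash-test framework; the helper and variable here are illustrative):

        #include <cstdlib>

        extern int rocksdb_kill_odds;  // set by the crash-test harness, 0 = off

        inline void KillPoint(int odds) {
          if (odds > 0 && std::rand() % odds == 0) {
            std::abort();  // simulate a process crash exactly at this point
          }
        }
        // Called at spots like the five listed above, e.g. right before
        // syncing the manifest or after creating a new current file.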
  10. October 14, 2015 (1 commit)
    • Separate InternalIterator from Iterator · 35ad531b
      Authored by sdong
      Summary:
      Separate a new class, InternalIterator, from class Iterator. It is used when the look-up is done internally, which also means it operates on keys with sequence ID and type.
      
      This change will enable potential future optimizations, but for now InternalIterator's functions are the same as Iterator's.
      At the same time, move the cleanup function to a separate class and let both InternalIterator and Iterator inherit from it.
      
      Test Plan: Run all existing tests.
      
      Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48549
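
      A condensed sketch of the resulting class shape (simplified; the real classes carry more methods):

        #include "rocksdb/slice.h"
        #include "rocksdb/status.h"

        // Shared cleanup support that both Iterator and InternalIterator inherit.
        class Cleanable {
         public:
          typedef void (*CleanupFunction)(void* arg1, void* arg2);
          void RegisterCleanup(CleanupFunction function, void* arg1, void* arg2);
          // ...
        };

        // Same surface as Iterator, but key() is an *internal* key:
        // user key + sequence number + value type.
        class InternalIterator : public Cleanable {
         public:
          virtual ~InternalIterator() {}
          virtual bool Valid() const = 0;
          virtual void Seek(const rocksdb::Slice& target) = 0;
          virtual void Next() = 0;
          virtual rocksdb::Slice key() const = 0;
          virtual rocksdb::Slice value() const = 0;
          virtual rocksdb::Status status() const = 0;
        };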
  11. October 10, 2015 (1 commit)
    • Pass column family ID to table property collector · 776bd8d5
      Authored by sdong
      Summary: Pass column family ID through TablePropertiesCollectorFactory::CreateTablePropertiesCollector() so that users can identify which column family this file is for and handle it differently.
      
      Test Plan: Add unit test scenarios in tests related to table properties collectors to verify the information passed in is correct.
      
      Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: yoshinorim, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48411
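
      A sketch of the factory side after this change (the Context struct with column_family_id follows the shape of the public header; MakeCollectorFor is a hypothetical helper):

        #include <cstdint>
        #include "rocksdb/table_properties.h"

        class MyCollectorFactory : public rocksdb::TablePropertiesCollectorFactory {
         public:
          rocksdb::TablePropertiesCollector* CreateTablePropertiesCollector(
              rocksdb::TablePropertiesCollectorFactory::Context context) override {
            // context.column_family_id identifies the column family the new
            // SST file belongs to, so properties can be collected per CF.
            return MakeCollectorFor(context.column_family_id);
          }
          const char* Name() const override { return "MyCollectorFactory"; }

         private:
          rocksdb::TablePropertiesCollector* MakeCollectorFor(uint32_t cf_id);
        };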
  12. October 8, 2015 (1 commit)
    • Compaction filter on merge operands · d80ce7f9
      Authored by Igor Canadi
      Summary:
      Since Andres' internship is over, I took over https://reviews.facebook.net/D42555 and rebased and simplified it a bit.
      
      The behavior in this diff is a bit simpler than in D42555:
      * only merge operands are passed through FilterMergeValue(). If the filter function returns true, the merge operand is ignored
      * the compaction filter is *not* called on: 1) results of merge operations and 2) base values that are getting merged with merge operands (the second case was also true in the previous diff)
      
      Do we also need a compaction filter to get called on merge results?
      
      Test Plan: make && make check
      
      Reviewers: lovro, tnovak, rven, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: noetzli, kolmike, leveldb, dhruba, sdong
      
      Differential Revision: https://reviews.facebook.net/D47847
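
      A sketch of a filter using this hook (the summary calls it FilterMergeValue(); FilterMergeOperand below is the name in the public CompactionFilter interface, and the "expired:" policy is illustrative):

        #include <string>
        #include "rocksdb/compaction_filter.h"
        #include "rocksdb/slice.h"

        class DropMarkedOperands : public rocksdb::CompactionFilter {
         public:
          // Called once per merge operand; returning true drops the operand.
          // Merge results and base values being merged are never passed in.
          bool FilterMergeOperand(int level, const rocksdb::Slice& key,
                                  const rocksdb::Slice& operand) const override {
            return operand.starts_with("expired:");
          }
          // Regular key/value entries keep the usual Filter() treatment.
          bool Filter(int level, const rocksdb::Slice& key,
                      const rocksdb::Slice& existing_value, std::string* new_value,
                      bool* value_changed) const override {
            return false;
          }
          const char* Name() const override { return "DropMarkedOperands"; }
        };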
  13. September 26, 2015 (1 commit)
  14. September 18, 2015 (1 commit)
    • Support for SingleDelete() · 014fd55a
      Authored by Andres Noetzli
      Summary:
      This patch fixes #7460559. It introduces SingleDelete as a new database
      operation. This operation can be used to delete keys that were never
      overwritten (no put following another put of the same key). If an overwritten
      key is single deleted, the behavior is undefined. Single deletion of a
      non-existent key has no effect, but multiple consecutive single deletions are
      not allowed (see limitations).
      
      In contrast to the conventional Delete() operation, the deletion entry is
      removed along with the value when the two are lined up in a compaction. Note:
      The semantics are similar to @igor's prototype that allowed having this
      behavior on the granularity of a column family (
      https://reviews.facebook.net/D42093 ). This new patch, however, is more
      aggressive when it comes to removing tombstones: It removes the SingleDelete
      together with the value whenever there is no snapshot between them, while the
      older patch only did this when the sequence number of the deletion was older
      than the earliest snapshot.
      
      Most of the complex additions are in the Compaction Iterator, all other changes
      should be relatively straightforward. The patch also includes basic support for
      single deletions in db_stress and db_bench.
      
      Limitations:
      - Not compatible with cuckoo hash tables
      - Single deletions cannot be used in combination with merges and normal
        deletions on the same key (other keys are not affected by this)
      - Consecutive single deletions are currently not allowed (an older version of
        this patch supported them, so this could be resurrected if needed)
      
      Test Plan: make all check
      
      Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor
      
      Reviewed By: igor
      
      Subscribers: maykov, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43179
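
      A minimal usage sketch of the new operation (public DB API; error handling elided):

        #include "rocksdb/db.h"

        void SingleDeleteExample(rocksdb::DB* db) {
          // Safe: "k" is written once and never overwritten.
          db->Put(rocksdb::WriteOptions(), "k", "v");
          db->SingleDelete(rocksdb::WriteOptions(), "k");

          // Undefined behavior: single-deleting an overwritten key, e.g.
          //   Put("k2", "v1"); Put("k2", "v2"); SingleDelete("k2");
        }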
  15. September 11, 2015 (1 commit)
    • Refactored common code of Builder/CompactionJob out into a CompactionIterator · 8aa1f151
      Authored by Andres Noetzli
      Summary:
      Builder and CompactionJob share a lot of fairly complex code. This patch
      refactors this code into a separate class, the CompactionIterator. Because the
      shared code is fairly complex, this patch hopefully improves maintainability.
      While there is a lot of potential for further improvements, the patch is
      intentionally pretty close to the original structure because the change is
      already complex enough.
      
      Test Plan: make clean all check && ./db_stress
      
      Reviewers: rven, anthony, yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D46197
  16. September 9, 2015 (1 commit)
    • Added Equal method to Comparator interface · 6bdc484f
      Authored by Andres Noetzli
      Summary:
      In some cases, equality comparisons can be done more efficiently than three-way
      comparisons. There are quite a few places in the code where we only care about
      equality. This patch adds an Equal() method that defaults to using the
      Compare() method.
      
      Test Plan: make clean all check
      
      Reviewers: rven, anthony, yhchiang, igor, sdong
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D46233
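
      A sketch of a comparator overriding the new method (the default Equal() is Compare(a, b) == 0; overriding pays off when equality is cheaper than ordering):

        #include <string>
        #include "rocksdb/comparator.h"
        #include "rocksdb/slice.h"

        class BytewiseWithFastEqual : public rocksdb::Comparator {
         public:
          int Compare(const rocksdb::Slice& a,
                      const rocksdb::Slice& b) const override {
            return a.compare(b);
          }
          // Equality needs no three-way ordering: size check plus memcmp.
          bool Equal(const rocksdb::Slice& a,
                     const rocksdb::Slice& b) const override {
            return a == b;
          }
          const char* Name() const override { return "BytewiseWithFastEqual"; }
          void FindShortestSeparator(std::string*,
                                     const rocksdb::Slice&) const override {}
          void FindShortSuccessor(std::string*) const override {}
        };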
  17. September 3, 2015 (1 commit)
    • Unified maps with Comparator for sorting, other cleanup · 3c9cef1e
      Authored by Andres Noetzli
      Summary:
      This diff is a collection of cleanups that were initially part of D43179.
      Additionally it adds a unified way of defining key-value maps that use a
      Comparator for sorting (this was previously implemented in four different
      places).
      
      Test Plan: make clean check all
      
      Reviewers: rven, anthony, yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45993
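
      A sketch of the unified pattern (the names follow the shape of RocksDB's internal helper; treat them as illustrative):

        #include <map>
        #include <string>
        #include "rocksdb/comparator.h"

        // Order std::map entries by a rocksdb::Comparator instead of operator<.
        struct LessOfComparator {
          explicit LessOfComparator(const rocksdb::Comparator* c) : cmp(c) {}
          bool operator()(const std::string& a, const std::string& b) const {
            return cmp->Compare(a, b) < 0;
          }
          const rocksdb::Comparator* cmp;
        };
        typedef std::map<std::string, std::string, LessOfComparator> KVMap;

        // Usage: KVMap m(LessOfComparator(rocksdb::BytewiseComparator()));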
  18. August 25, 2015 (1 commit)
    • Smarter purging during flush · 4ab26c5a
      Authored by Igor Canadi
      Summary:
      Currently, we only purge duplicate keys and deletions during flush if `earliest_seqno_in_memtable <= newest_snapshot`. This means that the newest snapshot happened before we first created the memtable. This is almost never true for MyRocks and MongoRocks.
      
      This patch makes purging during flush able to understand snapshots. The main logic is copied from compaction_job.cc, although the logic over there is much more complicated and extensive. However, we should try to merge the common functionality at some point.
      
      I need this patch to implement no_overwrite_i_promise functionality for flush. We'll also need this to support SingleDelete() during Flush(). @yoshinorim requested the feature.
      
      Test Plan:
      make check
      I had to adjust some unit tests to understand this new behavior
      
      Reviewers: yhchiang, yoshinorim, anthony, sdong, noetzli
      
      Reviewed By: noetzli
      
      Subscribers: yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42087
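
      A hypothetical sketch of the snapshot rule the flush now respects (illustrative only; the real logic is in the flush path):

        #include <cstdint>
        #include <vector>

        // Two versions of the same key may be collapsed during flush only if
        // every snapshot sees them the same way, i.e. no snapshot falls
        // between their sequence numbers.
        bool InSameSnapshotStripe(uint64_t seq_a, uint64_t seq_b,
                                  const std::vector<uint64_t>& snapshots) {
          for (uint64_t snap : snapshots) {
            bool a_visible = seq_a <= snap;
            bool b_visible = seq_b <= snap;
            if (a_visible != b_visible) return false;  // a snapshot separates them
          }
          return true;
        }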
  19. August 18, 2015 (1 commit)
    • Simplify querying of merge results · f32a5720
      Authored by Andres Notzli
      Summary:
      While working on supporting mixing merge operators with
      single deletes ( https://reviews.facebook.net/D43179 ),
      I realized that returning and dealing with merge results
      can be made simpler. Submitting this as a separate diff
      because it is not directly related to single deletes.
      
      Before, callers of merge helper had to retrieve the merge
      result in one of two ways depending on whether the merge
      was successful or not (success = result of merge was single
      kTypeValue). For successful merges, the caller could query
      the resulting key/value pair and for unsuccessful merges,
      the result could be retrieved in the form of two deques of
      keys and values. However, with single deletes, a successful merge
      does not return a single key/value pair (if merge
      operands are merged with a single delete, we have to generate
      a value and keep the original single delete around to make
      sure that we are not accidentally producing a key overwrite).
      In addition, the two existing call sites of the merge
      helper were taking the same actions independently from whether
      the merge was successful or not, so this patch simplifies that.
      
      Test Plan: make clean all check
      
      Reviewers: rven, sdong, yhchiang, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43353
  20. August 15, 2015 (1 commit)
    • Measure file read latency histogram per level · 72613657
      Authored by sdong
      Summary: In internal stats, remember the read latency histogram per level if statistics is enabled. It can be retrieved from DB::GetProperty() with the "rocksdb.dbstats" property.
      
      Test Plan: Manually run db_bench, print out "rocksdb.dbstats" by hand, and make sure the output is as expected
      
      Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44193
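
      A sketch of retrieving the histograms (statistics must be enabled at open; the property name is as in the summary):

        #include <cstdio>
        #include <string>
        #include "rocksdb/db.h"
        #include "rocksdb/statistics.h"

        void PrintDbStats(rocksdb::DB* db) {
          // At open time: options.statistics = rocksdb::CreateDBStatistics();
          std::string stats;
          if (db->GetProperty("rocksdb.dbstats", &stats)) {
            // Includes the per-level file read latency histograms.
            std::printf("%s\n", stats.c_str());
          }
        }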
  21. July 18, 2015 (1 commit)
    • Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env · 6e9fbeb2
      Authored by sdong
      
      Summary: We want to keep Env a thin layer for better portability, so platform-dependent code should be moved out of it. In this patch, I create wrappers for file readers and writers and move rate limiting, write buffering, most perf context instrumentation, and most of the random kill points out of Env. This will make it easier to maintain multiple Envs in the future.
      
      Test Plan: Run all existing unit tests.
      
      Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D42321
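
      A sketch of the wrapper layer this introduces (class names follow RocksDB's internal file_reader_writer header of that era; construction details may differ):

        #include <memory>
        #include <string>
        #include "rocksdb/env.h"
        #include "util/file_reader_writer.h"  // internal header

        rocksdb::Status WriteViaWrapper(rocksdb::Env* env,
                                        const std::string& fname) {
          rocksdb::EnvOptions env_options;
          std::unique_ptr<rocksdb::WritableFile> file;
          rocksdb::Status s = env->NewWritableFile(fname, &file, env_options);
          if (!s.ok()) return s;
          // Buffering, rate limiting and instrumentation now live in the
          // wrapper, not in the Env/WritableFile implementation itself.
          rocksdb::WritableFileWriter writer(std::move(file), env_options);
          s = writer.Append(rocksdb::Slice("hello"));
          if (s.ok()) s = writer.Sync(false /* use_fsync */);
          return s;
        }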
  22. July 14, 2015 (1 commit)
    • Deprecate purge_redundant_kvs_while_flush · a9c51095
      Authored by Igor Canadi
      Summary: This option guards a feature implemented two and a half years ago: D8991. The feature was enabled by default back then and has been running without issues. There is no reason why any client would turn it off; I found no reference to it in fbcode.
      
      Test Plan: none
      
      Reviewers: sdong, yhchiang, anthony, dhruba
      
      Reviewed By: dhruba
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42063
  23. June 6, 2015 (1 commit)
  24. May 16, 2015 (1 commit)
    • Allow GetThreadList to report Flush properties. · 3f0867c0
      Authored by Yueh-Hsuan Chiang
      Summary:
      Allow GetThreadList to report Flush properties, which includes:
      * job id
      * number of bytes that have been written since the flush started.
      * total size of input mem-tables
      
      Test Plan:
      ./db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=100 --value_size=1000
      
      Sample output from db_bench, tracking the same flush job:
      
                ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
         140213879898240   High Pri      default                Flush       5789 us                    FlushJob::WriteLevel0Table              BytesMemtables 4112835 | BytesWritten 577104 | JobID 8 |
      
                ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
         140213879898240   High Pri      default                Flush     30.634 ms                    FlushJob::WriteLevel0Table              BytesMemtables 4112835 | BytesWritten 1734865 | JobID 8 |
      
      Reviewers: rven, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38505
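
      A sketch of reading these properties programmatically (GetThreadList() is on Env and requires options.enable_thread_tracking; decoding of op_properties is elided):

        #include <vector>
        #include "rocksdb/env.h"
        #include "rocksdb/thread_status.h"

        void ListFlushJobs() {
          std::vector<rocksdb::ThreadStatus> thread_list;
          rocksdb::Env::Default()->GetThreadList(&thread_list);
          for (const auto& ts : thread_list) {
            if (ts.operation_type == rocksdb::ThreadStatus::OP_FLUSH) {
              // ts.op_properties carries the JobID, BytesMemtables and
              // BytesWritten values shown in the sample output above.
            }
          }
        }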
  25. May 13, 2015 (1 commit)
    • Add more table properties to EventLogger · dbd95b75
      Authored by Igor Canadi
      Summary:
      Example output:
      
          {"time_micros": 1431463794310521, "job": 353, "event": "table_file_creation", "file_number": 387, "file_size": 86937, "table_info": {"data_size": "81801", "index_size": "9751", "filter_size": "0", "raw_key_size": "23448", "raw_average_key_size": "24.000000", "raw_value_size": "990571", "raw_average_value_size": "1013.890481", "num_data_blocks": "245", "num_entries": "977", "filter_policy_name": "", "kDeletedKeys": "0"}}
      
      Also fixed a bug where BuildTable() in recovery was passing the Env::IOHigh argument into the paranoid_checks_file parameter.
      
      Test Plan: make check + check out the output in the log
      
      Reviewers: sdong, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D38343
  26. April 24, 2015 (1 commit)
  27. April 7, 2015 (1 commit)
    • A new callback to TablePropertiesCollector to let users know whether an entry is an add, delete or merge · 953a885e
      Authored by sdong
      Summary:
      Currently, users have no way to tell from the TablePropertiesCollector callback whether a key is an add, delete, or merge. Add a new function that exposes this (a collector sketch follows at the end of this entry).
      
      Also refactor the code so that
      (1) the table property collector and the internal table property collector are two separate data structures, with the latter now exposed
      (2) table builders only receive internal table properties
      
      Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector.
      
      Reviewers: yhchiang, igor.sugak, rven, igor
      
      Reviewed By: rven, igor
      
      Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D35373
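
      A sketch of a collector using the new callback (the signature follows the current public header; the original diff may have differed slightly):

        #include <cstdint>
        #include <string>
        #include "rocksdb/table_properties.h"

        class EntryTypeCounter : public rocksdb::TablePropertiesCollector {
         public:
          rocksdb::Status AddUserKey(const rocksdb::Slice& key,
                                     const rocksdb::Slice& value,
                                     rocksdb::EntryType type,
                                     rocksdb::SequenceNumber seq,
                                     uint64_t file_size) override {
            // The new callback reports whether the entry is add, delete or merge.
            if (type == rocksdb::kEntryDelete) ++deletes_;
            else if (type == rocksdb::kEntryMerge) ++merges_;
            else ++puts_;
            return rocksdb::Status::OK();
          }
          rocksdb::Status Finish(rocksdb::UserCollectedProperties* props) override {
            (*props)["example.puts"] = std::to_string(puts_);
            return rocksdb::Status::OK();
          }
          rocksdb::UserCollectedProperties GetReadableProperties() const override {
            return rocksdb::UserCollectedProperties{};
          }
          const char* Name() const override { return "EntryTypeCounter"; }

         private:
          uint64_t puts_ = 0, deletes_ = 0, merges_ = 0;
        };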
  28. February 27, 2015 (1 commit)
    • Add column family option optimize_filters_for_hits to optimize for key hits only · e7c434c3
      Authored by Sameet Agarwal
      Summary:
      Added a new option to ColumnFamilyOptions - optimize_filters_for_hits. This option can be used in the case where most
      accesses to the store are key hits and we don't need to optimize performance for key misses.
      This is useful when you have a very large database and most of your lookups succeed. The option allows the store to
      not store and use filters in the last level (the largest level, which contains data). These filters can take a large amount of
      space for large databases (in memory and on disk). For the last level, these filters are only useful for key misses and not
      for key hits. If we are not optimizing for key misses, we can choose not to store these filters for that level.
      
      This option is only provided for BlockBasedTable. We skip the filters when compacting.
      
      Test Plan:
      1. Modified db_test to also run tests with an additional option (skip_filters_on_last_level)
      2. Added another unit test to db_test which specifically tests that filters are being skipped
      
      Reviewers: rven, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D33717
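
      A minimal configuration sketch (public option; name as in the title):

        #include "rocksdb/options.h"

        rocksdb::Options MakeOptions() {
          rocksdb::Options options;
          // Skip building and checking bloom filters for the last level:
          // cheaper memory/disk, slower lookups for keys that don't exist.
          options.optimize_filters_for_hits = true;
          return options;
        }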
  29. November 1, 2014 (1 commit)
    • Turn on -Wshadow · 9f7fc3ac
      Authored by Igor Canadi
      Summary:
      ...and fix all the errors :)
      
      Jim suggested turning on -Wshadow because it helped him fix a number of critical bugs in fbcode. I think it's a good idea to be -Wshadow clean.
      
      Test Plan: compiles
      
      Reviewers: yhchiang, rven, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D27711
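
      An illustrative example (not from the diff) of the bug class -Wshadow catches:

        int Count();    // hypothetical helpers, for illustration only
        int Recount();

        int Total() {
          int count = Count();
          {
            int count = Recount();  // -Wshadow: shadows the outer 'count'
            (void)count;
          }
          return count;  // outer value, possibly not what the author meant
        }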
  30. September 5, 2014 (1 commit)
    • introduce ImmutableOptions · 5665e5e2
      Authored by Lei Jin
      Summary:
      As a preparation to support updating some options dynamically, I'd like
      to first introduce ImmutableOptions, which is a subset of Options that
      cannot be changed during the course of a DB lifetime without restart.
      
      ColumnFamily will keep both Options and ImmutableOptions. Any component
      below ColumnFamily should only take ImmutableOptions in their
      constructor. Other options should be taken from APIs, which will be
      allowed to adjust dynamically.
      
      I have yet to make changes to the memtable and other related classes to
      take ImmutableOptions in their ctor. That can be done in a separate diff,
      as this one is already pretty big.
      
      Test Plan: make all check
      
      Reviewers: yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D22545
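
      A condensed sketch of the idea (field selection is illustrative; the real struct holds every option that is fixed for the DB's lifetime):

        #include "rocksdb/options.h"

        // Copied once at DB open; components below ColumnFamily take only this.
        struct ImmutableCFOptions {
          explicit ImmutableCFOptions(const rocksdb::Options& options)
              : comparator(options.comparator),
                merge_operator(options.merge_operator.get()) {}
          const rocksdb::Comparator* comparator;        // fixed without restart
          const rocksdb::MergeOperator* merge_operator; // fixed without restart
          // ... other options that cannot change dynamically
        };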
  31. August 1, 2014 (1 commit)
    • Remove a check for merge operator in builder.cc · 67dae255
      Authored by Yueh-Hsuan Chiang
      Summary:
      Previously, builder.cc had a check for the merge operator which prevented
      RocksDB from crashing when reopening a DB without properly specifying the
      merge operator. However, we have observed a memory leak on failure in
      RocksDB recovery. This diff removes that check and lets RocksDB crash
      instead of causing a memory leak for now, until we have identified the
      real cause of the memory leak.
      
      Test Plan: make all check
      
      Reviewers: sdong
      
      Subscribers: ljin, igor
      
      Differential Revision: https://reviews.facebook.net/D20913
  32. July 31, 2014 (1 commit)
  33. July 9, 2014 (1 commit)
    • integrate rate limiter into rocksdb · 534357ca
      Authored by Lei Jin
      Summary:
      Add an option and plug in a rate limiter for PosixWritableFile. The rate
      limiter only applies to flush and compaction. WAL and MANIFEST are
      excluded from this enforcement.
      
      Test Plan: db_test
      
      Reviewers: igor, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19425
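
      A usage sketch with the public factory (the 10 MB/s figure is illustrative):

        #include "rocksdb/options.h"
        #include "rocksdb/rate_limiter.h"

        rocksdb::Options MakeRateLimitedOptions() {
          rocksdb::Options options;
          // Caps flush and compaction writes; WAL and MANIFEST writes
          // are excluded from this enforcement.
          options.rate_limiter.reset(
              rocksdb::NewGenericRateLimiter(10 * 1024 * 1024 /* bytes/sec */));
          return options;
        }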
  34. July 3, 2014 (1 commit)
  35. June 17, 2014 (1 commit)
    • Refactor: group metadata needed to open an SST file into a separate copyable struct · cadc1adf
      Authored by sdong
      Summary:
      We added multiple fields to FileMetaData recently and are planning to add more.
      This refactoring separates out the minimum information needed for accessing the file. This object is copyable (FileMetaData is not copyable because of the ref counter). I hope this refactoring can enable further improvements:
      
      (1) use it to design a more efficient data structure to speed up read queries.
      (2) in the future, when we add information about storage level, we can easily do the encoding, instead of enlarging this structure, which might expand the memory working set for file metadata.
      
      The definition is the same as the current EncodedFileMetaData used in the two-level iterator, so the logic in the two-level iterator is now easier to understand.
      
      Test Plan: make all check
      
      Reviewers: haobo, igor, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb, dhruba, yhchiang
      
      Differential Revision: https://reviews.facebook.net/D18933
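
      A condensed sketch of the copyable struct (fields simplified; accessors shown as declarations only):

        #include <cstdint>

        // Minimum metadata needed to open an SST file; copyable by design,
        // unlike FileMetaData, which holds a reference counter.
        struct FileDescriptor {
          uint64_t packed_number_and_path_id;  // file number + path id packed
          uint64_t file_size;

          uint64_t GetNumber() const;  // unpacks the file number
          uint32_t GetPathId() const;  // unpacks the path id
        };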
  36. March 25, 2014 (1 commit)
    • Enhance partial merge to support multiple arguments · cda4006e
      Authored by Yueh-Hsuan Chiang
      Summary:
      * The PartialMerge API now takes a list of operands instead of two operands.
      * Add min_partial_merge_operands to Options, indicating the minimum
        number of operands to trigger partial merge.
      * This diff is based on Schalk's previous diff (D14601), but it also
        includes necessary changes such as updating the pure C API for
        partial merge.
      
      Test Plan:
      * make check all
      * develop tests for cases where partial merge takes more than two
        operands.
      
      TODOs (from Schalk):
      * Add test with min_partial_merge_operands > 2.
      * Perform benchmarks to measure the performance improvements (can probably
        use results of task #2837810.)
      * Add description of problem to doc/index.html.
      * Change wiki pages to reflect the interface changes.
      
      Reviewers: haobo, igor, vamsi
      
      Reviewed By: haobo
      
      CC: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D16815
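
      A sketch of a merge operator implementing the multi-operand hook (method names follow the public MergeOperator interface; the concatenation semantics are illustrative):

        #include <deque>
        #include <string>
        #include "rocksdb/merge_operator.h"

        class ConcatMergeOperator : public rocksdb::MergeOperator {
         public:
          bool FullMerge(const rocksdb::Slice& key,
                         const rocksdb::Slice* existing_value,
                         const std::deque<std::string>& operand_list,
                         std::string* new_value,
                         rocksdb::Logger* logger) const override {
            if (existing_value) {
              new_value->assign(existing_value->data(), existing_value->size());
            }
            for (const auto& op : operand_list) new_value->append(op);
            return true;
          }
          // New in this diff: combine any number of adjacent operands at once,
          // instead of pairwise PartialMerge() calls.
          bool PartialMergeMulti(const rocksdb::Slice& key,
                                 const std::deque<rocksdb::Slice>& operand_list,
                                 std::string* new_value,
                                 rocksdb::Logger* logger) const override {
            new_value->clear();
            for (const auto& op : operand_list) {
              new_value->append(op.data(), op.size());
            }
            return true;
          }
          const char* Name() const override { return "ConcatMergeOperator"; }
        };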
  37. February 4, 2014 (2 commits)
  38. February 3, 2014 (1 commit)