1. 11 Sep 2014 (1 commit)
    • Push model for flushing memtables · 3d9e6f77
      Committed by Igor Canadi
      Summary:
      When memtable is full it calls the registered callback. That callback then registers column family as needing the flush. Every write checks if there are some column families that need to be flushed. This completely eliminates the need for MakeRoomForWrite() function and simplifies our Write code-path.
      
      There is some complexity with the concurrency when the column family is dropped. I made it a bit less complex by dropping the column family from the write thread in https://reviews.facebook.net/D22965. Let me know if you want to discuss this.
      
      Test Plan: make check works. I'll also run db_stress with creating and dropping column families for a while.
      
      Reviewers: yhchiang, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D23067
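The push model above can be sketched in a few lines. This is a minimal illustration, not RocksDB's actual code: the `ColumnFamily`, `FlushScheduler`, and `Write` names and fields are hypothetical stand-ins for the callback-registers-flush idea.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Illustrative column family with a memtable size budget.
struct ColumnFamily {
  std::string name;
  size_t memtable_bytes = 0;
  size_t write_buffer_size;
  ColumnFamily(std::string n, size_t limit)
      : name(std::move(n)), write_buffer_size(limit) {}
};

// The registered callback pushes the column family into a pending set;
// the write path drains that set instead of waiting in a
// MakeRoomForWrite()-style loop.
class FlushScheduler {
 public:
  void ScheduleFlush(ColumnFamily* cf) { pending_.insert(cf); }
  // Checked on every write: returns and clears the families needing flush.
  std::vector<ColumnFamily*> TakePendingFlushes() {
    std::vector<ColumnFamily*> out(pending_.begin(), pending_.end());
    pending_.clear();
    return out;
  }
 private:
  std::unordered_set<ColumnFamily*> pending_;
};

// Write path: append to the memtable, then push a flush request if full.
void Write(ColumnFamily* cf, size_t value_bytes, FlushScheduler* scheduler) {
  cf->memtable_bytes += value_bytes;
  if (cf->memtable_bytes >= cf->write_buffer_size) {
    scheduler->ScheduleFlush(cf);  // push model: the writer never blocks here
  }
}
```

The point of the design is that the writer only ever *registers* work; the decision of when to flush is decoupled from the write itself.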
  2. 09 Sep 2014 (1 commit)
    • MemTableOptions · 52311463
      Committed by Lei Jin
      Summary: removed reference to options in WriteBatch and DBImpl::Get()
      
      Test Plan: make all check
      
      Reviewers: yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D23049
  3. 03 Sep 2014 (1 commit)
    • Ignore missing column families · a84234a6
      Committed by Igor Canadi
      Summary:
      Before this diff, whenever we Write to non-existing column family, Write() would fail.
      
      This diff adds an option to not fail a Write() when the WriteBatch points to a non-existing column family. MongoDB said this would be useful for them, since they might have a transaction updating an index that was dropped by another thread. This way, they don't have to check that all indexes are alive on every write; they don't care about losing writes to a dropped index.
      
      Test Plan: added a small unit test
      
      Reviewers: sdong, yhchiang, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D22143
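The behavior change can be sketched as follows. This is a hypothetical simplification: the `ignore_missing_column_families` flag name matches the spirit of the diff, but the `Status` enum and map-based "DB" here are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Illustrative status codes.
enum class Status { kOk, kInvalidArgument };

struct WriteOptions {
  // When true, writes to a dropped/unknown column family are silently dropped
  // instead of failing the whole Write().
  bool ignore_missing_column_families = false;
};

using ColumnFamilyId = uint32_t;
using FakeDb = std::map<ColumnFamilyId, std::map<std::string, std::string>>;

Status ApplyPut(FakeDb* db, ColumnFamilyId cf, const std::string& key,
                const std::string& value, const WriteOptions& opts) {
  auto it = db->find(cf);
  if (it == db->end()) {
    // Non-existing column family: fail unless the caller opted out.
    return opts.ignore_missing_column_families ? Status::kOk
                                               : Status::kInvalidArgument;
  }
  it->second[key] = value;
  return Status::kOk;
}
```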
  4. 19 Aug 2014 (1 commit)
    • WriteBatchWithIndex: a wrapper of WriteBatch, with a searchable index · 28b5c760
      Committed by sdong
      Summary:
      Add WriteBatchWithIndex so that a user can query data out of a WriteBatch, to support MongoDB's read-its-own-write.
      
      WriteBatchWithIndex uses a skiplist to store the binary index. The index stores the offset of each entry in the write batch. When searching for a key, the entry is read back from the write batch at that offset to recover its key.
      
      Define a new iterator class for querying data out of WriteBatchWithIndex. A user can create an iterator of the write batch for one column family, seek to a key and keep calling Next() to see next entries.
      
      I will add more unit tests if people are OK with this API.
      
      Test Plan:
      make all check
      Add unit tests.
      
      Reviewers: yhchiang, igor, MarkCallaghan, ljin
      
      Reviewed By: ljin
      
      Subscribers: dhruba, leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D21381
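The offset-based index can be sketched like this. A minimal sketch, not the real implementation: the real index is a skiplist over the serialized batch bytes, while here `std::map` stands in for the skiplist and a vector of pairs stands in for the binary batch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical IndexedBatch: the index stores only offsets into the batch;
// entries are re-read from the batch on lookup, as the summary describes.
class IndexedBatch {
 public:
  void Put(const std::string& key, const std::string& value) {
    size_t offset = entries_.size();
    entries_.push_back({key, value});  // stands in for the serialized batch
    index_[key] = offset;              // the index stores the offset only
  }
  // Search: find the offset in the index, then read the entry back.
  bool GetFromBatch(const std::string& key, std::string* value) const {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    *value = entries_[it->second].second;
    return true;
  }
 private:
  std::vector<std::pair<std::string, std::string>> entries_;
  std::map<std::string, size_t> index_;  // sorted, so iteration is ordered
};
```

Because the index holds offsets rather than copies of the keys, the memory overhead per entry stays small, and a later Put of the same key simply points the index at the newer offset.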
  5. 29 Jul 2014 (1 commit)
    • make statistics forward-able · 40fa8a4c
      Committed by Lei Jin
      Summary:
      Make StatisticsImpl able to forward stats to a provided statistics
      implementation. The main purpose is to allow us to collect internal
      stats in the future even when the user supplies a custom statistics
      implementation; it avoids instrumenting two sets of stats-collection
      code. One immediate use case is a tuning advisor, which needs to
      collect some internal stats that users may not be interested in.
      
      Test Plan:
      Ran db_bench and saw stats show up at the end of the run.
      Will run make all check, since some tests rely on statistics.
      
      Reviewers: yhchiang, sdong, igor
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D20145
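The forwarding idea can be sketched as a wrapper that records internally and then delegates. This is an illustrative sketch only: the ticker count, class shapes, and method names are simplified stand-ins for the real Statistics interface.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>

constexpr int kNumTickers = 4;  // illustrative; the real set is much larger

// Minimal stand-in for the user-facing statistics interface.
class Statistics {
 public:
  virtual ~Statistics() = default;
  virtual void RecordTick(int ticker, uint64_t count) = 0;
};

// Sketch of a forwarding StatisticsImpl: internal counters are always
// updated, and every tick is also forwarded to the user-supplied object,
// so internal collection keeps working with custom statistics.
class StatisticsImpl : public Statistics {
 public:
  explicit StatisticsImpl(std::shared_ptr<Statistics> user_stats)
      : user_stats_(std::move(user_stats)) {}
  void RecordTick(int ticker, uint64_t count) override {
    tickers_[ticker] += count;  // internal collection
    if (user_stats_) user_stats_->RecordTick(ticker, count);  // forward
  }
  uint64_t GetTickerCount(int ticker) const { return tickers_[ticker]; }
 private:
  std::array<uint64_t, kNumTickers> tickers_{};
  std::shared_ptr<Statistics> user_stats_;
};

// A trivial user-provided implementation, for demonstration.
class CountingStats : public Statistics {
 public:
  void RecordTick(int, uint64_t count) override { total_ += count; }
  uint64_t total_ = 0;
};
```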
  6. 11 Jul 2014 (1 commit)
    • JSON (Document) API sketch · f0a8be25
      Committed by Igor Canadi
      Summary:
      This is a rough sketch of our new document API. Would like to get some thoughts and comments about the high-level architecture and API.
      
      I didn't optimize for performance at all. Leaving some low-hanging fruit so that we can be happy when we fix them! :)
      
      Currently, a bunch of features are not supported at all. Indexes can only be specified when creating the database. There is no query planner whatsoever. This will all be added in due time.
      
      Test Plan: Added a simple unit test
      
      Reviewers: haobo, yhchiang, dhruba, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18747
  7. 23 Apr 2014 (1 commit)
    • Support for column families in TTL DB · 3992aec8
      Committed by Igor Canadi
      Summary:
      This will enable people using TTL DB to do so with multiple column families. They can also specify different TTLs for each one.
      
      TODO: Implement CreateColumnFamily() in TTL world.
      
      Test Plan: Added a very simple sanity test.
      
      Reviewers: dhruba, haobo, ljin, sdong, yhchiang
      
      Reviewed By: haobo
      
      CC: leveldb, alberts
      
      Differential Revision: https://reviews.facebook.net/D17859
  8. 15 Mar 2014 (1 commit)
  9. 14 Mar 2014 (1 commit)
  10. 13 Mar 2014 (1 commit)
    • [CF] Code cleanup part 1 · fb2346fc
      Committed by Igor Canadi
      Summary:
      I'm cleaning up some code preparing for the big diff review tomorrow. This is the first part of the cleanup.
      
      Changes are mostly cosmetic. The goal is to decrease the amount of code difference between the columnfamilies and master branches.
      
      This diff also fixes race condition when dropping column family.
      
      Test Plan: Ran db_stress with variety of parameters
      
      Reviewers: dhruba, haobo
      
      Differential Revision: https://reviews.facebook.net/D16833
  11. 04 Mar 2014 (1 commit)
    • [CF] Fix CF bugs in WriteBatch · f9b2f0ad
      Committed by Igor Canadi
      Summary:
      This diff fixes two bugs:
      * Increase the sequence number even if a WriteBatch fails. This is important because WriteBatches in WAL logs have implicitly increasing sequence numbers, even if one update in a write batch fails. This caused some writes to get lost in my CF stress testing.
      * Tolerate 'invalid column family' errors on recovery. When a column family is dropped, WAL logs can still contain WriteBatches that refer to the dropped column family. In a recovery environment, we want to ignore those errors. In the client's Write() code path, however, we want to return the failure to the client when they try to add data to an invalid column family.
      
      Test Plan: db_stress's verification works now
      
      Reviewers: dhruba, haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16533
  12. 27 Feb 2014 (1 commit)
    • [CF] Handle failure in WriteBatch::Handler · 8b7ab995
      Committed by Igor Canadi
      Summary:
      * Add a ColumnFamilyHandle::GetID() function. The client needs to know a column family's ID to be able to construct a WriteBatch.
      * Handle WriteBatch::Handler failure gracefully. Since WriteBatch is not a very smart structure (it takes a raw CF id), the client can add data to a WriteBatch for a column family that doesn't exist. In that case, we need to gracefully return a failure status from DB::Write(). To do that, I added a return Status to the WriteBatch functions PutCF, DeleteCF and MergeCF.
      
      Test Plan: Added test to column_family_test
      
      Reviewers: dhruba, haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16323
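The status-returning handler can be sketched roughly as below. All names here (`Record`, `Handler`, the `Status` enum) are illustrative, assuming a handler that knows the set of live column family ids and rejects anything else.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

enum class Status { kOk, kInvalidArgument };  // illustrative

// One decoded record from a write batch.
struct Record {
  bool is_put;
  uint32_t cf_id;
  std::string key, value;
};

// PutCF/DeleteCF return a Status, so a batch referencing an unknown
// column family id can fail the whole write instead of silently applying.
class Handler {
 public:
  explicit Handler(std::set<uint32_t> live_cfs)
      : live_cfs_(std::move(live_cfs)) {}
  Status PutCF(uint32_t cf_id, const std::string&, const std::string&) {
    return live_cfs_.count(cf_id) ? Status::kOk : Status::kInvalidArgument;
  }
  Status DeleteCF(uint32_t cf_id, const std::string&) {
    return live_cfs_.count(cf_id) ? Status::kOk : Status::kInvalidArgument;
  }
 private:
  std::set<uint32_t> live_cfs_;
};

// Iteration stops at the first failing record and propagates its status,
// which is what lets DB::Write() return the failure to the client.
Status Iterate(const std::vector<Record>& batch, Handler* handler) {
  for (const auto& r : batch) {
    Status s = r.is_put ? handler->PutCF(r.cf_id, r.key, r.value)
                        : handler->DeleteCF(r.cf_id, r.key);
    if (s != Status::kOk) return s;
  }
  return Status::kOk;
}
```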
  13. 13 Feb 2014 (1 commit)
    • [CF] Rethinking ColumnFamilyHandle and fix to dropping column families · b06840aa
      Committed by Igor Canadi
      Summary:
      The change to the public behavior:
      * When opening a DB or creating a new column family, the client gets a ColumnFamilyHandle.
      * As long as the column family handle is alive, the client can do whatever they want with it, even drop it.
      * A dropped column family can still be read from (using the column family handle).
      * Added a new call, CloseColumnFamily(). The client has to close all column families they have opened before deleting the DB.
      * As soon as a column family is closed, any calls to the DB using that column family handle will fail (as will any outstanding calls).
      
      Internally:
      * Ref-counting ColumnFamilyData
      * New thread-safety for ColumnFamilySet
      * Dropped column families are now completely dropped and their memory cleaned-up
      
      Test Plan: added some tests to column_family_test
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16101
  14. 07 Feb 2014 (1 commit)
    • [CF] Propagate correct options to WriteBatch::InsertInto · 8fa8a708
      Committed by Igor Canadi
      Summary:
      WriteBatch can have multiple column families in one batch. Every column family has different options. So we have to add a way for write batch to get options for an arbitrary column family.
      
      This required a bit more acrobatics since lots of interfaces had to be changed.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, sdong, kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15957
  15. 30 Jan 2014 (2 commits)
    • Fix some lint warnings · 514e42c7
      Committed by Igor Canadi
    • Read from and write to different column families · f24a3ee5
      Committed by Igor Canadi
      Summary: This one is big. It adds the ability to write to and read from different column families (see the unit test). It also supports recovery of different column families from the log, which was the hardest part to reason about. We need to make sure never to delete a log file that still has unflushed data from any column family. To support that, I added another concept: versions_->MinLogNumber().
      
      Test Plan: Added a unit test in column_family_test
      
      Reviewers: dhruba, haobo, sdong, kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15537
  16. 18 Jan 2014 (1 commit)
  17. 17 Jan 2014 (1 commit)
    • Allow callback to change size of existing value. Change return type of the callback function to an enum status to handle 3 cases. · 1447bb59
      Committed by Naman Gupta
      
      Summary:
      This diff fixes 2 hacks:
      * The callback function can modify the existing value in place if the merged value fits within the existing buffer size, but currently the existing buffer size is not being modified. Now the callback receives an int*, allowing the size to be modified. Since the size is encoded as a varint in the internal key for the memtable, the entire value might have to be copied to a new location if the new size varint is smaller than the existing size varint.
      * The callback function has 3 outcomes:
          1. Modify the existing buffer in place, updating the size correspondingly. Returns 1.
          2. Generate a new buffer containing the merged value. Returns 2.
          3. Fail to do either of the above, based on application logic. Returns 0.
      
      Test Plan: Just make all for now. I'm adding another unit test to test each scenario.
      
      Reviewers: dhruba, haobo
      
      Reviewed By: haobo
      
      CC: leveldb, sdong, kailiu, xinyaohu, sumeet, danguo
      
      Differential Revision: https://reviews.facebook.net/D15195
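The three outcomes can be sketched with an enum-returning callback. A minimal sketch under stated assumptions: `UpdateStatus` and `InplaceAdd` are hypothetical names, and the example merges integers stored as strings rather than using the real callback signature.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Illustrative enum mirroring the 0/1/2 return values described above.
enum class UpdateStatus {
  kFailed,          // 0: callback could not produce a value
  kUpdatedInplace,  // 1: existing buffer modified, size possibly shrunk
  kUpdated          // 2: new value returned in a fresh buffer
};

// Adds delta to a numeric value stored as a string. Updates in place when
// the result fits in the existing buffer; otherwise hands back a new value.
UpdateStatus InplaceAdd(char* existing, uint32_t* existing_size, int delta,
                        std::string* merged) {
  if (existing == nullptr) return UpdateStatus::kFailed;
  int value = std::stoi(std::string(existing, *existing_size)) + delta;
  std::string out = std::to_string(value);
  if (out.size() <= *existing_size) {
    std::memcpy(existing, out.data(), out.size());
    *existing_size = static_cast<uint32_t>(out.size());  // size may shrink
    return UpdateStatus::kUpdatedInplace;
  }
  *merged = out;  // does not fit: return a new buffer instead
  return UpdateStatus::kUpdated;
}
```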
  18. 15 Jan 2014 (2 commits)
    • DB::Put() to estimate write batch data size needed and pre-allocate buffer · 9ea8bf90
      Committed by Siying Dong
      Summary:
      In one of the CPU profiles, we see some CPU cost from string::reserve() inside Batch.Put(). This patch should reduce some of that cost by allocating a sufficient buffer beforehand.
      
      Since it is a trivial percentage of CPU costs, I didn't find a way to show the improvement in one of the benchmarks. I'll deploy it to the same application and do the same CPU profiling to make sure those CPU costs are reduced.
      
      Test Plan: make all check
      
      Reviewers: haobo, kailiu, igor
      
      Reviewed By: haobo
      
      CC: leveldb, nkg-
      
      Differential Revision: https://reviews.facebook.net/D15135
    • DB::Put() to estimate write batch data size needed and pre-allocate buffer · 51dd2192
      Committed by Siying Dong
      Summary:
      In one of the CPU profiles, we see some CPU cost from string::reserve() inside Batch.Put(). This patch should reduce some of that cost by allocating a sufficient buffer beforehand.
      
      Since it is a trivial percentage of CPU costs, I didn't find a way to show the improvement in one of the benchmarks. I'll deploy it to the same application and do the same CPU profiling to make sure those CPU costs are reduced.
      
      Test Plan: make all check
      
      Reviewers: haobo, kailiu, igor
      
      Reviewed By: haobo
      
      CC: leveldb, nkg-
      
      Differential Revision: https://reviews.facebook.net/D15135
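The pre-allocation idea can be sketched as follows. This is an illustrative simplification: the estimate formula and the record layout (tag byte, then key and value) are assumptions, not the real WriteBatch encoding, which writes varint32 lengths before each payload.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Upper-bound estimate of a serialized Put record:
// 1 byte tag + up to 5 bytes per varint32 length + the payloads.
size_t EstimatePutRecordSize(size_t key_size, size_t value_size) {
  return 1 + 5 + key_size + 5 + value_size;
}

// Reserve once up front so the string grows with a single allocation
// instead of reallocating inside each append call.
void AppendPut(std::string* rep, const std::string& key,
               const std::string& value) {
  rep->reserve(rep->size() + EstimatePutRecordSize(key.size(), value.size()));
  rep->push_back('\x01');  // illustrative kTypeValue tag
  rep->append(key);        // real code writes varint32 lengths first
  rep->append(value);
}
```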
  19. 14 Jan 2014 (1 commit)
    • Add read/modify/write functionality to Put() api · 8454cfe5
      Committed by Naman Gupta
      Summary: The application can set a callback function which is applied to the previous value to calculate the new value. The new value can be set in place if the previous value existed in the memtable and the new value is smaller than the previous value; otherwise the new value is added normally.
      
      Test Plan: fbmake. Added unit tests. All unit tests pass.
      
      Reviewers: dhruba, haobo
      
      Reviewed By: haobo
      
      CC: sdong, kailiu, xinyaohu, sumeet, leveldb
      
      Differential Revision: https://reviews.facebook.net/D14745
  20. 11 Jan 2014 (1 commit)
    • Improve RocksDB "get" performance by computing merge result in memtable · a09ee106
      Committed by Schalk-Willem Kruger
      Summary:
      Added an option (max_successive_merges) that can be used to specify the
      maximum number of successive merge operations on a key in the memtable.
      This can be used to improve performance of the "get" operation. If many
      successive merge operations are performed on a key, the performance of "get"
      operations on the key deteriorates, as the value has to be computed for each
      "get" operation by applying all the successive merge operations.
      
      FB Task ID: #3428853
      
      Test Plan:
      make all check
      db_bench --benchmarks=readrandommergerandom
      counter_stress_test
      
      Reviewers: haobo, vamsi, dhruba, sdong
      
      Reviewed By: haobo
      
      CC: zshao
      
      Differential Revision: https://reviews.facebook.net/D14991
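The max_successive_merges idea can be sketched as below. This is a loose sketch under simplifying assumptions: the real option counts merge operands and applies the registered merge operator at write time, whereas here the "merge" is just integer addition and `Entry`/`Add` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One memtable entry for a single key: either a full value or a merge operand.
struct Entry {
  bool is_merge;
  int operand;
};

// When the key already has max_successive_merges consecutive merge operands,
// compute the merged result up front and store a plain value, so Get() no
// longer has to replay a long chain of merges.
void Add(std::vector<Entry>* history, int operand,
         size_t max_successive_merges) {
  size_t successive = 0;
  for (auto it = history->rbegin(); it != history->rend() && it->is_merge;
       ++it) {
    ++successive;
  }
  if (successive >= max_successive_merges) {
    int total = operand;
    while (!history->empty()) {
      Entry e = history->back();
      history->pop_back();
      total += e.operand;         // fold operand (or base value) in
      if (!e.is_merge) break;     // reached the base value
    }
    history->push_back({false, total});  // store a plain value
  } else {
    history->push_back({true, operand});  // keep accumulating merges
  }
}
```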
  21. 09 Jan 2014 (1 commit)
    • Add column family information to WAL · 19e3ee64
      Committed by Igor Canadi
      Summary:
      I have added three new value types:
      * kTypeColumnFamilyDeletion
      * kTypeColumnFamilyValue
      * kTypeColumnFamilyMerge
      which include the column family id as a Varint32 before the data (value, deletion or merge). These value types are used only in the WAL (not in memtables yet).
      
      This endeavour required changing some WriteBatch internals.
      
      Test Plan: Added a unittest
      
      Reviewers: dhruba, haobo, sdong, kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15045
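The record framing can be sketched like this. A minimal sketch: the tag byte values are illustrative (not the real constants), but the shape matches the summary, with a varint32 column family id preceding the data for the new record types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum : char {
  kTypeValue = 0x1,              // illustrative tag values
  kTypeColumnFamilyValue = 0x5,
};

// Standard little-endian base-128 varint encoding.
void PutVarint32(std::string* dst, uint32_t v) {
  while (v >= 0x80) {
    dst->push_back(static_cast<char>(v | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// A Put for the default family keeps the old record format; any other
// family uses the column-family tag followed by its varint32 id.
void EncodePut(std::string* dst, uint32_t cf_id, const std::string& key,
               const std::string& value) {
  if (cf_id == 0) {
    dst->push_back(kTypeValue);
  } else {
    dst->push_back(kTypeColumnFamilyValue);
    PutVarint32(dst, cf_id);  // column family id precedes the data
  }
  PutVarint32(dst, static_cast<uint32_t>(key.size()));
  dst->append(key);
  PutVarint32(dst, static_cast<uint32_t>(value.size()));
  dst->append(value);
}
```

Keeping the old tag for the default family is what makes the change backward compatible: existing WAL records decode exactly as before.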
  22. 21 Dec 2013 (2 commits)
  23. 19 Dec 2013 (1 commit)
    • [RocksDB] [Column Family] Interface proposal · 9385a524
      Committed by Igor Canadi
      Summary:
      <This diff is for Column Family branch>
      
      Sharing some of the work I've done so far. This diff compiles and passes the tests.
      
      The biggest change is in options.h - I broke Options down into two parts - DBOptions and ColumnFamilyOptions. DBOptions is DB-specific (env, create_if_missing, block_cache, etc.) and ColumnFamilyOptions is column family-specific (all compaction options, compression options, etc.). Note that this does not break backwards compatibility at all.
      
      Further, I created DBWithColumnFamily, which inherits the DB interface and adds new functions with column family support. Clients can transparently switch to DBWithColumnFamily without breaking backwards compatibility.
      There are a few methods worth checking out: ListColumnFamilies(), MultiNewIterator(), MultiGet() and GetSnapshot(). [GetSnapshot() returns the snapshot across all column families for now - I think that's what we agreed on]
      
      Finally, I made small changes to WriteBatch so we are able to atomically insert data across column families.
      
      Please provide feedback.
      
      Test Plan: make check works, the code is backward compatible
      
      Reviewers: dhruba, haobo, sdong, kailiu, emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14445
  24. 26 Nov 2013 (1 commit)
  25. 09 Nov 2013 (1 commit)
    • WriteBatch::Put() overload that gathers key and value from arrays of slices · 8a46ecd3
      Committed by lovro
      Summary: In our project, when writing to the database, we want to form the value as the concatenation of a small header and a larger payload.  It's a shame to have to copy the payload just so we can give RocksDB API a linear view of the value.  Since RocksDB makes a copy internally, it's easy to support gather writes.
      
      Test Plan: write_batch_test, new test case
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13947
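The gather write can be sketched as follows. A minimal sketch: `Slice` here is a cut-down stand-in for the real class, and `AppendParts` is a hypothetical helper showing why the caller never needs to build a temporary concatenated string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Cut-down slice: a pointer/length view over caller-owned bytes.
struct Slice {
  const char* data;
  size_t size;
  Slice(const std::string& s) : data(s.data()), size(s.size()) {}
};

// Gather write: the value arrives as several parts (e.g. a small header
// and a larger payload) and is concatenated exactly once, inside the
// batch's internal buffer, with a single reservation.
void AppendParts(std::string* rep, const std::vector<Slice>& parts) {
  size_t total = 0;
  for (const auto& p : parts) total += p.size;
  rep->reserve(rep->size() + total);  // one allocation for all parts
  for (const auto& p : parts) rep->append(p.data, p.size);
}
```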
  26. 01 Nov 2013 (1 commit)
    • In-place updates for equal keys and similar sized values · fe250702
      Committed by Naman Gupta
      Summary:
      Currently, for each put, fresh memory is allocated and a new entry is added to the memtable with a new sequence number, irrespective of whether the key already exists in the memtable. This diff is an attempt to update the value in place for existing keys. It currently handles a very simple case:
      1. The key already exists in the current memtable. It does not update values in place in immutable memtables or snapshots.
      2. The latest value type is a 'put', i.e. kTypeValue.
      3. The new value size is less than the existing value, to avoid reallocating memory.
      
      TODO: For a put of an existing key, deallocate the memory taken by values of other value types until a kTypeValue is found, i.e. remove kTypeMerge.
      TODO: Update the transaction log to allow consistent reload of the memtable.
      
      Test Plan: Added a unit test verifying the in-place update. But some other unit tests are broken due to invalid sequence number checks. Will fix them next.
      
      Reviewers: xinyaohu, sumeet, haobo, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12423
      
      Automatic commit by arc
  27. 17 Oct 2013 (1 commit)
  28. 05 Oct 2013 (1 commit)
  29. 24 Aug 2013 (1 commit)
  30. 22 Aug 2013 (1 commit)
    • Allow WriteBatch::Handler to abort iteration · cb703c9d
      Committed by Jim Paton
      Summary:
      Sometimes you don't need to iterate through the whole WriteBatch. This diff makes the Handler member functions return a bool that indicates whether to abort or not. If they return true, the iteration stops.
      
      One thing I just thought of is that this will break backwards compatibility. Maybe it would be better to add a virtual member function WriteBatch::Handler::ShouldAbort() that returns false by default. Comments requested.
      
      I still have to add a new unit test for the abort code, but let's finalize the API first.
      
      Test Plan: make -j32 check
      
      Reviewers: dhruba, haobo, vamsi, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12339
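The proposed bool-returning API can be sketched as below. A minimal sketch of the proposal as described (returning true aborts); the `FindKey` handler and `Iterate`'s return value are illustrative additions for demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Handler callbacks return a bool; true means "stop iterating".
class Handler {
 public:
  virtual ~Handler() = default;
  virtual bool Put(const std::string& key, const std::string& value) = 0;
};

// Returns how many records were visited, so the abort is observable.
size_t Iterate(const std::vector<std::pair<std::string, std::string>>& batch,
               Handler* handler) {
  size_t seen = 0;
  for (const auto& kv : batch) {
    ++seen;
    if (handler->Put(kv.first, kv.second)) break;  // true => abort
  }
  return seen;
}

// Example handler: stop as soon as a particular key is seen.
class FindKey : public Handler {
 public:
  explicit FindKey(std::string target) : target_(std::move(target)) {}
  bool Put(const std::string& key, const std::string&) override {
    return key == target_;
  }
 private:
  std::string target_;
};
```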
  31. 15 Aug 2013 (1 commit)
    • Implement log blobs · 0307c5fe
      Committed by Jim Paton
      Summary:
      This patch adds the ability for the user to add sequences of arbitrary data (blobs) to write batches. These blobs are saved to the log along with everything else in the write batch. You can add multiple blobs per WriteBatch and the ordering of blobs, puts, merges, and deletes are preserved.
      
      Blobs are not saved to SST files. RocksDB ignores blobs in every way except for writing them to the log.
      
      Before committing this patch, I need to add some test code. But I'm submitting it now so people can comment on the API.
      
      Test Plan: make -j32 check
      
      Reviewers: dhruba, haobo, vamsi
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12195
  32. 02 Aug 2013 (1 commit)
    • Expand KeyMayExist to return the proper value if it can be found in memory and also check block_cache · 59d0b02f
      Committed by Mayank Agarwal
      
      Summary: Removed KeyMayExistImpl because KeyMayExist now demands Get-like semantics. Removed no_io from memtable and imm because we need the proper value now and shouldn't just stop when we see a Merge in the memtable. Added checks to block_cache. Updated documentation and unit test.
      
      Test Plan: make all check;db_stress for 1 hour
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11853
  33. 24 Jul 2013 (1 commit)
    • Use KeyMayExist for WriteBatch-Deletes · bf66c10b
      Committed by Mayank Agarwal
      Summary:
      Introduced KeyMayExist checking during writebatch-delete and removed it from the outer Delete API, because that API uses writebatch-delete.
      Added code to skip getting the Table from disk if it is not already present in table_cache.
      Some renaming of variables.
      Introduced KeyMayExistImpl, which allows checking since a specified sequence number in GetImpl, useful for checking a partially written writebatch.
      Changed KeyMayExist to not be pure virtual and provided a default implementation.
      Expanded unit tests in db_test to check appropriately.
      Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.
      
      Test Plan: db_stress;make check
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D11745
  34. 27 Jun 2013 (1 commit)
  35. 04 May 2013 (1 commit)
    • [Rocksdb] Support Merge operation in rocksdb · 05e88540
      Committed by Haobo Xu
      Summary:
      This diff introduces a new Merge operation into rocksdb.
      The purpose of this review is mostly getting feedback from the team (everyone please) on the design.
      
      Please focus on the four files under include/leveldb/, as they spell the client visible interface change.
      include/leveldb/db.h
      include/leveldb/merge_operator.h
      include/leveldb/options.h
      include/leveldb/write_batch.h
      
      Please go over local/my_test.cc carefully, as it is a concrete use case.
      
      Please also review the implementation files to see if the straw man implementation makes sense.
      
      Note that the diff does pass all make check and truly supports a forward iterator over the db and a version
      of Get that's based on the iterator.
      
      Future work:
      - Integration with compaction
      - A raw Get implementation
      
      I am working on a wiki that explains the design and implementation choices, but coding comes
      just naturally and I think it might be a good idea to share the code earlier. The code is
      heavily commented.
      
      Test Plan: run all local tests
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: dhruba
      
      CC: leveldb, zshao, sheki, emayanke, MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D9651
  36. 09 Mar 2012 (1 commit)
  37. 01 Nov 2011 (1 commit)
    • A number of fixes: · 36a5f8ed
      Committed by Hans Wennborg
      - Replace raw slice comparison with a call to user comparator.
        Added test for custom comparators.
      
      - Fix end of namespace comments.
      
      - Fixed bug in picking inputs for a level-0 compaction.
      
        When finding overlapping files, the covered range may expand
        as files are added to the input set.  We now correctly expand
        the range when this happens instead of continuing to use the
        old range.  For example, suppose L0 contains files with the
        following ranges:
      
            F1: a .. d
            F2:    c .. g
            F3:       f .. j
      
        and the initial compaction target is F3.  We used to search
        for range f..j which yielded {F2,F3}.  However we now expand
        the range as soon as another file is added.  In this case,
        when F2 is added, we expand the range to c..j and restart the
        search.  That picks up file F1 as well.
      
        This change fixes a bug related to deleted keys showing up
        incorrectly after a compaction as described in Issue 44.
      
      (Sync with upstream @25072954)
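The F1/F2/F3 example above can be sketched directly. A minimal sketch of the fix, with hypothetical names (`FileMeta`, `ExpandLevel0Inputs`): whenever a newly added file widens the key range, the search restarts with the wider range until the input set stops growing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Smallest and largest user keys covered by one level-0 file.
struct FileMeta {
  std::string smallest, largest;
};

// Collect every level-0 file overlapping [begin, end], re-expanding the
// range whenever an added file widens it and rescanning from scratch.
std::vector<size_t> ExpandLevel0Inputs(const std::vector<FileMeta>& files,
                                       std::string begin, std::string end) {
  std::vector<size_t> inputs;
  bool changed = true;
  while (changed) {
    changed = false;
    inputs.clear();
    for (size_t i = 0; i < files.size(); ++i) {
      const FileMeta& f = files[i];
      if (f.largest >= begin && f.smallest <= end) {
        inputs.push_back(i);
        if (f.smallest < begin) { begin = f.smallest; changed = true; }
        if (f.largest > end)    { end = f.largest;    changed = true; }
      }
    }
    // If the range grew, rescan with the wider range; the old buggy code
    // kept searching with the original range and missed files like F1.
  }
  return inputs;
}
```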