1. 05 Aug, 2015 2 commits
  2. 30 Jul, 2015 1 commit
    • A
      WriteBatch Save Points · 8161bdb5
      agiardullo committed
      Summary:
      Support RollbackToSavePoint() in WriteBatch and WriteBatchWithIndex.  Support for partial transaction rollback is needed for MyRocks.
      
      An alternate implementation of Transaction::RollbackToSavePoint() exists in D40869.  However, the other implementation is messier because it is implemented outside of WriteBatch.  This implementation is much cleaner and also exposes a potentially useful feature to WriteBatch.
      
      Test Plan: Added unit tests
      
      Reviewers: IslamAbdelRahman, kradhakrishnan, maykov, yoshinorim, hermanlee4, spetrunia, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42723
      8161bdb5
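The save point idea described above can be sketched without RocksDB. This is a minimal illustration, assuming the batch is a single append-only buffer; `MiniBatch` and its record encoding are hypothetical stand-ins, not the WriteBatch implementation.

```cpp
#include <cstddef>
#include <stack>
#include <string>

// Hypothetical stand-in for a write batch: records are appended to one buffer.
class MiniBatch {
 private:
  struct SavePoint {
    std::size_t size;   // Buffer length when the save point was taken.
    std::size_t count;  // Record count when the save point was taken.
  };

 public:
  void Put(const std::string& key, const std::string& value) {
    rep_ += "P" + key + "=" + value + ";";
    ++count_;
  }

  // Remember the current end of the buffer and the record count.
  void SetSavePoint() { save_points_.push(SavePoint{rep_.size(), count_}); }

  // Drop everything appended since the most recent save point.
  // Returns false if no save point is outstanding.
  bool RollbackToSavePoint() {
    if (save_points_.empty()) return false;
    const SavePoint sp = save_points_.top();
    save_points_.pop();
    rep_.resize(sp.size);
    count_ = sp.count;
    return true;
  }

  std::size_t Count() const { return count_; }
  const std::string& Data() const { return rep_; }

 private:
  std::string rep_;
  std::size_t count_ = 0;
  std::stack<SavePoint> save_points_;
};
```

Because rollback only truncates the buffer, it stays inside the batch itself, which is the kind of simplicity the summary argues for.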
  3. 21 Jul, 2015 1 commit
    • A
      Improved FileExists API · 06429408
      agiardullo committed
      Summary: Add new CheckFileExists method.  Considered changing the FileExists api but didn't want to break anyone's builds.
      
      Test Plan: unit tests
      
      Reviewers: yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42003
      06429408
  4. 18 Jul, 2015 1 commit
    • I
      Deprecate CompactionFilterV2 · a96fcd09
      Igor Canadi committed
      Summary: It has been around for a while and it looks like it never found any uses in the wild. It's also complicating our compaction_job code quite a bit. We're deprecating it in 3.13, but will put it back in 3.14 if we actually find users that need this feature.
      
      Test Plan: make check
      
      Reviewers: noetzli, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42405
      a96fcd09
  5. 15 Jul, 2015 1 commit
    • I
      Better error handling in BackupEngine · 8a9fca26
      Igor Canadi committed
      Summary:
      Couple of changes here:
      * NewBackupEngine() and NewReadOnlyBackupEngine() are now removed. They have been deprecated since RocksDB 3.8. Changing these to the new functions should be pretty straightforward. As a follow-up, I'll fix all fbcode call sites.
      * Instead of initializing backup engine in the constructor, we initialize it in a separate function now. That way, we can catch all errors and return appropriate status code.
      * We catch all errors during initializations and return them to the client properly.
      * Added new tests to backupable_db_test, to make sure that we can't open BackupEngine when there are Env errors.
      * Transitioned backupable_db_test to use BackupEngine rather than BackupableDB. From the two available APIs, judging by the current use-cases, it looks like BackupEngine API won. It's much more flexible since it doesn't require StackableDB.
      
      Test Plan: Added a new unit test to backupable_db_test
      
      Reviewers: yhchiang, sdong, AaronFeldman
      
      Reviewed By: AaronFeldman
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41925
      8a9fca26
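The move from constructor-time setup to a separate initialization step can be sketched as follows. `MiniStatus` and `MiniBackupEngine` are hypothetical names for illustration only and do not reproduce the BackupEngine API; the empty-directory check stands in for any Env error.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal status type; not the RocksDB Status class.
struct MiniStatus {
  bool ok;
  std::string message;
};

class MiniBackupEngine {
 public:
  // Factory: construct cheaply, then run the fallible setup and report errors.
  static MiniStatus Open(const std::string& backup_dir,
                         std::unique_ptr<MiniBackupEngine>* result) {
    std::unique_ptr<MiniBackupEngine> engine(new MiniBackupEngine(backup_dir));
    MiniStatus s = engine->Initialize();
    if (s.ok) *result = std::move(engine);
    return s;
  }

  const std::string& backup_dir() const { return backup_dir_; }

 private:
  explicit MiniBackupEngine(const std::string& dir) : backup_dir_(dir) {}

  // All work that can fail lives here instead of in the constructor.
  MiniStatus Initialize() {
    if (backup_dir_.empty()) return {false, "backup directory is empty"};
    return {true, ""};
  }

  std::string backup_dir_;
};
```

A constructor cannot return a status, so keeping it trivial and surfacing failures from `Initialize()` lets the caller see every error before it ever holds an engine.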
  6. 14 Jul, 2015 2 commits
    • I
      Deprecate purge_redundant_kvs_while_flush · a9c51095
      Igor Canadi committed
      Summary: This option is guarding the feature implemented 2 and a half years ago: D8991. The feature was enabled by default back then and has been running without issues. There is no reason why any client would turn this feature off. I found no reference in fbcode.
      
      Test Plan: none
      
      Reviewers: sdong, yhchiang, anthony, dhruba
      
      Reviewed By: dhruba
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42063
      a9c51095
    • I
      Deprecate WriteOptions::timeout_hint_us · 5aea98dd
      Igor Canadi committed
      Summary:
      In one of our recent meetings, we discussed deprecating features that are not being actively used. One of those features, at least within Facebook, is timeout_hint. The feature is really nicely implemented, but if nobody needs it, we should remove it from our code-base (until we get a valid use-case). Some arguments:
      * Less code == better icache hit rate, smaller builds, simpler code
      * The motivation for adding timeout_hint_us was to work around RocksDB's stall issue. However, we're currently addressing the stall issue itself (see @sdong's recent work on stall write_rate), so we should never see sharp lock-ups in the future.
      * Nobody is using the feature within Facebook's code-base. Googling for `timeout_hint_us` also doesn't yield any users.
      
      Test Plan: make check
      
      Reviewers: anthony, kradhakrishnan, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: sdong, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41937
      5aea98dd
  7. 08 Jul, 2015 2 commits
  8. 03 Jul, 2015 2 commits
    • A
      Prepare 3.12 · 4159f5b8
      agiardullo committed
      Summary: About to cut release
      
      Test Plan: none
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41061
      4159f5b8
    • A
      Multithreaded backup and restore in BackupEngineImpl · a69bc91e
      Aaron Feldman committed
      Summary:
      Add a new field: BackupableDBOptions.max_background_copies.
      CreateNewBackup() and RestoreDBFromBackup() will use this number of threads to perform copies.
      If there is a backup rate limit, then max_background_copies must be 1.
      Update backupable_db_test.cc to test multi-threaded backup and restore.
      Update backupable_db_test.cc to test backups when the backup environment is not the same as the database environment.
      
      Test Plan:
      Run ./backupable_db_test
      Run valgrind ./backupable_db_test
      Run with TSAN and ASAN
      
      Reviewers: yhchiang, rven, anthony, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: yhchiang, anthony, sdong, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D40725
      a69bc91e
  9. 27 Jun, 2015 1 commit
    • I
      Use malloc_usable_size() for accounting block cache size · 0a019d74
      Igor Canadi committed
      Summary:
      Currently, when we insert something into block cache, we say that the block cache capacity decreased by the size of the block. However, the size of the block might be less than the actual memory used by this object. For example, a 4.5KB block will actually use 8KB of memory. So even if we configure block cache to 10GB, our actual memory usage of block cache will be 20GB!
      
      This problem showed up a lot in testing and just recently also showed up in MongoRocks production where we were using 30GB more memory than expected.
      
      This diff will fix the problem. Instead of counting the block size, we will count memory used by the block. That way, a block cache configured to be 10GB will actually use only 10GB of memory.
      
      I'm using a non-portable function and I couldn't find info on portability on Google. However, it seems to work on Linux, which will cover the majority of our use-cases.
      
      Test Plan:
      1. fill up mongo instance with 80GB of data
      2. restart mongo with block cache size configured to 10GB
      3. do a table scan in mongo
      4. memory usage before the diff: 12GB. memory usage after the diff: 10.5GB
      
      Reviewers: sdong, MarkCallaghan, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40635
      0a019d74
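The accounting gap can be worked through with a small portable sketch. It assumes, purely for illustration, an allocator that rounds each request up to the next power of two; real allocators have their own size classes, and the commit itself queries the allocator through malloc_usable_size() rather than guessing. Both function names below are hypothetical.

```cpp
#include <cstddef>

// Assumption for illustration: the allocator rounds each request up to the
// next power of two. Real allocators use their own size classes.
std::size_t AssumedAllocatedSize(std::size_t requested) {
  std::size_t size = 1;
  while (size < requested) size *= 2;
  return size;
}

// Number of blocks the cache admits when each insert is charged `charge` bytes.
std::size_t BlocksThatFit(std::size_t capacity, std::size_t charge) {
  return capacity / charge;
}
```

Under that assumption a 4.5KB (4608-byte) block occupies 8192 bytes. Charging 4608 bytes per block lets a 10GB cache admit enough blocks to occupy roughly 17.8GB of real memory, in line with the "about double" figure in the summary; charging 8192 bytes per block keeps real usage at the configured 10GB.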
  10. 24 Jun, 2015 2 commits
    • I
      Bottommost level compaction option · 674b1181
      Islam AbdelRahman committed
      Summary: Replace force_bottommost_level_compaction in CompactRangeOptions with an option that allows the user to choose how the bottommost level is handled in level-based compaction: always skip, always compact, or compact only if a compaction filter is present.
      
      Test Plan: make check
      
      Reviewers: sdong, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40527
      674b1181
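The three-way choice can be sketched as an enum plus a decision function. `BottommostChoice` and `ShouldCompactBottommost` are stand-in names chosen for this sketch; they mirror the three behaviours named in the summary rather than quoting the RocksDB headers.

```cpp
// Stand-in enum mirroring the three choices named in the summary.
enum class BottommostChoice { kSkip, kIfHaveCompactionFilter, kForce };

// Decide whether a manual compaction should rewrite the bottommost level.
bool ShouldCompactBottommost(BottommostChoice choice,
                             bool has_compaction_filter) {
  switch (choice) {
    case BottommostChoice::kSkip:
      return false;
    case BottommostChoice::kForce:
      return true;
    case BottommostChoice::kIfHaveCompactionFilter:
      return has_compaction_filter;
  }
  return false;  // Not reached for valid enum values.
}
```

An enum leaves room for the "only if a filter is present" middle ground that a single boolean flag could not express.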
    • G
      Implement a table-level row cache · 782a1590
      Giuseppe Ottaviano committed
      Summary:
      Implementation of a table-level row cache.
      It only caches point queries done through the `DB::Get` interface; queries done through the `Iterator` interface will completely skip the cache.
      
      Supports snapshots and merge operations.
      
      Test Plan: Ran `make valgrind_check commit-prereq`
      
      Reviewers: igor, philipp, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D39849
      782a1590
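A minimal sketch of the read paths described above, assuming an immutable table so that no cache invalidation is needed. `MiniTable` is a hypothetical stand-in; snapshots and merge operands, which the real row cache supports, are left out.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical immutable table with a row cache in front of point lookups.
class MiniTable {
 public:
  explicit MiniTable(std::map<std::string, std::string> rows)
      : rows_(std::move(rows)) {}

  // Point lookup: consult the row cache first and fill it on a miss.
  bool Get(const std::string& key, std::string* value) {
    auto cached = row_cache_.find(key);
    if (cached != row_cache_.end()) {
      ++cache_hits_;
      *value = cached->second;
      return true;
    }
    auto it = rows_.find(key);
    if (it == rows_.end()) return false;
    row_cache_[key] = it->second;
    *value = it->second;
    return true;
  }

  // Scan: walks the sorted rows directly and never touches the row cache.
  std::string ScanKeys() const {
    std::string keys;
    for (const auto& kv : rows_) keys += kv.first;
    return keys;
  }

  std::size_t cache_hits() const { return cache_hits_; }
  std::size_t cached_rows() const { return row_cache_.size(); }

 private:
  std::map<std::string, std::string> rows_;
  std::unordered_map<std::string, std::string> row_cache_;
  std::size_t cache_hits_ = 0;
};
```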
  11. 19 Jun, 2015 3 commits
    • I
      Fail DB::Open() when the requested compression is not available · 760e9a94
      Igor Canadi committed
      Summary:
      Currently RocksDB silently ignores this issue and doesn't compress the data. Based on discussion, we agree that this is pretty bad because it can cause confusion for our users.
      
      This patch fails DB::Open() if we don't support the compression that is specified in the options.
      
      Test Plan: make check with LZ4 not present. If Snappy is not present all tests will just fail because Snappy is our default library. We should make Snappy the requirement, since without it our default DB::Open() fails.
      
      Reviewers: sdong, MarkCallaghan, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39687
      760e9a94
    • A
      Add Cache.GetPinnedUsage() · 69bb210d
      Aaron Feldman committed
      Summary:
        Add the function Cache.GetPinnedUsage() to return the memory size of entries
        that are in use by the system (that is, all the entries not in the LRU list).
      
      Test Plan:
        Run ./cache_test and examine PinnedUsageTest.
      
      Reviewers: tnovak, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40305
      69bb210d
    • I
      Skip bottommost level compaction if possible · 4eabbdb7
      Islam AbdelRahman committed
      Summary:
      This is https://reviews.facebook.net/D39999 but after introducing an option to force compaction of the bottommost level.
      
      Changes in this patch:
      - Introduce force_bottommost_level_compaction in CompactRangeOptions, which forces compaction of the bottommost level during compaction
      - Skip bottommost level compaction if we don't have a compaction filter and the force_bottommost_level_compaction option is not set
      
      Although tests pass on my machine, I suspect that there may be some tests I am not aware of that should use force_bottommost_level_compaction to pass in a deterministic way.
      
      Test Plan:
      make check
      adding new tests
      
      Reviewers: igor, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40059
      4eabbdb7
  12. 18 Jun, 2015 1 commit
    • I
      Use CompactRangeOptions for CompactRange · 12e030a9
      Islam AbdelRahman committed
      Summary:
      This diff updates DB::CompactRange to take CompactRangeOptions instead of multiple parameters.
      The old CompactRange is still available but deprecated.
      
      Test Plan:
      make all check
      make rocksdbjava
      USE_CLANG=1 make all
      OPT=-DROCKSDB_LITE make release
      
      Reviewers: sdong, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40209
      12e030a9
  13. 17 Jun, 2015 1 commit
  14. 13 Jun, 2015 1 commit
    • I
      db_bench periodically writes QPS to CSV file · d59d90bb
      Igor Canadi committed
      Summary:
      This is part of an effort to better understand and optimize RocksDB stalls under high load. I added a feature to db_bench to periodically write QPS to CSV files. That way we can nicely see how our QPS changes over time (especially when the DB is stalled) and can do a better job of evaluating our stall system (i.e. we want the QPS to be as constant as possible, as opposed to having a bunch of stalls).
      
      Cool part of CSV files is that we can easily graph them -- there are a bunch of tools available.
      
      Test Plan:
      Ran ./db_bench --report_interval_seconds=10 --benchmarks=fillrandom --num=10000000
      and observed this in report.csv:
      
      secs_elapsed,interval_qps
      10,2725860
      20,1980480
      30,1863456
      40,1454359
      50,1460389
      
      Reviewers: sdong, MarkCallaghan, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40047
      d59d90bb
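The report format shown in the test plan can be reproduced with a small sketch. `QpsReporter` is a hypothetical class, and the assumption that each row holds operations per second averaged over the interval is mine; this is not the db_bench code.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical reporter: turns cumulative op counts into per-interval QPS rows.
class QpsReporter {
 public:
  // interval_secs must be nonzero.
  explicit QpsReporter(std::uint64_t interval_secs)
      : interval_secs_(interval_secs) {
    out_ << "secs_elapsed,interval_qps\n";
  }

  // Call once per interval with the total number of operations finished so far.
  void Report(std::uint64_t total_ops) {
    secs_elapsed_ += interval_secs_;
    const std::uint64_t qps = (total_ops - last_total_ops_) / interval_secs_;
    out_ << secs_elapsed_ << "," << qps << "\n";
    last_total_ops_ = total_ops;
  }

  std::string Csv() const { return out_.str(); }

 private:
  std::uint64_t interval_secs_;
  std::uint64_t secs_elapsed_ = 0;
  std::uint64_t last_total_ops_ = 0;
  std::ostringstream out_;
};
```

Reporting the delta since the previous row, rather than a running average, is what makes a stall visible as a dip in a single interval.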
  15. 12 Jun, 2015 1 commit
    • S
      Slow down writes by bytes written · 7842920b
      sdong committed
      Summary:
      With this patch, we slow down writes into the database to the rate of options.delayed_write_rate (a new option).
      
      The thread synchronization approach I take is to keep synchronizing the write controller with the DB mutex, so GetDelay() is called inside the DB mutex. I try to minimize how often GetDelay() has to get the time. I verified it through db_bench and it seems to work.
      
      hard_rate_limit is deprecated.
      
      options.delayed_write_rate is still not dynamically changeable. Need to work on it as a follow-up.
      
      Test Plan: Add new unit tests in db_test
      
      Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor
      
      Reviewed By: igor
      
      Subscribers: ikabiljo, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D36351
      7842920b
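The core arithmetic of throttling by bytes written can be sketched in a few lines. `DelayMicros` is a hypothetical helper, and treating a rate of zero as "no limit" is an assumption of this sketch; the real write controller also tracks time already elapsed, which is omitted here.

```cpp
#include <cstdint>

// Microseconds a writer of `num_bytes` should be delayed so that throughput
// does not exceed `delayed_write_rate` bytes per second.
// Note: num_bytes * 1000000 can overflow for extremely large writes.
std::uint64_t DelayMicros(std::uint64_t num_bytes,
                          std::uint64_t delayed_write_rate) {
  if (delayed_write_rate == 0) return 0;  // Assumption: zero means no limit.
  return num_bytes * 1000000 / delayed_write_rate;
}
```

For example, with a rate of 1MB/s a 2MB write would be held for two seconds and a 512KB write for half a second.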
  16. 11 Jun, 2015 3 commits
    • I
      Re-generate WriteEntry on WBWIIterator::Entry() · 821cff11
      Igor Canadi committed
      Summary:
      [This is the resubmit of D39813. Tests were failing, so I reverted the diff. I found the bug and I'm now resubmitting]
      
      If we don't do this, any calls to Entry() after WBWI mutation will result in undefined behavior. We need to re-fetch the offset from the skip list and regenerate the new pointer (because string's base pointer can change while mutating).
      
      Test Plan: COMPILE_WITH_ASAN=1 make write_batch_with_index_test && ./write_batch_with_index_test
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39897
      821cff11
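The hazard described above comes from keeping a pointer into a growing `std::string`. A minimal sketch of the safe pattern, storing offsets and rebuilding the entry on each call, might look like this. `MiniIndexedBatch` is hypothetical, and it returns a copy where the real code regenerates pointers into the buffer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical batch: records live in one growing std::string buffer.
class MiniIndexedBatch {
 public:
  // Appends a record and returns its index.
  std::size_t Add(const std::string& record) {
    offsets_.push_back(rep_.size());
    sizes_.push_back(record.size());
    rep_ += record;
    return offsets_.size() - 1;
  }

  // Rebuilds the record from its stored offset on every call. Caching a
  // pointer into rep_ instead would dangle once rep_ reallocates.
  std::string Entry(std::size_t index) const {
    return rep_.substr(offsets_[index], sizes_[index]);
  }

 private:
  std::string rep_;
  std::vector<std::size_t> offsets_;
  std::vector<std::size_t> sizes_;
};
```

Offsets stay valid across reallocation because they are relative to the buffer start, whereas the base pointer itself can move on any append.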
    • I
      Revert "Fix compile" · 75222d13
      Igor Canadi committed
      This reverts commit 51440f83.
      
      Revert "Re-generate WriteEntry on WBWIIterator::Entry()"
      
      This reverts commit 4949ef08.
      75222d13
    • I
      Re-generate WriteEntry on WBWIIterator::Entry() · 4949ef08
      Igor Canadi committed
      Summary: If we don't do this, any calls to Entry() after WBWI mutation will result in undefined behavior. We need to re-fetch the offset from the skip list and regenerate the new pointer (because string's base pointer can change while mutating).
      
      Test Plan: COMPILE_WITH_ASAN=1 make write_batch_with_index_test && ./write_batch_with_index_test
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39813
      4949ef08
  17. 10 Jun, 2015 1 commit
    • V
      Fix hang when closing a DB after doing loads with WAL disabled. · 406a5682
      Venkatesh Radhakrishnan committed
      Summary:
      There is a hang during DB close in the following scenario:
      a) a load with WAL disabled was done,
      b) CancelAllBackgroundWork was called,
      c) DB Close was called
      This was because in that case we wait for a flush, but we cannot do a
      background flush because we have called CancelAllBackgroundWork, which
      marks the DB as shutting down.
      
      Test Plan: Added DBTest FlushOnDestroy
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: yoshinorim, hermanlee4, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39747
      406a5682
  18. 09 Jun, 2015 1 commit
    • I
      Use nullptr for default compaction_filter_factory · 643bbbf0
      Islam AbdelRahman committed
      Summary:
      Replacing the default value for compaction_filter_factory and compaction_filter_factory_v2 to be nullptr instead of DefaultCompactionFilterFactory / DefaultCompactionFilterFactoryV2
      The reason for this is to be able to determine easily if we have compaction filter factory or not without depending on RTTI
      
      Test Plan: make check
      
      Reviewers: yoshinorim, ott, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D39693
      643bbbf0
  19. 06 Jun, 2015 1 commit
  20. 02 Jun, 2015 1 commit
  21. 30 May, 2015 1 commit
    • A
      Optimistic Transactions · dc9d70de
      agiardullo committed
      Summary: Optimistic transactions supporting begin/commit/rollback semantics.  Currently relies on checking the memtable to determine if there are any collisions at commit time.  Not yet implemented is a way of ensuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty.  You should probably start with transaction.h to get an overview of what is currently supported.
      
      Test Plan: Added a new test, but still need to look into stress testing.
      
      Reviewers: yhchiang, igor, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: adamretter, MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D33435
      dc9d70de
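Commit-time conflict checking can be sketched with per-key sequence numbers. `MiniStore` and `MiniOptimisticTxn` are hypothetical and only cover write-write conflicts on an in-memory map; they are not the interface declared in transaction.h.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical store that tags every key with the sequence of its last write.
class MiniStore {
 public:
  std::uint64_t LatestSequence() const { return sequence_; }

  void Put(const std::string& key, const std::string& value) {
    data_[key] = std::make_pair(value, ++sequence_);
  }

  std::string Get(const std::string& key) const {
    auto it = data_.find(key);
    return it == data_.end() ? std::string() : it->second.first;
  }

  // Sequence number of the last write to `key`, or 0 if never written.
  std::uint64_t LastWriteSequence(const std::string& key) const {
    auto it = data_.find(key);
    return it == data_.end() ? 0 : it->second.second;
  }

 private:
  std::uint64_t sequence_ = 0;
  std::map<std::string, std::pair<std::string, std::uint64_t>> data_;
};

class MiniOptimisticTxn {
 public:
  explicit MiniOptimisticTxn(MiniStore* store)
      : store_(store), start_sequence_(store->LatestSequence()) {}

  // Writes are buffered locally; nothing is locked.
  void Put(const std::string& key, const std::string& value) {
    writes_[key] = value;
  }

  // Succeeds only if no key in the write set changed after the txn began.
  bool Commit() {
    for (const auto& kv : writes_) {
      if (store_->LastWriteSequence(kv.first) > start_sequence_) return false;
    }
    for (const auto& kv : writes_) store_->Put(kv.first, kv.second);
    writes_.clear();
    return true;
  }

  void Rollback() { writes_.clear(); }

 private:
  MiniStore* store_;
  std::uint64_t start_sequence_;
  std::map<std::string, std::string> writes_;
};
```

The check depends on the store still knowing the last write sequence for each key, which is why the summary worries about the memtable keeping enough history.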
  22. 29 May, 2015 2 commits
    • A
      Support saving history in memtable_list · c8153510
      agiardullo committed
      Summary:
      For transactions, we are using the memtables to validate that there are no write conflicts.  But after flushing, we don't have any memtables, and transactions could fail to commit.  So we want to somehow keep around some extra history to use for conflict checking.  In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
      
      After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure).  It seems like the best place for this is abstracted inside the memtable_list.  I decided to create a separate list in MemtableListVersion, as using the same list complicated the flush/installalflushresults logic too much.
      
      This diff adds a new parameter to control how much memtable history to keep around after flushing.  However, it sounds like people aren't too fond of adding new parameters.  So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers.  This should not change the maximum amount of memory used, but makes it more likely we're using closer to the limit.  (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached).  So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
      
      However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
      
      Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests.  Added testing in memtablelist_test and planning on adding more testing here.
      
      Reviewers: sdong, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D37443
      c8153510
    • Y
      [API Change] Move listeners from ColumnFamilyOptions to DBOptions · 672dda9b
      Yueh-Hsuan Chiang committed
      Summary: Move listeners from ColumnFamilyOptions to DBOptions
      
      Test Plan:
      listener_test
      compact_files_test
      
      Reviewers: rven, anthony, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39087
      672dda9b
  23. 22 May, 2015 2 commits
  24. 20 May, 2015 2 commits
  25. 19 May, 2015 1 commit
  26. 25 Apr, 2015 1 commit
    • A
      Task 6532943: Rocksdb - SetCapacity() can dynamically change cache capacity if feasible · 794ccfde
      Aashish Pant committed
      Summary:
      When the new capacity is larger than the existing capacity, simply update the capacity to the new value.
      When the new capacity is less than the existing capacity but more than the usage, simply update the capacity to the new value.
      When the new capacity is less than both the existing capacity and the existing usage, try to purge entries in the LRU if feasible to make usage < capacity.
      
      Test Plan: Created unit tests in cache_test.cc
      
      Reviewers: sdong, rven, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D37527
      794ccfde
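The three cases above reduce to "set the capacity, then evict while over it and something is evictable". A minimal sketch, assuming pinned entries count toward usage but sit outside the LRU list; `MiniLruCache` is hypothetical and stops evicting once usage is no greater than capacity.

```cpp
#include <cstddef>
#include <list>
#include <string>

class MiniLruCache {
 private:
  struct Entry {
    std::string key;
    std::size_t charge;
  };

 public:
  explicit MiniLruCache(std::size_t capacity) : capacity_(capacity) {}

  // Unpinned entry: lives on the LRU list and may be evicted.
  void Insert(const std::string& key, std::size_t charge) {
    lru_.push_back(Entry{key, charge});
    usage_ += charge;
    EvictUntilWithinCapacity();
  }

  // Pinned entry: counts toward usage but is not on the LRU list.
  void InsertPinned(std::size_t charge) { usage_ += charge; }

  // Growing, or shrinking above current usage, evicts nothing. Shrinking
  // below usage purges from the LRU list as far as that is feasible.
  void SetCapacity(std::size_t new_capacity) {
    capacity_ = new_capacity;
    EvictUntilWithinCapacity();
  }

  std::size_t usage() const { return usage_; }

 private:
  void EvictUntilWithinCapacity() {
    while (usage_ > capacity_ && !lru_.empty()) {
      usage_ -= lru_.front().charge;  // Oldest entry goes first.
      lru_.pop_front();
    }
  }

  std::size_t capacity_;
  std::size_t usage_ = 0;
  std::list<Entry> lru_;
};
```

When only pinned entries remain, usage can stay above the new capacity, which is the "if feasible" caveat in the commit title.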
  27. 07 Apr, 2015 1 commit
    • S
      A new callback in TablePropertiesCollector to let users know whether the entry is an add, delete or merge · 953a885e
      sdong committed
      Summary:
      Currently users have no way to tell from the TablePropertiesCollector callback whether a key is an add, delete or merge. Add a new function to expose it.
      
      Also refactor the code so that
      (1) table property collector and internal table property collector become two separate data structures, with the latter one now exposed
      (2) table builders only receive internal table properties
      
      Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector.
      
      Reviewers: yhchiang, igor.sugak, rven, igor
      
      Reviewed By: rven, igor
      
      Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D35373
      953a885e
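A collector that uses the entry type might look like the sketch below. `MiniEntryType` and `EntryTypeCounter` are stand-in names, and the three-argument callback is a simplification for illustration, not the signature added by this commit.

```cpp
#include <cstdint>
#include <string>

// Stand-in for the kinds of entries a table builder can see.
enum class MiniEntryType { kPut, kDelete, kMerge };

// Hypothetical collector that tallies entries by type as they are added.
class EntryTypeCounter {
 public:
  void AddUserKey(const std::string& /*key*/, const std::string& /*value*/,
                  MiniEntryType type) {
    switch (type) {
      case MiniEntryType::kPut:
        ++puts_;
        break;
      case MiniEntryType::kDelete:
        ++deletes_;
        break;
      case MiniEntryType::kMerge:
        ++merges_;
        break;
    }
  }

  std::uint64_t puts() const { return puts_; }
  std::uint64_t deletes() const { return deletes_; }
  std::uint64_t merges() const { return merges_; }

 private:
  std::uint64_t puts_ = 0;
  std::uint64_t deletes_ = 0;
  std::uint64_t merges_ = 0;
};
```

Without the type argument, a collector sees only key and value bytes and cannot compute a property such as the number of deletions in a file.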
  28. 31 Mar, 2015 1 commit
    • S
      Universal Compactions with Small Files · b23bbaa8
      sdong committed
      Summary:
      With this change, we use L1 and up to store compaction outputs in universal compaction.
      The compaction pick logic stays the same. Outputs are stored in the largest "level" possible.
      
      If options.num_levels=1, it behaves the same as now.
      
      Test Plan:
      1) convert most of the existing unit tests for universal compaction to include the option of one level and multiple levels.
      2) add a unit test to cover parallel compaction in universal compaction and run it in one level and multiple levels
      3) add unit test to migrate from multiple level setting back to one level setting
      4) add a unit test to insert keys to trigger multiple rounds of compactions and verify results.
      
      Reviewers: rven, kradhakrishnan, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: meyering, leveldb, MarkCallaghan, dhruba
      
      Differential Revision: https://reviews.facebook.net/D34539
      b23bbaa8