1. 08 Jul, 2015 (1 commit)
  2. 03 Jul, 2015 (2 commits)
    • A
      Prepare 3.12 · 4159f5b8
      agiardullo committed
      Summary: About to cut release
      
      Test Plan: none
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41061
      4159f5b8
    • A
      Multithreaded backup and restore in BackupEngineImpl · a69bc91e
      Aaron Feldman committed
      Summary:
      Add a new field: BackupableDBOptions.max_background_copies.
      CreateNewBackup() and RestoreDBFromBackup() will use this number of threads to perform copies.
      If there is a backup rate limit, then max_background_copies must be 1.
      Update backupable_db_test.cc to test multi-threaded backup and restore.
      Update backupable_db_test.cc to test backups when the backup environment is not the same as the database environment.
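
      A minimal sketch of using the new field (assuming the BackupEngine::Open/CreateNewBackup API of this era; paths are illustrative):

        #include "rocksdb/db.h"
        #include "rocksdb/env.h"
        #include "rocksdb/utilities/backupable_db.h"

        int main() {
          rocksdb::DB* db;
          rocksdb::Options options;
          options.create_if_missing = true;
          rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/testdb", &db);
          if (!s.ok()) return 1;

          // Copy files with 4 background threads; must stay at 1 when a
          // backup rate limit is configured.
          rocksdb::BackupableDBOptions backup_options("/tmp/testdb_backups");
          backup_options.max_background_copies = 4;

          rocksdb::BackupEngine* backup_engine;
          s = rocksdb::BackupEngine::Open(rocksdb::Env::Default(), backup_options,
                                          &backup_engine);
          if (s.ok()) {
            s = backup_engine->CreateNewBackup(db);
            delete backup_engine;
          }
          delete db;
          return s.ok() ? 0 : 1;
        }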
      
      Test Plan:
      Run ./backupable_db_test
      Run valgrind ./backupable_db_test
      Run with TSAN and ASAN
      
      Reviewers: yhchiang, rven, anthony, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: yhchiang, anthony, sdong, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D40725
      a69bc91e
  3. 27 Jun, 2015 (1 commit)
    • I
      Use malloc_usable_size() for accounting block cache size · 0a019d74
      Igor Canadi committed
      Summary:
      Currently, when we insert something into the block cache, we say that the block cache capacity decreased by the size of the block. However, the size of the block might be less than the actual memory used by this object. For example, a 4.5KB block will actually use 8KB of memory. So even if we configure the block cache to 10GB, our actual memory usage of the block cache will be 20GB!
      
      This problem showed up a lot in testing and just recently also showed up in MongoRocks production where we were using 30GB more memory than expected.
      
      This diff will fix the problem. Instead of counting the block size, we will count memory used by the block. That way, a block cache configured to be 10GB will actually use only 10GB of memory.
      
      I'm using a non-portable function and I couldn't find info on its portability on Google. However, it seems to work on Linux, which covers the majority of our use cases.
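
      A standalone snippet shows the gap the new accounting closes; malloc_usable_size() is the non-portable, glibc-style call in question:

        #include <malloc.h>  // malloc_usable_size (glibc; non-portable)
        #include <cstdio>
        #include <cstdlib>

        int main() {
          // Ask for a 4.5KB block, like the example above.
          size_t requested = 4608;
          void* p = malloc(requested);
          // Often prints a usable size larger than 4608; that difference is
          // exactly what the old accounting missed.
          std::printf("requested=%zu usable=%zu\n", requested,
                      malloc_usable_size(p));
          free(p);
          return 0;
        }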
      
      Test Plan:
      1. fill up mongo instance with 80GB of data
      2. restart mongo with block cache size configured to 10GB
      3. do a table scan in mongo
      4. memory usage before the diff: 12GB. memory usage after the diff: 10.5GB
      
      Reviewers: sdong, MarkCallaghan, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40635
      0a019d74
  4. 24 Jun, 2015 (2 commits)
    • I
      Bottommost level compaction option · 674b1181
      Islam AbdelRahman committed
      Summary: Replace force_bottommost_level_compaction in CompactRangeOptions with an option that allows the user to choose, for the bottommost level in level-based compaction: always skip, always compact, or compact only if a compaction filter is present.
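
      A sketch of how the new option might look in use (the enum and field names below mirror the three behaviors described and are assumptions of this sketch):

        #include "rocksdb/db.h"
        #include "rocksdb/options.h"

        // Compact the full key range, always rewriting the bottommost level.
        void FullCompaction(rocksdb::DB* db) {
          rocksdb::CompactRangeOptions cro;
          // Alternatives: kSkip (always skip) or kIfHaveCompactionFilter
          // (compact only when a compaction filter is present).
          cro.bottommost_level_compaction =
              rocksdb::BottommostLevelCompaction::kForce;
          db->CompactRange(cro, nullptr, nullptr);
        }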
      
      Test Plan: make check
      
      Reviewers: sdong, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40527
      674b1181
    • G
      Implement a table-level row cache · 782a1590
      Giuseppe Ottaviano committed
      Summary:
      Implementation of a table-level row cache.
      It only caches point queries done through the `DB::Get` interface; queries done through the `Iterator` interface completely skip the cache.
      
      Supports snapshots and merge operations.
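
      Enabling it is a one-line options change (cache size illustrative):

        #include "rocksdb/cache.h"
        #include "rocksdb/options.h"

        rocksdb::Options MakeOptions() {
          rocksdb::Options options;
          options.create_if_missing = true;
          // Table-level row cache: DB::Get() consults it, iterators skip it.
          options.row_cache = rocksdb::NewLRUCache(64 << 20);  // 64MB
          return options;
        }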
      
      Test Plan: Ran `make valgrind_check commit-prereq`
      
      Reviewers: igor, philipp, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D39849
      782a1590
  5. 19 Jun, 2015 (3 commits)
    • I
      Fail DB::Open() when the requested compression is not available · 760e9a94
      Igor Canadi committed
      Summary:
      Currently RocksDB silently ignores an unavailable compression library and doesn't compress the data. Based on discussion, we agree that this is pretty bad because it can cause confusion for our users.
      
      This patch fails DB::Open() if we don't support the compression that is specified in the options.
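
      From the caller's side the failure now surfaces at open time, roughly like this (a sketch; the exact message is up to the implementation):

        #include <cstdio>
        #include "rocksdb/db.h"
        #include "rocksdb/options.h"

        int main() {
          rocksdb::Options options;
          options.create_if_missing = true;
          options.compression = rocksdb::kLZ4Compression;

          rocksdb::DB* db;
          rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/testdb", &db);
          if (!s.ok()) {
            // In a build without LZ4, Open() now fails here instead of
            // silently writing uncompressed data.
            std::fprintf(stderr, "open failed: %s\n", s.ToString().c_str());
            return 1;
          }
          delete db;
          return 0;
        }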
      
      Test Plan: make check with LZ4 not present. If Snappy is not present, all tests will just fail because Snappy is our default library. We should make Snappy the requirement, since without it our default DB::Open() fails.
      
      Reviewers: sdong, MarkCallaghan, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39687
      760e9a94
    • A
      Add Cache.GetPinnedUsage() · 69bb210d
      Aaron Feldman committed
      Summary:
        Add the function Cache.GetPinnedUsage() to return the memory size of entries
        that are in use by the system (that is, all the entries not in the LRU list).
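
      A quick usage sketch (GetUsage() is the pre-existing total-usage counter):

        #include <memory>
        #include "rocksdb/cache.h"

        int main() {
          std::shared_ptr<rocksdb::Cache> cache = rocksdb::NewLRUCache(1 << 20);
          // Memory held by entries the system still references, i.e.
          // everything not on the LRU list; always <= GetUsage().
          size_t pinned = cache->GetPinnedUsage();
          size_t total = cache->GetUsage();
          return pinned <= total ? 0 : 1;
        }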
      
      Test Plan:
        Run ./cache_test and examine PinnedUsageTest.
      
      Reviewers: tnovak, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40305
      69bb210d
    • I
      Skip bottommost level compaction if possible · 4eabbdb7
      Islam AbdelRahman committed
      Summary:
      This is https://reviews.facebook.net/D39999, but after introducing an option to force compaction of the bottommost level.
      
      Changes in this patch
      - Introduce force_bottommost_level_compaction in CompactRangeOptions to force compacting the bottommost level during compaction
      - Skip bottommost level compaction if we don't have a compaction filter and the force_bottommost_level_compaction option is not set
      
      Although tests pass on my machine, I suspect there may be some tests I am not aware of that should use force_bottommost_level_compaction to pass in a deterministic way.
      
      Test Plan:
      make check
      adding new tests
      
      Reviewers: igor, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40059
      4eabbdb7
  6. 18 Jun, 2015 (1 commit)
    • I
      Use CompactRangeOptions for CompactRange · 12e030a9
      Islam AbdelRahman committed
      Summary:
      This diff updates DB::CompactRange to use CompactRangeOptions instead of multiple parameters.
      The old CompactRange is still available but deprecated.
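
      The new call shape, sketched (the change_level field stands in for what used to be a positional parameter):

        #include "rocksdb/db.h"
        #include "rocksdb/options.h"
        #include "rocksdb/slice.h"

        void CompactUserRange(rocksdb::DB* db, const rocksdb::Slice& begin,
                              const rocksdb::Slice& end) {
          // Behavior flags travel in one struct instead of a parameter list.
          rocksdb::CompactRangeOptions cro;
          cro.change_level = true;
          db->CompactRange(cro, &begin, &end);
        }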
      
      Test Plan:
      make all check
      make rocksdbjava
      USE_CLANG=1 make all
      OPT=-DROCKSDB_LITE make release
      
      Reviewers: sdong, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40209
      12e030a9
  7. 17 Jun, 2015 (1 commit)
  8. 13 Jun, 2015 (1 commit)
    • I
      db_bench periodically writes QPS to CSV file · d59d90bb
      Igor Canadi committed
      Summary:
      This is part of an effort to better understand and optimize RocksDB stalls under high load. I added a feature to db_bench to periodically write QPS to CSV files. That way we can nicely see how our QPS changes over time (especially when the DB is stalled) and can do a better job of evaluating our stall system (i.e. we want the QPS to be as constant as possible, as opposed to having a bunch of stalls).
      
      The cool part of CSV files is that we can easily graph them; there are a bunch of tools available.
      
      Test Plan:
      Ran ./db_bench --report_interval_seconds=10 --benchmarks=fillrandom --num=10000000
      and observed this in report.csv:
      
      secs_elapsed,interval_qps
      10,2725860
      20,1980480
      30,1863456
      40,1454359
      50,1460389
      
      Reviewers: sdong, MarkCallaghan, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40047
      d59d90bb
  9. 12 Jun, 2015 (1 commit)
    • S
      Slow down writes by bytes written · 7842920b
      sdong committed
      Summary:
      With this patch, we slow writes into the database down to the rate of options.delayed_write_rate (a new option).
      
      The thread synchronization approach I take is to still synchronize the write controller by the DB mutex, and GetDelay() is inside the DB mutex. I try to minimize the frequency of getting the time in GetDelay(). I verified it through db_bench and it seems to work.
      
      hard_rate_limit is deprecated.
      
      options.delayed_write_rate is still not dynamically changeable. Need to work on it as a follow-up.
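
      Configuration is a single option (value illustrative):

        #include "rocksdb/options.h"

        rocksdb::Options MakeOptions() {
          rocksdb::Options options;
          // When the DB must slow writers down, admit writes at roughly this
          // many bytes per second instead of stopping them outright.
          options.delayed_write_rate = 16 * 1024 * 1024;  // 16MB/s
          return options;
        }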
      
      Test Plan: Add new unit tests in db_test
      
      Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor
      
      Reviewed By: igor
      
      Subscribers: ikabiljo, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D36351
      7842920b
  10. 11 Jun, 2015 (3 commits)
    • I
      Re-generate WriteEntry on WBWIIterator::Entry() · 821cff11
      Igor Canadi committed
      Summary:
      [This is the resubmit of D39813. Tests were failing, so I reverted the diff. I found the bug and I'm now resubmitting]
      
      If we don't do this, any calls to Entry() after WBWI mutation will result in undefined behavior. We need to re-fetch the offset from the skip list and regenerate the new pointer (because the string's base pointer can change during mutation).
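
      The hazard and the required re-fetch, sketched:

        #include <memory>
        #include "rocksdb/utilities/write_batch_with_index.h"

        void Example() {
          rocksdb::WriteBatchWithIndex batch;
          batch.Put("a", "1");

          std::unique_ptr<rocksdb::WBWIIterator> it(batch.NewIterator());
          it->SeekToFirst();
          rocksdb::WriteEntry e = it->Entry();  // valid here

          batch.Put("b", "2");  // mutation may reallocate the backing string
          e = it->Entry();      // re-fetch: with this fix, Entry() regenerates
                                // the pointers from the skip-list offset
        }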
      
      Test Plan: COMPILE_WITH_ASAN=1 make write_batch_with_index_test && ./write_batch_with_index_test
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39897
      821cff11
    • I
      Revert "Fix compile" · 75222d13
      Igor Canadi committed
      This reverts commit 51440f83.
      
      Revert "Re-generate WriteEntry on WBWIIterator::Entry()"
      
      This reverts commit 4949ef08.
      75222d13
    • I
      Re-generate WriteEntry on WBWIIterator::Entry() · 4949ef08
      Igor Canadi committed
      Summary: If we don't do this, any calls to Entry() after WBWI mutation will result in undefined behavior. We need to re-fetch the offset from the skip list and regenerate the new pointer (because the string's base pointer can change during mutation).
      
      Test Plan: COMPILE_WITH_ASAN=1 make write_batch_with_index_test && ./write_batch_with_index_test
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39813
      4949ef08
  11. 10 Jun, 2015 (1 commit)
    • V
      Fix hang when closing a DB after doing loads with WAL disabled. · 406a5682
      Venkatesh Radhakrishnan committed
      Summary:
      There is a hang during DB close in the following scenario:
      a) a load with WAL disabled was done,
      b) CancelAllBackgroundWork was called,
      c) DB Close was called
      This was because we will wait for a flush, but we cannot do a
      background flush because we have called CancelAllBackgroundWork,
      which marks the DB as shutting down.
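
      A sketch of the failing sequence (the header declaring CancelAllBackgroundWork varies by version; paths illustrative):

        #include "rocksdb/db.h"
        #include "rocksdb/utilities/convenience.h"  // CancelAllBackgroundWork

        int main() {
          rocksdb::DB* db;
          rocksdb::Options options;
          options.create_if_missing = true;
          rocksdb::DB::Open(options, "/tmp/testdb", &db);

          rocksdb::WriteOptions wo;
          wo.disableWAL = true;                  // (a) load with WAL disabled
          db->Put(wo, "key", "value");

          rocksdb::CancelAllBackgroundWork(db);  // (b) DB marked shutting down
          delete db;                             // (c) close; used to hang here
          return 0;
        }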
      
      Test Plan: Added DBTest FlushOnDestroy
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: yoshinorim, hermanlee4, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39747
      406a5682
  12. 09 Jun, 2015 (1 commit)
    • I
      Use nullptr for default compaction_filter_factory · 643bbbf0
      Islam AbdelRahman committed
      Summary:
      Replacing the default value for compaction_filter_factory and compaction_filter_factory_v2 with nullptr instead of DefaultCompactionFilterFactory / DefaultCompactionFilterFactoryV2.
      The reason for this is to be able to easily determine whether we have a compaction filter factory without depending on RTTI.
      
      Test Plan: make check
      
      Reviewers: yoshinorim, ott, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D39693
      643bbbf0
  13. 06 Jun, 2015 (1 commit)
  14. 02 Jun, 2015 (1 commit)
  15. 30 May, 2015 (1 commit)
    • A
      Optimistic Transactions · dc9d70de
      agiardullo committed
      Summary: Optimistic transactions supporting begin/commit/rollback semantics.  Currently relies on checking the memtable to determine if there are any collisions at commit time.  Not yet implemented is a way of ensuring the memtable has some minimum amount of history, so that we won't fail to commit when the memtable is empty.  You should probably start with transaction.h to get an overview of what is currently supported.
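
      A begin/commit sketch (names follow the optimistic-transaction API as it later stabilized; the class names in this initial diff may differ slightly):

        #include "rocksdb/utilities/optimistic_transaction_db.h"
        #include "rocksdb/utilities/transaction.h"

        int main() {
          rocksdb::Options options;
          options.create_if_missing = true;
          rocksdb::OptimisticTransactionDB* txn_db;
          rocksdb::OptimisticTransactionDB::Open(options, "/tmp/txndb", &txn_db);

          rocksdb::Transaction* txn =
              txn_db->BeginTransaction(rocksdb::WriteOptions());
          txn->Put("key", "value");
          // Collisions are detected against the memtable at commit time.
          rocksdb::Status s = txn->Commit();
          if (!s.ok()) { /* conflict: retry the transaction */ }
          delete txn;
          delete txn_db;
          return 0;
        }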
      
      Test Plan: Added a new test, but still need to look into stress testing.
      
      Reviewers: yhchiang, igor, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: adamretter, MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D33435
      dc9d70de
  16. 29 May, 2015 (2 commits)
    • A
      Support saving history in memtable_list · c8153510
      agiardullo committed
      Summary:
      For transactions, we are using the memtables to validate that there are no write conflicts.  But after flushing, we don't have any memtables, and transactions could fail to commit.  So we want to somehow keep around some extra history to use for conflict checking.  In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
      
      After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure).  It seems like the best place for this is abstracted inside the memtable_list.  I decided to create a separate list in MemTableListVersion, as using the same list complicated the flush/install-flush-results logic too much.
      
      This diff adds a new parameter to control how much memtable history to keep around after flushing.  However, it sounds like people aren't too fond of adding new parameters.  So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers.  This should not change the maximum amount of memory used, but makes it more likely we're using closer to the limit.  (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached.)  So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
      
      However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
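
      A configuration sketch; the parameter name max_write_buffer_number_to_maintain is an assumption of this sketch (how the option is commonly known), not quoted from this diff:

        #include "rocksdb/options.h"

        rocksdb::Options MakeOptions() {
          rocksdb::Options options;
          options.max_write_buffer_number = 4;
          // Keep up to this many memtables (flushed + unflushed) around so
          // transactions can conflict-check against recent history.
          options.max_write_buffer_number_to_maintain = 4;  // name assumed
          return options;
        }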
      
      Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests.  Added testing in memtablelist_test and planning on adding more testing here.
      
      Reviewers: sdong, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D37443
      c8153510
    • Y
      [API Change] Move listeners from ColumnFamilyOptions to DBOptions · 672dda9b
      Yueh-Hsuan Chiang committed
      Summary: Move listeners from ColumnFamilyOptions to DBOptions
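
      After the move, registration looks like this (MyFlushListener is a hypothetical listener; EventListener's hooks all have no-op defaults):

        #include <memory>
        #include "rocksdb/listener.h"
        #include "rocksdb/options.h"

        class MyFlushListener : public rocksdb::EventListener {
          // Override only the hooks you care about, e.g. OnFlushCompleted.
        };

        rocksdb::Options MakeOptions() {
          rocksdb::Options options;
          // Listeners now live on DBOptions, not on each column family.
          options.listeners.emplace_back(std::make_shared<MyFlushListener>());
          return options;
        }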
      
      Test Plan:
      listener_test
      compact_files_test
      
      Reviewers: rven, anthony, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39087
      672dda9b
  17. 22 May, 2015 (2 commits)
  18. 20 May, 2015 (2 commits)
  19. 19 May, 2015 (1 commit)
  20. 25 Apr, 2015 (1 commit)
    • A
      Task 6532943: Rocksdb - SetCapacity() can dynamically change cache capacity if feasible · 794ccfde
      Aashish Pant committed
      Summary:
      When the new capacity is larger than the existing capacity, simply update the capacity to the new value.
      When the new capacity is less than the existing capacity but more than the usage, simply update the capacity to the new value.
      When the new capacity is less than both the existing capacity and the existing usage, try to purge entries in the LRU if feasible to make usage < capacity (see the sketch below).
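
      Usage is unchanged; only the shrinking semantics are new (a sketch):

        #include "rocksdb/cache.h"

        void ShrinkCache(rocksdb::Cache* cache) {
          // If 10MB is below current usage, unpinned LRU entries are purged
          // until usage fits; pinned entries cannot be evicted.
          cache->SetCapacity(10 << 20);
        }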
      
      Test Plan: Created unit tests in cache_test.cc
      
      Reviewers: sdong, rven, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D37527
      794ccfde
  21. 07 Apr, 2015 (1 commit)
    • S
      A new callback to TablePropertiesCollector to let users know whether an entry is an add, delete or merge · 953a885e
      sdong committed
      Summary:
      Currently users have no way to know whether a key is an add, delete or merge from the TablePropertiesCollector callback. Add a new callback that exposes it (see the sketch below).
      
      Also refactor the code so that
      (1) the table property collector and the internal table property collector are two separate data structures, with the latter one now exposed
      (2) table builders only receive internal table properties
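
      A sketch of a collector using the new hook; the exact AddUserKey signature (including the seq and file_size parameters) follows later headers and is an assumption for this version:

        #include <cstdint>
        #include <string>
        #include "rocksdb/table_properties.h"
        #include "rocksdb/types.h"

        class DeleteCountingCollector : public rocksdb::TablePropertiesCollector {
         public:
          // New-style callback: the entry type is now visible.
          rocksdb::Status AddUserKey(const rocksdb::Slice& /*key*/,
                                     const rocksdb::Slice& /*value*/,
                                     rocksdb::EntryType type,
                                     rocksdb::SequenceNumber /*seq*/,
                                     uint64_t /*file_size*/) override {
            if (type == rocksdb::kEntryDelete) ++deletes_;
            return rocksdb::Status::OK();
          }
          rocksdb::Status Finish(rocksdb::UserCollectedProperties* props) override {
            (*props)["rocksdb.example.deletes"] = std::to_string(deletes_);
            return rocksdb::Status::OK();
          }
          rocksdb::UserCollectedProperties GetReadableProperties() const override {
            return rocksdb::UserCollectedProperties();
          }
          const char* Name() const override { return "DeleteCountingCollector"; }

         private:
          uint64_t deletes_ = 0;
        };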
      
      Test Plan: Add cases in table_properties_collector_test to cover both the old and new ways of using TablePropertiesCollector.
      
      Reviewers: yhchiang, igor.sugak, rven, igor
      
      Reviewed By: rven, igor
      
      Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D35373
      953a885e
  22. 31 Mar, 2015 (2 commits)
    • S
      Universal Compactions with Small Files · b23bbaa8
      sdong committed
      Summary:
      With this change, we use L1 and up to store compaction outputs in universal compaction.
      The compaction pick logic stays the same. Outputs are stored in the largest "level" as possible.
      
      If options.num_levels=1, it behaves the same as now.
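
      A sketch of opting universal compaction into multiple levels:

        #include "rocksdb/options.h"

        rocksdb::Options MakeOptions() {
          rocksdb::Options options;
          options.compaction_style = rocksdb::kCompactionStyleUniversal;
          // With num_levels > 1, compaction outputs land in the largest
          // level possible; num_levels = 1 keeps the old behavior.
          options.num_levels = 4;
          return options;
        }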
      
      Test Plan:
      1) convert most of the existing unit tests for universal compaction to include the option of one level and multiple levels.
      2) add a unit test to cover parallel compaction in universal compaction and run it in one level and multiple levels
      3) add unit test to migrate from multiple level setting back to one level setting
      4) add a unit test to insert keys to trigger multiple rounds of compactions and verify results.
      
      Reviewers: rven, kradhakrishnan, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: meyering, leveldb, MarkCallaghan, dhruba
      
      Differential Revision: https://reviews.facebook.net/D34539
      b23bbaa8
    • I
      db_bench can now disable flashcache for background threads · d61cb0b9
      Igor Canadi committed
      Summary: Most of the approach is copied from WebSQL's MySQL branch. It's nice that we can do this without touching core RocksDB code.
      
      Test Plan: Compiles and runs. Didn't test the flashcache code, as I don't have a flashcache device, and most of it is copy/pasted.
      
      Reviewers: MarkCallaghan, sdong
      
      Reviewed By: sdong
      
      Subscribers: rven, lgalanis, kradhakrishnan, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D35391
      d61cb0b9
  23. 26 Mar, 2015 (1 commit)
  24. 25 Mar, 2015 (1 commit)
  25. 20 Mar, 2015 (1 commit)
    • I
      Don't delete files when column family is dropped · b088c83e
      Igor Canadi committed
      Summary:
      To understand the bug, read t5943287 and check out the new test in column_family_test (ReadDroppedColumnFamily), iter 0.
      
      The RocksDB contract allows you to read a dropped column family as long as there is a live reference. However, since our iteration ignores dropped column families, AddLiveFiles() didn't mark files of dropped column families as live. So we deleted them.
      
      In this patch I no longer ignore dropped column families in the iteration. I think this behavior was confusing, and it also led to this bug. Now if an iterator client wants to ignore dropped column families, they need to do it explicitly.
      
      Test Plan: Added a new unit test that is failing on master. Unit test succeeds now.
      
      Reviewers: sdong, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D32535
      b088c83e
  26. 18 Mar, 2015 (2 commits)
    • A
      Create an abstract interface for write batches · 81345b90
      agiardullo committed
      Summary: WriteBatch and WriteBatchWithIndex now both inherit from a common abstract base class.  This makes it easier to write code that is agnostic toward the implementation of the particular write batch.  In particular, I plan on utilizing this abstraction to allow transactions to support using either implementation of a write batch.
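
      A sketch of the abstraction (assuming the base class landed as WriteBatchBase):

        #include "rocksdb/utilities/write_batch_with_index.h"
        #include "rocksdb/write_batch.h"
        #include "rocksdb/write_batch_base.h"

        // Works against either concrete batch implementation.
        void AddDefaults(rocksdb::WriteBatchBase* batch) {
          batch->Put("config", "default");
          batch->Delete("stale_key");
        }

        void Use() {
          rocksdb::WriteBatch wb;
          rocksdb::WriteBatchWithIndex wbwi;
          AddDefaults(&wb);
          AddDefaults(&wbwi);
        }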
      
      Test Plan: modified existing WriteBatchWithIndex tests to test new functions.  Running all tests.
      
      Reviewers: igor, rven, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D34017
      81345b90
    • I
      Deprecate removeScanCountLimit in NewLRUCache · c88ff4ca
      Igor Canadi committed
      Summary: It is no longer used by the implementation, so we should also remove it from the public API.
      
      Test Plan: make check
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D34971
      c88ff4ca
  27. 03 Mar, 2015 (1 commit)
    • I
      options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically · db037393
      Igor Canadi committed
      
      Summary:
      With a fixed max_bytes_for_level_base, the ratio between the size of the largest level and the second-largest can range from 0 to the multiplier. This makes the LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases.
      
      In this improvement (proposed by Igor Kabiljo), we introduce a parameter, options.level_compaction_dynamic_level_bytes. When it is turned on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base / options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier.
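
      Turning it on (option name per the title above; sizes illustrative):

        #include "rocksdb/options.h"

        rocksdb::Options MakeOptions() {
          rocksdb::Options options;
          options.max_bytes_for_level_base = 256 << 20;  // 256MB
          options.max_bytes_for_level_multiplier = 10;
          // Let RocksDB pick per-level size bases so that real level ratios
          // stay close to the multiplier.
          options.level_compaction_dynamic_level_bytes = true;
          return options;
        }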
      
      Test Plan: New unit tests and pass tests suites including valgrind.
      
      Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo
      
      Reviewed By: ikabiljo
      
      Subscribers: yoshinorim, ikabiljo, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D31437
      db037393
  28. 28 Feb, 2015 (1 commit)
    • I
      Fix a bug in ReadOnlyBackupEngine · b9ff6b05
      Igor Canadi committed
      Summary:
      This diff fixes a bug introduced by D28521: a read-only backup engine can delete a backup that is later than the latest one, because we never check that condition.
      
      I also added a bunch of logging that will help with debugging cases like this in the future.
      
      See more discussion at t6218248.
      
      Test Plan: Added a unit test that was failing before the change. Also, see new LOG file contents: https://phabricator.fb.com/P19738984
      
      Reviewers: benj, sanketh, sumeet, yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D33897
      b9ff6b05
  29. 12 Feb, 2015 (1 commit)
    • S
      Remember whole key/prefix filtering on/off in SST file · 68af7811
      sdong committed
      Summary: Remember whole key or prefix filtering on/off in SST files. If the user opens the DB with a setting that cannot be satisfied while reading an SST file, ignore that file's bloom filter.
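
      A sketch of the two settings the SST file now remembers (prefix length and bloom bits illustrative):

        #include "rocksdb/filter_policy.h"
        #include "rocksdb/options.h"
        #include "rocksdb/slice_transform.h"
        #include "rocksdb/table.h"

        rocksdb::Options MakeOptions() {
          rocksdb::Options options;
          options.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(4));

          rocksdb::BlockBasedTableOptions t;
          t.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10));
          // Both settings are recorded per SST file; reopening the DB with an
          // incompatible setting makes readers ignore that file's filter.
          t.whole_key_filtering = false;  // prefix-based filtering only
          options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(t));
          return options;
        }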
      
      Test Plan: Add a unit test for it
      
      Reviewers: yhchiang, igor, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D32889
      68af7811