  1. 26 Apr 2016, 1 commit
  2. 25 Mar 2016, 1 commit
  3. 05 Mar 2016, 1 commit
    • Fix a bug where flush does not happen when a manual compaction is running · a7d4eb2f
      Committed by Yueh-Hsuan Chiang
      Summary:
      Currently, when RocksDB runs a manual compaction to refit data into a level,
      it calls ReFitLevel(), which requires that no bg (background) work is running.
      When RocksDB plans to ReFitLevel(), it does the following:
      
       1. pause scheduling new bg work.
       2. wait until all bg work has finished.
       3. do the ReFitLevel().
       4. unpause scheduling new bg work.
      
      However, because it pauses scheduling of new bg work in step 1 and then waits for all bg
      work to finish in step 2, RocksDB stops flushing until all bg work is done (which
      could take a long time).
      
      This patch fixes this issue by changing the way ReFitLevel() pauses background work:
      
      1. pause scheduling compaction.
      2. wait until all bg work has finished.
      3. pause scheduling flush.
      4. do ReFitLevel().
      5. unpause both flush and compaction.
      
      The major difference is that we only pause scheduling compaction in step 1 and then wait
      for all bg work to finish in step 2. This prevents flush from being blocked for a long time.
      In a very rare case ReFitLevel() might starve in step 2, because flushes can still be
      scheduled while it waits, but this is unlikely as flushes typically finish very fast.
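      The revised pause order can be illustrated with a small single-threaded model. Everything below (BgScheduler and its members) is hypothetical, not RocksDB's DBImpl code, and the blocking wait of step 2 is reduced to an Idle() check; the point is that after step 1 a flush can still be scheduled.

```cpp
#include <cassert>

// Hypothetical model of the bg-work bookkeeping described above
// (not RocksDB's real implementation).
class BgScheduler {
 public:
  // Returns true if a new flush was scheduled.
  bool MaybeScheduleFlush() {
    if (bg_flush_paused_ > 0) return false;
    ++running_bg_work_;
    return true;
  }
  // Returns true if a new compaction was scheduled.
  bool MaybeScheduleCompaction() {
    if (bg_compaction_paused_ > 0) return false;
    ++running_bg_work_;
    return true;
  }
  void FinishBgWork() {
    assert(running_bg_work_ > 0);
    --running_bg_work_;
  }
  // Step 1: pause only compaction; flushes may still be scheduled.
  void PauseCompaction() { ++bg_compaction_paused_; }
  // Step 3: once all bg work has drained (step 2), pause flush as well.
  void PauseFlush() {
    assert(running_bg_work_ == 0);
    ++bg_flush_paused_;
  }
  // Step 5: unpause both flush and compaction.
  void UnpauseAll() {
    --bg_flush_paused_;
    --bg_compaction_paused_;
  }
  // Stands in for the step-2 wait condition.
  bool Idle() const { return running_bg_work_ == 0; }

 private:
  int running_bg_work_ = 0;
  int bg_compaction_paused_ = 0;
  int bg_flush_paused_ = 0;
};
```

      In this model, a flush scheduled during step 2 only delays Idle() until it finishes, which is the rare starvation window described above.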
      
      Test Plan: existing tests.
      
      Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55029
  4. 04 Mar 2016, 1 commit
  5. 10 Feb 2016, 2 commits
  6. 03 Feb 2016, 1 commit
    • Eliminate duplicated property constants · 284aa613
      Committed by Andrew Kryczka
      Summary:
      Before this diff, there were duplicated constants referring to properties (the
      user-facing API had strings and InternalStats had an enum). I noticed these were
      inconsistent in terms of which constants are provided, the names of constants, and
      the documentation of constants. Overall it seemed annoying and error-prone to maintain
      these duplicated constants.
      
      So, this diff gets rid of InternalStats's constants and replaces them with a map
      keyed on the user-facing constant. The value in that map contains a function
      pointer to get the property value, so we don't need to do string matching while
      holding db->mutex_. This approach has a side benefit of making many small
      handler functions rather than a giant switch-statement.
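      A minimal sketch of the map-of-handlers idea, assuming a made-up Stats struct and made-up property names (demo.*). Looking up the handler by name is the string-matching step; in this sketch the lookup and the handler call are separate, so only the call would need to run under a lock such as db->mutex_.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stats snapshot; stands in for what InternalStats tracks.
struct Stats {
  uint64_t num_files = 0;
  uint64_t memtable_bytes = 0;
};

// Each property gets a small handler instead of a case in a big switch.
using IntPropertyHandler = bool (*)(const Stats&, uint64_t*);

bool HandleNumFiles(const Stats& s, uint64_t* value) {
  *value = s.num_files;
  return true;
}

bool HandleMemtableBytes(const Stats& s, uint64_t* value) {
  *value = s.memtable_bytes;
  return true;
}

// Keyed on the user-facing property name (the names here are made up).
const std::unordered_map<std::string, IntPropertyHandler>& PropertyMap() {
  static const std::unordered_map<std::string, IntPropertyHandler> kMap = {
      {"demo.num-files", HandleNumFiles},
      {"demo.memtable-bytes", HandleMemtableBytes},
  };
  return kMap;
}

// Step 1: find the handler by name. Step 2: call it.
bool GetIntProperty(const Stats& stats, const std::string& name, uint64_t* value) {
  auto it = PropertyMap().find(name);
  if (it == PropertyMap().end()) return false;  // unknown property
  return it->second(stats, value);
}
```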
      
      Test Plan: db_properties_test passes, running "make commit-prereq -j32"
      
      Reviewers: sdong, yhchiang, kradhakrishnan, IslamAbdelRahman, rven, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D53253
  7. 30 Jan 2016, 1 commit
    • Add options.base_background_compactions as a number of compaction threads for low compaction debt · 3b2a1ddd
      Committed by Venkatesh Radhakrishnan
      Summary:
      If options.base_background_compactions is given, we try to schedule no more compactions than this number. Only when the number of L0 files increases to a certain number, or pending compaction bytes exceed a certain threshold, do we schedule compactions based on options.max_background_compactions.
      
      The watermarks are calculated based on slowdown thresholds.
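      A sketch of the scheduling decision, assuming the watermarks are already known: the summary says they are derived from the slowdown thresholds, but the exact formula is not given, so here they are plain parameters. CompactionThreadLimit and the structs are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical view of the current compaction debt.
struct CompactionDebt {
  int num_l0_files;
  uint64_t pending_compaction_bytes;
};

// Hypothetical watermarks (derivation from slowdown thresholds not modeled).
struct Watermarks {
  int l0_files;
  uint64_t pending_bytes;
};

// Returns how many compactions may run concurrently.
int CompactionThreadLimit(int base_background_compactions,
                          int max_background_compactions,
                          const CompactionDebt& debt, const Watermarks& wm) {
  bool high_debt = debt.num_l0_files >= wm.l0_files ||
                   debt.pending_compaction_bytes >= wm.pending_bytes;
  if (high_debt) return max_background_compactions;
  // Low debt: stay at the base number, never above the max.
  return std::min(base_background_compactions, max_background_compactions);
}
```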
      
      Test Plan:
      Add new test cases in column_family_test.
      Adding more unit tests.
      
      Reviewers: IslamAbdelRahman, yhchiang, kradhakrishnan, rven, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D53409
  8. 30 Dec 2015, 1 commit
  9. 24 Dec 2015, 1 commit
    • When slowdown is triggered, reduce the write rate · b9f77ba1
      Committed by sdong
      Summary: It's usually hard for users to choose a value for options.delayed_write_rate. With this diff, after the slowdown condition triggers, we greedily reduce the write rate if estimated pending compaction bytes increase. If estimated pending compaction bytes drop, we increase the write rate.
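      A sketch of the greedy adjustment, under stated assumptions: the class name is hypothetical, the 4/5 and 5/4 step factors are made up for illustration, and the rate is clamped to caller-chosen bounds. Relative to the previous sample, the rate steps down when the estimate grows and steps up when it shrinks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical controller; the 4/5 and 5/4 factors are assumptions,
// not values taken from RocksDB.
class DelayedWriteRate {
 public:
  DelayedWriteRate(uint64_t initial_rate, uint64_t min_rate, uint64_t max_rate)
      : rate_(initial_rate), min_rate_(min_rate), max_rate_(max_rate) {}

  // Called while the slowdown condition is active, with the latest estimate.
  void OnPendingBytes(uint64_t pending_bytes) {
    if (has_last_ && pending_bytes > last_pending_) {
      rate_ = std::max(min_rate_, rate_ * 4 / 5);  // debt grew: slow down
    } else if (has_last_ && pending_bytes < last_pending_) {
      rate_ = std::min(max_rate_, rate_ * 5 / 4);  // debt shrank: speed up
    }
    last_pending_ = pending_bytes;
    has_last_ = true;
  }

  uint64_t rate() const { return rate_; }

 private:
  uint64_t rate_;
  uint64_t min_rate_;
  uint64_t max_rate_;
  uint64_t last_pending_ = 0;
  bool has_last_ = false;
};
```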
      
      Test Plan:
      Add a unit test
      Test with db_bench setting:
      TEST_TMPDIR=/dev/shm/ ./db_bench --benchmarks=fillrandom -num=10000000 --soft_pending_compaction_bytes_limit=1000000000 --hard_pending_compaction_bytes_limit=3000000000 --delayed_write_rate=100000000
      
      and verify that without the commit a write stop happens, but with the commit it does not.
      
      Reviewers: igor, anthony, rven, yhchiang, kradhakrishnan, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D52131
  10. 18 Dec 2015, 1 commit
  11. 15 Dec 2015, 1 commit
    • Running manual compactions in parallel with other automatic or manual compactions in restricted cases · 030215bf
      Committed by Venkatesh Radhakrishnan
      
      Summary:
      This diff provides a framework for running manual compactions in parallel with other
      compactions. We now have a deque of manual compactions. We also pass manual compactions
      as an argument from RunManualCompactions down to BackgroundCompactions, so that
      RunManualCompactions can be reentrant.
      Parallelism is controlled by two routines, including ConflictingManualCompaction, which
      allow or disallow new parallel manual compactions based on the ManualCompactions that
      already exist. In this diff, manual compactions still run exclusive of other compactions
      by default. However, by setting the compaction option exclusive_manual_compaction to
      false, it is possible to run other compactions in parallel with a manual compaction. We
      are still restricted to one manual compaction per column family at a time. All of these
      restrictions will be relaxed in future diffs.
      I will be adding more tests later.
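      The admission rules above can be sketched as follows. ManualCompactionRequest, ConflictsWithQueued, and AutomaticCompactionAllowed are hypothetical names (only ConflictingManualCompaction is named in the summary), and `exclusive` mirrors the exclusive_manual_compaction option, which keeps manual compactions exclusive by default.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for a queued manual compaction.
struct ManualCompactionRequest {
  int cf_id;
  bool exclusive;  // mirrors exclusive_manual_compaction
};

// Returns true if `incoming` must wait for already queued manual compactions.
bool ConflictsWithQueued(const ManualCompactionRequest& incoming,
                         const std::deque<ManualCompactionRequest>& queued) {
  for (const auto& m : queued) {
    if (m.exclusive || incoming.exclusive) return true;  // exclusive runs alone
    if (m.cf_id == incoming.cf_id) return true;          // one per column family
  }
  return false;
}

// Automatic compactions may run only if no queued manual compaction is exclusive.
bool AutomaticCompactionAllowed(const std::deque<ManualCompactionRequest>& queued) {
  for (const auto& m : queued) {
    if (m.exclusive) return false;
  }
  return true;
}
```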
      
      Test Plan: Rocksdb regression + new tests + valgrind
      
      Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D47973
  12. 12 Dec 2015, 1 commit
    • Use SST files for Transaction conflict detection · 3bfd3d39
      Committed by agiardullo
      Summary:
      Currently, transactions can fail even if there is no actual write conflict.  This is due to relying on only the memtables to check for write-conflicts.  Users have to tune memtable settings to try to avoid this, but it's hard to figure out exactly how to tune these settings.
      
      With this diff, TransactionDB will use both memtables and SST files to determine if there are any write conflicts. This relies on the fact that BlockBasedTable stores sequence numbers for all writes that happen after any open snapshot. Also, D50295 is needed to prevent SingleDelete from making writes disappear (the TODOs in this test code will be fixed once the other diff is approved and merged).
      
      Note that Optimistic transactions will still rely on tuning memtable settings as we do not want to read from SST while on the write thread.  Also, memtable settings can still be used to reduce how often TransactionDB needs to read SST files.
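      A simplified model of the conflict check, assuming each source (memtables first, then SST files) can report the sequence number of a key's latest write, which is what the stored sequence numbers in BlockBasedTable make possible. The types and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

using SequenceNumber = uint64_t;
// Hypothetical source: key -> sequence number of its latest write.
using Source = std::map<std::string, SequenceNumber>;

// Sources are ordered newest first: memtables, then SST files.
std::optional<SequenceNumber> LatestSeq(const std::vector<Source>& sources,
                                        const std::string& key) {
  for (const auto& src : sources) {
    auto it = src.find(key);
    if (it != src.end()) return it->second;  // the newest source wins
  }
  return std::nullopt;
}

// A write conflict exists if the key was written after the transaction's snapshot.
bool HasWriteConflict(const std::vector<Source>& sources, const std::string& key,
                      SequenceNumber snapshot_seq) {
  auto seq = LatestSeq(sources, key);
  return seq.has_value() && *seq > snapshot_seq;
}
```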
      
      Test Plan: unit tests, db bench
      
      Reviewers: rven, yhchiang, kradhakrishnan, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D50475
  13. 09 Dec 2015, 3 commits
  14. 08 Dec 2015, 3 commits
    • Support marking snapshots for write-conflict checking · ec704aaf
      Committed by agiardullo
      Summary:
      D50475 enables using SST files for transaction write-conflict checking. For this to work, we need to make sure not to compact out SingleDeletes when there is an earlier transaction snapshot (D50295). If there is a long-held snapshot, this could reduce the benefit of the SingleDelete optimization.
      
      This diff allows Transactions to mark snapshots as being used for write-conflict checking.  Then, during compaction, we will be able to optimize SingleDeletes better in the future.
      
      This diff adds a flag to SnapshotImpl which is used by Transactions.  This diff also passes the earliest write-conflict snapshot's sequence number to CompactionIterator.  This diff does not actually change Compaction (after this diff is pushed, D50295 will be able to use this information).
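      A sketch of computing the value passed to CompactionIterator, assuming each snapshot carries a boolean flag as described. Snapshot, EarliestWriteConflictSnapshot, and the kMaxSequenceNumber sentinel are hypothetical stand-ins defined locally, not RocksDB's declarations.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using SequenceNumber = uint64_t;
// Local sentinel meaning "no write-conflict snapshot" (an assumption of this sketch).
constexpr SequenceNumber kMaxSequenceNumber = std::numeric_limits<SequenceNumber>::max();

// Hypothetical snapshot record; the flag marks snapshots that transactions
// use for write-conflict checking.
struct Snapshot {
  SequenceNumber seq;
  bool is_write_conflict_boundary;
};

// Smallest sequence number among flagged snapshots, or the sentinel if none.
SequenceNumber EarliestWriteConflictSnapshot(const std::vector<Snapshot>& snapshots) {
  SequenceNumber earliest = kMaxSequenceNumber;
  for (const auto& s : snapshots) {
    if (s.is_write_conflict_boundary && s.seq < earliest) earliest = s.seq;
  }
  return earliest;
}
```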
      
      Test Plan: no behavior change, ran existing tests
      
      Reviewers: rven, kradhakrishnan, yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51183
    • Revert "Fix a race condition in persisting options" · f307036b
      Committed by sdong
      This reverts commit 2fa3ed51. It breaks the RocksDB lite build.
    • Fix a race condition in persisting options · 2fa3ed51
      Committed by Yueh-Hsuan Chiang
      Summary:
      This patch fixes a race condition in persisting options that causes a crash when:
      
      * Thread A obtains cf options and starts to persist options based on those cf options.
      * Thread B kicks in, finishes DropColumnFamily, and deletes cf_handle.
      * Thread A wakes up, tries to finish persisting options, and crashes.
      
      Test Plan: Add a test in column_family_test that can reproduce the crash
      
      Reviewers: anthony, IslamAbdelRahman, rven, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51609
  15. 04 Dec 2015, 1 commit
    • added public api to schedule flush/compaction, code to prevent race with db::open · e8180f99
      Committed by Alex Yang
      Summary:
      Fixes T8781168.
      
      Added a new function EnableAutoCompactions in db.h to be publicly
      available. This allows compaction to be re-enabled after disabling it via
      SetOptions.
      
      Refactored code to set the dbptr earlier on in TransactionDB::Open and DB::Open.
      Temporarily disable auto_compaction in TransactionDB::Open until dbptr is set, to
      prevent a race condition.
      
      Test Plan:
      Ran make all check
      
      verified fix on myrocks side:
      was able to reproduce the seg fault with
      ../tools/mysqltest.sh --mem --force rocksdb.drop_table
      
      method was to manually sleep the thread after DB::Open but before TransactionDB ptr was
      assigned in transaction_db_impl.cc:
        DB::Open(db_options, dbname, column_families_copy, handles, &db);
        // Busy-wait to widen the window before the TransactionDB pointer is assigned.
        clock_t goal = (60000 * 10) + clock();
        while (goal > clock()) {}
        // ... dbptr (aka rdb) gets assigned below
      
      verified my changes fixed the issue.
      
      Also added unit test 'ToggleAutoCompaction' in transaction_test.cc
      
      Reviewers: hermanlee4, anthony
      
      Reviewed By: anthony
      
      Subscribers: alex, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51147
  16. 02 Dec 2015, 1 commit
    • DBTest.DynamicCompactionOptions: More deterministic and readable · f9103d9a
      Committed by sdong
      Summary: DBTest.DynamicCompactionOptions sometimes fails the assert but I can't repro it locally. Make it more deterministic and readable and see whether the problem is still there.
      
      Test Plan: Run the test and make sure it passes
      
      Reviewers: kradhakrishnan, yhchiang, igor, rven, IslamAbdelRahman, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51309
  17. 11 Nov 2015, 1 commit
    • Enable RocksDB to persist Options file. · e114f0ab
      Committed by Yueh-Hsuan Chiang
      Summary:
      This patch allows rocksdb to persist options into a file on
      DB::Open, SetOptions, and Create / Drop ColumnFamily.
      Options files are created under the same directory as the rocksdb
      instance.
      
      In addition, this patch adds fail_if_missing_options_file to DBOptions, which
      makes DB::Open, SetOptions, and Create / Drop ColumnFamily return a non-ok
      status when they are not able to persist options properly.
      
        // If true, then DB::Open / CreateColumnFamily / DropColumnFamily
        // / SetOptions will fail if options file is not detected or properly
        // persisted.
        //
        // DEFAULT: false
        bool fail_if_missing_options_file;
      
      Options file names are formatted as OPTIONS-<number>, and RocksDB
      will always keep the latest two options files.
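      A sketch of the retention rule, assuming the number after OPTIONS- is a plain decimal file number; ObsoleteOptionsFiles and ParseOptionsFileNumber are hypothetical helpers, and the file names in the usage are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Parses "OPTIONS-<number>"; returns false for any other file name.
bool ParseOptionsFileNumber(const std::string& name, uint64_t* number) {
  const std::string prefix = "OPTIONS-";
  if (name.size() <= prefix.size() || name.compare(0, prefix.size(), prefix) != 0) {
    return false;
  }
  const std::string digits = name.substr(prefix.size());
  // Reject non-digits and anything too long to fit in 64 bits.
  if (digits.size() > 19 ||
      !std::all_of(digits.begin(), digits.end(),
                   [](unsigned char c) { return std::isdigit(c) != 0; })) {
    return false;
  }
  *number = std::stoull(digits);
  return true;
}

// Returns the OPTIONS files that may be deleted so that only the `keep`
// files with the highest numbers remain.
std::vector<std::string> ObsoleteOptionsFiles(const std::vector<std::string>& files,
                                              size_t keep = 2) {
  std::vector<std::pair<uint64_t, std::string>> options;
  for (const auto& f : files) {
    uint64_t n;
    if (ParseOptionsFileNumber(f, &n)) options.emplace_back(n, f);
  }
  // Newest (highest number) first.
  std::sort(options.begin(), options.end(),
            [](const auto& a, const auto& b) { return a.first > b.first; });
  std::vector<std::string> obsolete;
  for (size_t i = keep; i < options.size(); ++i) obsolete.push_back(options[i].second);
  return obsolete;
}
```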
      
      Test Plan:
      Add options_file_test.
      
      options_test
      column_family_test
      
      Reviewers: igor, IslamAbdelRahman, sdong, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D48285
  18. 04 Nov 2015, 2 commits
    • Add Memory Insight support to utilities · 7d7ee2b6
      Committed by Yueh-Hsuan Chiang
      Summary:
      This patch introduces utilities/memory, which currently includes
      GetApproximateMemoryUsageByType that reports different types of
      rocksdb memory usage given a list of input DBs.
      
      The API also takes care of the case where a Cache could be shared
      across multiple column families / multiple db instances.
      
      Currently, it reports memory usage of memtable, table-readers
      and cache.
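      A sketch of the shared-cache handling, assuming each DB exposes its memtable and table-reader usage plus the caches its column families use. The names are hypothetical and the signature differs from the real GetApproximateMemoryUsageByType; the idea shown is counting each distinct cache object only once.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for a block cache and a DB instance.
struct Cache {
  uint64_t usage_bytes;
};

struct DbInstance {
  uint64_t memtable_bytes;
  uint64_t table_readers_bytes;
  std::vector<std::shared_ptr<Cache>> caches;  // one per column family; may be shared
};

enum class UsageType { kMemTable, kTableReaders, kCache };

std::map<UsageType, uint64_t> ApproximateMemoryUsageByType(
    const std::vector<const DbInstance*>& dbs) {
  std::map<UsageType, uint64_t> usage = {
      {UsageType::kMemTable, 0}, {UsageType::kTableReaders, 0}, {UsageType::kCache, 0}};
  std::unordered_set<const Cache*> seen;  // count each shared cache only once
  for (const DbInstance* db : dbs) {
    usage[UsageType::kMemTable] += db->memtable_bytes;
    usage[UsageType::kTableReaders] += db->table_readers_bytes;
    for (const auto& cache : db->caches) {
      if (seen.insert(cache.get()).second) usage[UsageType::kCache] += cache->usage_bytes;
    }
  }
  return usage;
}
```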
      
      Test Plan: utilities/memory/memory_test.cc
      
      Reviewers: igor, anthony, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D49257
    • Add GetAggregatedIntProperty(): returns the aggregated value from all CFs · 3ecbab00
      Committed by Yueh-Hsuan Chiang
      Summary:
      This patch adds GetAggregatedIntProperty() that returns the aggregated
      value from all CFs
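      A minimal sketch of the aggregation, with per-CF properties modeled as a map. GetAggregatedIntPropertySketch is a hypothetical name, and returning false when any CF lacks the property is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-column-family integer properties.
using CfProperties = std::map<std::string, uint64_t>;

// Sums one integer property over all column families.
bool GetAggregatedIntPropertySketch(const std::vector<CfProperties>& cfs,
                                    const std::string& name, uint64_t* aggregated) {
  uint64_t sum = 0;
  for (const auto& cf : cfs) {
    auto it = cf.find(name);
    if (it == cf.end()) return false;  // assumption: any missing value fails the call
    sum += it->second;
  }
  *aggregated = sum;
  return true;
}
```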
      
      Test Plan: Added a test in db_test
      
      Reviewers: igor, sdong, anthony, IslamAbdelRahman, rven
      
      Reviewed By: rven
      
      Subscribers: rven, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D49497
  19. 20 Oct 2015, 1 commit
  20. 19 Oct 2015, 1 commit
    • db_impl: recycle log files · 66637615
      Committed by Sage Weil
      If log recycling is enabled, put old WAL files on a recycle queue instead of
      deleting them.  When we need a new log file, take a recycled file off the
      list if one is available.
      Signed-off-by: Sage Weil <sage@redhat.com>
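      A sketch of the recycle queue, assuming recycling is a simple on/off flag and that a new WAL always gets a fresh log number even when it reuses an old file. WalRecycler and NewLogResult are hypothetical, and file deletion and reuse are represented by counters and log numbers rather than real file operations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Result of creating a new WAL (hypothetical).
struct NewLogResult {
  uint64_t log_number;     // number assigned to the new WAL
  bool reused_file;        // true if an old file was taken off the recycle queue
  uint64_t recycled_from;  // old log number whose file is reused, if any
};

class WalRecycler {
 public:
  explicit WalRecycler(bool recycle_enabled) : recycle_enabled_(recycle_enabled) {}

  // An old WAL is no longer needed: queue it for reuse instead of deleting it.
  void OnObsolete(uint64_t log_number) {
    if (recycle_enabled_) {
      recycle_queue_.push_back(log_number);
    } else {
      ++files_deleted_;  // stands in for deleting the file
    }
  }

  // Fresh log number; the underlying file is recycled when one is available.
  NewLogResult NewLog() {
    NewLogResult r{next_log_number_++, false, 0};
    if (!recycle_queue_.empty()) {
      r.reused_file = true;
      r.recycled_from = recycle_queue_.front();
      recycle_queue_.pop_front();
    }
    return r;
  }

  size_t files_deleted() const { return files_deleted_; }

 private:
  bool recycle_enabled_;
  std::deque<uint64_t> recycle_queue_;
  uint64_t next_log_number_ = 1;
  size_t files_deleted_ = 0;
};
```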
  21. 18 Oct 2015, 1 commit
  22. 17 Oct 2015, 1 commit
  23. 14 Oct 2015, 1 commit
    • Separate InternalIterator from Iterator · 35ad531b
      Committed by sdong
      Summary:
      Separate a new class, InternalIterator, from class Iterator for look-ups done internally, which operate on keys with sequence ID and type.
      
      This change will enable potential future optimizations, but for now InternalIterator's functions are still the same as Iterator's.
      At the same time, move the cleanup function into a separate class and let both InternalIterator and Iterator inherit from it.
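      A sketch of the class split, assuming a std::function based cleanup list for brevity; RocksDB's actual cleanup-registration interface may differ, and the base-class name Cleanable is used here for illustration. Only the shared base and a placeholder Valid() method are modeled; EmptyInternalIterator is a hypothetical concrete type for the usage example.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Shared cleanup base: callers register functions that run on destruction.
class Cleanable {
 public:
  Cleanable() = default;
  Cleanable(const Cleanable&) = delete;
  Cleanable& operator=(const Cleanable&) = delete;
  virtual ~Cleanable() {
    for (auto& fn : cleanups_) fn();
  }
  void RegisterCleanup(std::function<void()> fn) { cleanups_.push_back(std::move(fn)); }

 private:
  std::vector<std::function<void()>> cleanups_;
};

// User-facing iterator over user keys.
class Iterator : public Cleanable {
 public:
  virtual bool Valid() const = 0;
};

// Internal iterator over keys that carry a sequence number and type.
class InternalIterator : public Cleanable {
 public:
  virtual bool Valid() const = 0;
};

// Hypothetical concrete internal iterator over nothing.
class EmptyInternalIterator : public InternalIterator {
 public:
  bool Valid() const override { return false; }
};
```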
      
      Test Plan: Run all existing tests.
      
      Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48549
  24. 13 Oct 2015, 2 commits
  25. 10 Oct 2015, 1 commit
    • Passing table properties to compaction callback · 3d07b815
      Committed by Alexey Maykov
      Summary: It would be nice to have access to table properties in compaction callbacks. In the MyRocks project, this will make it possible to update optimizer statistics online.
      
      Test Plan: ran the unit test. Ran myrocks with the new way of collecting stats.
      
      Reviewers: igor, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D48267
  26. 09 Oct 2015, 1 commit
    • compaction_filter.h cleanup · 9803e0d8
      Committed by Igor Canadi
      Summary:
      Two changes:
      1. Remove the *V2 filter stuff; we deprecated it a while ago.
      2. Clarify what happens when the user sets max_subcompactions to more than 1.
      
      Test Plan: none
      
      Reviewers: yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D47871
  27. 03 Oct 2015, 1 commit
  28. 24 Sep 2015, 1 commit
    • Add experimental DB::AddFile() to plug sst files into empty DB · f03b5c98
      Committed by Islam AbdelRahman
      Summary:
      This is an initial version of the bulk load feature.
      
      This diff allows us to create sst files and then bulk load them later. Right now the restrictions for loading an sst file are:
      (1) Memtables are empty
      (2) Added sst files have sequence number = 0, and existing values in the database have sequence number = 0
      (3) Added sst files' values do not overlap
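      A sketch of the precondition check, assuming a bytewise key order and reading restriction (3) as "the key ranges of added files do not overlap one another". SstInfo, DbState, and CanAddFile are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical metadata for a file to be ingested.
struct SstInfo {
  std::string smallest_key;
  std::string largest_key;
  uint64_t max_sequence_number;
};

// Hypothetical view of the DB state relevant to the restrictions.
struct DbState {
  bool memtables_empty;
  uint64_t max_existing_sequence_number;
};

// True if `file` satisfies the three restrictions, given files already added.
bool CanAddFile(const DbState& db, const SstInfo& file, const std::vector<SstInfo>& added) {
  if (!db.memtables_empty) return false;                                      // (1)
  if (file.max_sequence_number != 0 || db.max_existing_sequence_number != 0)  // (2)
    return false;
  for (const auto& f : added) {                                               // (3)
    bool disjoint = file.largest_key < f.smallest_key || f.largest_key < file.smallest_key;
    if (!disjoint) return false;
  }
  return true;
}
```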
      
      Test Plan: unit testing
      
      Reviewers: igor, ott, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, ott, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39081
  29. 18 Sep 2015, 1 commit
    • Support for SingleDelete() · 014fd55a
      Committed by Andres Noetzli
      Summary:
      This patch fixes #7460559. It introduces SingleDelete as a new database
      operation. This operation can be used to delete keys that were never
      overwritten (no put following another put of the same key). If an overwritten
      key is single deleted the behavior is undefined. Single deletion of a
      non-existent key has no effect but multiple consecutive single deletions are
      not allowed (see limitations).
      
      In contrast to the conventional Delete() operation, the deletion entry is
      removed along with the value when the two are lined up in a compaction. Note:
      The semantics are similar to @igor's prototype that enabled this
      behavior at the granularity of a column family
      (https://reviews.facebook.net/D42093). This new patch, however, is more
      aggressive when it comes to removing tombstones: it removes the SingleDelete
      together with the value whenever there is no snapshot between them, while the
      older patch only did this when the sequence number of the deletion was older
      than the earliest snapshot.
      
      Most of the complex additions are in the Compaction Iterator, all other changes
      should be relatively straightforward. The patch also includes basic support for
      single deletions in db_stress and db_bench.
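      A sketch of the compaction rule for a single user key, assuming versions are given newest first and that a snapshot with sequence number s sees entries with sequence numbers up to s. It models only the SingleDelete/Put pairing; the usual dropping of hidden versions is omitted. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class Type { kPut, kSingleDelete };

struct Entry {
  uint64_t seq;
  Type type;
  std::string value;
};

// True if some snapshot sees the older entry but not the newer one.
bool SnapshotBetween(uint64_t older_seq, uint64_t newer_seq,
                     const std::vector<uint64_t>& snapshots) {
  for (uint64_t s : snapshots) {
    if (older_seq <= s && s < newer_seq) return true;
  }
  return false;
}

// Compacts the versions of ONE user key, given newest first.
std::vector<Entry> CompactKey(const std::vector<Entry>& versions,
                              const std::vector<uint64_t>& snapshots) {
  std::vector<Entry> out;
  for (size_t i = 0; i < versions.size(); ++i) {
    const Entry& e = versions[i];
    if (e.type == Type::kSingleDelete && i + 1 < versions.size() &&
        versions[i + 1].type == Type::kPut &&
        !SnapshotBetween(versions[i + 1].seq, e.seq, snapshots)) {
      ++i;  // drop the SingleDelete and the Put it cancels
      continue;
    }
    out.push_back(e);
  }
  return out;
}
```

      With a snapshot at 5, the older rule (deletion older than the earliest snapshot) would keep both entries, while this rule drops them because no snapshot sees the Put without the SingleDelete.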
      
      Limitations:
      - Not compatible with cuckoo hash tables
      - Single deletions cannot be used in combination with merges and normal
        deletions on the same key (other keys are not affected by this)
      - Consecutive single deletions are currently not allowed (an older version of
        this patch supported this, so it could be resurrected if needed)
      
      Test Plan: make all check
      
      Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor
      
      Reviewed By: igor
      
      Subscribers: maykov, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43179
  30. 03 Sep 2015, 1 commit
    • Unified maps with Comparator for sorting, other cleanup · 3c9cef1e
      Committed by Andres Noetzli
      Summary:
      This diff is a collection of cleanups that were initially part of D43179.
      Additionally it adds a unified way of defining key-value maps that use a
      Comparator for sorting (this was previously implemented in four different
      places).
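      A sketch of a comparator-ordered map type, assuming a minimal Comparator interface defined locally; LessOfComparator and KVMap are illustrative names for the adapter and the unified map type, not a claim about the exact code in this diff.

```cpp
#include <cassert>
#include <map>
#include <string>

// Minimal stand-in for a RocksDB-style comparator interface (hypothetical).
class Comparator {
 public:
  virtual ~Comparator() = default;
  // Negative, zero, or positive, like memcmp.
  virtual int Compare(const std::string& a, const std::string& b) const = 0;
};

class BytewiseComparatorImpl : public Comparator {
 public:
  int Compare(const std::string& a, const std::string& b) const override {
    return a.compare(b);
  }
};

class ReverseBytewiseComparatorImpl : public Comparator {
 public:
  int Compare(const std::string& a, const std::string& b) const override {
    return b.compare(a);
  }
};

// Adapts a Comparator to the strict weak ordering std::map expects.
struct LessOfComparator {
  explicit LessOfComparator(const Comparator* c) : cmp(c) {}
  bool operator()(const std::string& a, const std::string& b) const {
    return cmp->Compare(a, b) < 0;
  }
  const Comparator* cmp;
};

// One map type for every place that needs comparator-ordered key/value pairs.
using KVMap = std::map<std::string, std::string, LessOfComparator>;
```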
      
      Test Plan: make clean check all
      
      Reviewers: rven, anthony, yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45993
  31. 12 Aug 2015, 1 commit
    • Pessimistic Transactions · c2f2cb02
      Committed by agiardullo
      Summary:
      Initial implementation of Pessimistic Transactions.  This diff contains the api changes discussed in D38913.  This diff is pretty large, so let me know if people would prefer to meet up to discuss it.
      
      MyRocks folks:  please take a look at the API in include/rocksdb/utilities/transaction[_db].h and let me know if you have any issues.
      
      Also, you'll notice a couple of TODOs in the implementation of RollbackToSavePoint().  After chatting with Siying, I'm going to send out a separate diff for an alternate implementation of this feature that implements the rollback inside of WriteBatch/WriteBatchWithIndex.  We can then decide which route is preferable.
      
      Next, I'm planning on doing some perf testing and then integrating this diff into MongoRocks for further testing.
      
      Test Plan: Unit tests, db_bench parallel testing.
      
      Reviewers: igor, rven, sdong, yhchiang, yoshinorim
      
      Reviewed By: sdong
      
      Subscribers: hermanlee4, maykov, spetrunia, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D40869
  32. 07 Aug 2015, 2 commits
    • simple ManagedSnapshot wrapper · 16ea1c7d
      Committed by agiardullo
      Summary: Implemented this simple wrapper for something else I was working on.  Seemed like it makes sense to expose it instead of burying it in some random code.
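      A sketch of the RAII pattern such a wrapper relies on, written against a hypothetical in-memory FakeDb so it is self-contained; ScopedSnapshot is not RocksDB's ManagedSnapshot, only the same idea of releasing the snapshot in the destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical in-memory DB that hands out snapshots (not RocksDB's DB class).
class FakeDb {
 public:
  uint64_t GetSnapshot() {
    uint64_t s = ++last_seq_;
    live_.insert(s);
    return s;
  }
  void ReleaseSnapshot(uint64_t s) { live_.erase(s); }
  size_t NumLiveSnapshots() const { return live_.size(); }

 private:
  uint64_t last_seq_ = 0;
  std::set<uint64_t> live_;
};

// Acquire in the constructor, release in the destructor, so early returns
// and exceptions cannot leak a snapshot.
class ScopedSnapshot {
 public:
  explicit ScopedSnapshot(FakeDb* db) : db_(db), snapshot_(db->GetSnapshot()) {}
  ~ScopedSnapshot() { db_->ReleaseSnapshot(snapshot_); }
  ScopedSnapshot(const ScopedSnapshot&) = delete;
  ScopedSnapshot& operator=(const ScopedSnapshot&) = delete;
  uint64_t snapshot() const { return snapshot_; }

 private:
  FakeDb* db_;
  uint64_t snapshot_;
};
```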
      
      Test Plan: added test
      
      Reviewers: rven, kradhakrishnan, sdong, yhchiang
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43293
    • Avoid type unique_ptr in LogWriterNumber::writer for Windows build break · 6a4aaadc
      Committed by sdong
      Summary:
      Visual Studio complains about deque<LogWriterNumber> because LogWriterNumber is non-copyable due to its unique_ptr member writer. Move away from unique_ptr and free the writer explicitly.
      It is less safe, but I can't think of a better way to unblock the build.
      
      Test Plan: valgrind check test
      
      Reviewers: anthony, IslamAbdelRahman, kolmike, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43647