1. 25 Jun, 2016 1 commit
    • A
      Refactor to use VersionSet [CF + RepairDB part 1/3] · 343507af
      Andrew Kryczka committed
      Summary:
      To support column families, it is easiest to use VersionSet to manage
      our column families (if we don't have Versions then ColumnFamilyData always
      behaves as a dummy column family). This diff only refactors the existing repair
      logic to use VersionSet; the next two parts will add support for multiple
      column families.
      
      Test Plan:
        $ ./repair_test
      
      Reviewers: yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D59775
  2. 22 Jun, 2016 2 commits
  3. 10 Jun, 2016 1 commit
    • W
      Backup Options · 56887f6c
      Wanning Jiang committed
      Summary: Back up the options file to the private directory
      
      Test Plan:
      backupable_db_test.cc, BackupOptions
      	   Modify DB options by calling OpenDB 3 times. Check that the latest options file is in the right place. Also check that no redundant files are backed up.
      
      Reviewers: andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: leveldb, dhruba, andrewkr
      
      Differential Revision: https://reviews.facebook.net/D59373
  4. 03 Jun, 2016 1 commit
    • P
      Add a callback for when memtable is moved to immutable (#1137) · 3a276b0c
      PraveenSinghRao committed
      * Create a callback for memtable becoming immutable
      
      moved notification outside the lock
      
      Move sealed notification to unlocked portion of SwitchMemtable
      
      * fix lite build
  5. 18 May, 2016 2 commits
    • R
      Long outstanding prepare test · a6254f2b
      Reid Horuff committed
      Summary: This tests that a prepared transaction is not lost after several crashes, restarts, and memtable flushes.
      
      Test Plan: TwoPhaseLongPrepareTest
      
      Reviewers: sdong
      
      Subscribers: hermanlee4, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D58185
    • A
      [rocksdb] make more options dynamic · 43afd72b
      Aaron Gao committed
      Summary:
      make more ColumnFamilyOptions dynamic:
      - compression
      - soft_pending_compaction_bytes_limit
      - hard_pending_compaction_bytes_limit
      - min_partial_merge_operands
      - report_bg_io_stats
      - paranoid_file_checks
      
      Test Plan:
      Add sanity check in `db_test.cc` for all above options except for soft_pending_compaction_bytes_limit and hard_pending_compaction_bytes_limit.
      All passed.
      
      Reviewers: andrewkr, sdong, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57519
  6. 11 May, 2016 2 commits
    • R
      [rocksdb] Two Phase Transaction · 8a66c85e
      Reid Horuff committed
      Summary:
      Two Phase Commit addition to RocksDB.
      
      See wiki: https://github.com/facebook/rocksdb/wiki/Two-Phase-Commit-Implementation
      Quip: https://fb.quip.com/pxZrAyrx53r3
      
      Depends on:
      WriteBatch modification: https://reviews.facebook.net/D54093
      Memtable Log Referencing and Prepared Batch Recovery: https://reviews.facebook.net/D56919
      
      Test Plan:
      - SimpleTwoPhaseTransactionTest
      - PersistentTwoPhaseTransactionTest.
      - TwoPhaseRollbackTest
      - TwoPhaseMultiThreadTest
      - TwoPhaseLogRollingTest
      - TwoPhaseEmptyWriteTest
      - TwoPhaseExpirationTest
      
      Reviewers: IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, hermanlee4, andrewkr, vasilep, dhruba, santoshb
      
      Differential Revision: https://reviews.facebook.net/D56925
    • R
      [rocksdb] Memtable Log Referencing and Prepared Batch Recovery · 1b8a2e8f
      Reid Horuff committed
      Summary:
      This diff is built on top of WriteBatch modification: https://reviews.facebook.net/D54093 and adds the required functionality to rocksdb core necessary for rocksdb to support 2PC.
      
      Modification of DBImpl::WriteImpl():
      - Added two arguments: uint64_t* log_used = nullptr, uint64_t log_ref = 0.
      - *log_used is an output argument that returns the number of the log the incoming batch was inserted into, or 0 if no WAL insert took place.
      - log_ref is a supplied log number that all memtables inserted into will reference after the batch insert takes place. This number will reside in 'FindMinPrepLogReferencedByMemTable()' until all memtables inserted into have flushed.
      
      - Recovery/write path is now aware of prepared batches and commit and rollback markers.
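
The `log_ref` bookkeeping described above can be pictured with a small standalone model. This is only a sketch under stated assumptions: `MemTableSketch`, `RefLog`, and `MinPrepLogReferenced` are hypothetical names rather than the RocksDB implementation, and 0 is used to mean "no log referenced", mirroring the convention given for `log_used`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a memtable that remembers the oldest WAL
// holding a prepared batch it depends on (0 = no such log).
struct MemTableSketch {
  uint64_t min_prep_log_referenced = 0;

  // Called after a batch insert that supplied a non-zero log_ref.
  void RefLog(uint64_t log_ref) {
    assert(log_ref != 0);
    if (min_prep_log_referenced == 0) {
      min_prep_log_referenced = log_ref;
    } else {
      min_prep_log_referenced = std::min(min_prep_log_referenced, log_ref);
    }
  }
};

// Smallest log number still referenced by any unflushed memtable,
// or 0 if no memtable references a log.
inline uint64_t MinPrepLogReferenced(
    const std::vector<MemTableSketch>& unflushed) {
  uint64_t min_log = 0;
  for (const MemTableSketch& m : unflushed) {
    if (m.min_prep_log_referenced == 0) continue;
    if (min_log == 0 || m.min_prep_log_referenced < min_log) {
      min_log = m.min_prep_log_referenced;
    }
  }
  return min_log;
}
```

In this model, once a memtable is flushed and dropped from the vector, the minimum can only stay the same or move forward.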
      
      Test Plan: There is currently no test on this diff. All testing of this functionality takes place in the Transaction layer/diff but I will add some testing.
      
      Reviewers: IslamAbdelRahman, sdong
      
      Subscribers: leveldb, santoshb, andrewkr, vasilep, dhruba, hermanlee4
      
      Differential Revision: https://reviews.facebook.net/D56919
  7. 28 Apr, 2016 1 commit
  8. 27 Apr, 2016 1 commit
    • S
      CompactedDB should not be used if there are outstanding WAL files · ac0e54b4
      sdong committed
      Summary: CompactedDB skips the memtable, so we shouldn't use CompactedDB if there are outstanding WAL files.
      
      Test Plan: Change to options.max_open_files = -1 perf context test to create a compacted DB, which we shouldn't do.
      
      Reviewers: yhchiang, kradhakrishnan, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57057
  9. 26 Apr, 2016 1 commit
  10. 25 Mar, 2016 1 commit
  11. 05 Mar, 2016 1 commit
    • Y
      Fix a bug where flush does not happen when a manual compaction is running · a7d4eb2f
      Yueh-Hsuan Chiang committed
      Summary:
      Currently, when rocksdb tries to run manual compaction to refit data into a level,
      there's a ReFitLevel() process that requires that no bg work is currently running.
      When RocksDB plans to ReFitLevel(), it will do the following:
      
       1. pause scheduling new bg work.
       2. wait until all bg work finished
       3. do the ReFitLevel()
       4. unpause scheduling new bg work.
      
      However, because it pauses scheduling new bg work in step 1 and waits for all bg work
      to finish in step 2, RocksDB will stop flushing until all bg work is done (which
      could take a long time).
      
      This patch fixes the issue by changing the way ReFitLevel() pauses background work:
      
      1. pause scheduling compaction.
      2. wait until all bg work finished.
      3. pause scheduling flush
      4. do ReFitLevel()
      5. unpause both flush and compaction.
      
      The major difference is that we only pause scheduling compaction in step 1 while waiting
      for all bg work to finish in step 2.  This prevents flush from being blocked for a long time.
      There is a very rare case in which ReFitLevel() might starve in step 2,
      but that is unlikely because flushes typically finish very quickly.
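
The revised ordering can be sketched with a toy, single-threaded model. Everything here (`BgSchedulerSketch`, `ReFitLevelSketch`, the step log) is hypothetical and only illustrates the sequence of steps; the real code waits on background threads under the DB mutex.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the background scheduler; all names are hypothetical.
struct BgSchedulerSketch {
  bool compaction_paused = false;
  bool flush_paused = false;
  int running_jobs = 0;          // flushes + compactions currently running
  std::vector<std::string> log;  // records the order of the steps taken

  bool CanScheduleFlush() const { return !flush_paused; }
  bool CanScheduleCompaction() const { return !compaction_paused; }

  // Stand-in for blocking until running background work has drained.
  void WaitForBackgroundWork() {
    running_jobs = 0;
    log.push_back("wait");
  }
};

// New ordering from the commit message: flushes remain schedulable
// while we wait for running background work to finish.
inline void ReFitLevelSketch(BgSchedulerSketch& s) {
  s.compaction_paused = true;    // 1. pause scheduling compaction
  s.log.push_back("pause_compaction");
  assert(s.CanScheduleFlush());  //    flush is still allowed at this point
  s.WaitForBackgroundWork();     // 2. wait until all bg work finished
  s.flush_paused = true;         // 3. pause scheduling flush
  s.log.push_back("pause_flush");
  s.log.push_back("refit");      // 4. do ReFitLevel()
  s.compaction_paused = false;   // 5. unpause both flush and compaction
  s.flush_paused = false;
  s.log.push_back("unpause");
}
```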
      
      Test Plan: existing test.
      
      Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55029
  12. 04 Mar, 2016 1 commit
  13. 10 Feb, 2016 2 commits
  14. 03 Feb, 2016 1 commit
    • A
      Eliminate duplicated property constants · 284aa613
      Andrew Kryczka committed
      Summary:
      Before this diff, there were duplicated constants to refer to properties (user-
      facing API had strings and InternalStats had an enum). I noticed these were
      inconsistent in terms of which constants are provided, names of constants, and
      documentation of constants. Overall it seemed annoying/error-prone to maintain
      these duplicated constants.
      
      So, this diff gets rid of InternalStats's constants and replaces them with a map
      keyed on the user-facing constant. The value in that map contains a function
      pointer to get the property value, so we don't need to do string matching while
      holding db->mutex_. This approach has a side benefit of making many small
      handler functions rather than a giant switch-statement.
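
A minimal sketch of the map-of-handlers idea follows. `StatsSketch`, the handler functions, and `FindHandler` are hypothetical stand-ins, not the InternalStats code; the two property strings are used purely as example keys.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the per-column-family statistics object.
struct StatsSketch {
  uint64_t num_immutable_memtables = 0;
  uint64_t estimated_num_keys = 0;
};

// Each property gets a small handler instead of a case in one big switch.
using IntPropertyHandler = bool (*)(const StatsSketch&, uint64_t*);

inline bool HandleNumImmutableMemTable(const StatsSketch& s, uint64_t* v) {
  *v = s.num_immutable_memtables;
  return true;
}

inline bool HandleEstimateNumKeys(const StatsSketch& s, uint64_t* v) {
  *v = s.estimated_num_keys;
  return true;
}

// Resolve the user-facing property string to a handler. In the design
// described above this lookup can happen before the mutex is taken, so no
// string matching is done while holding it.
inline IntPropertyHandler FindHandler(const std::string& name) {
  static const std::unordered_map<std::string, IntPropertyHandler> table = {
      {"rocksdb.num-immutable-mem-table", &HandleNumImmutableMemTable},
      {"rocksdb.estimate-num-keys", &HandleEstimateNumKeys},
  };
  auto it = table.find(name);
  return it == table.end() ? nullptr : it->second;
}
```

A caller would resolve the handler first, then take the lock and invoke the function pointer.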
      
      Test Plan: db_properties_test passes, running "make commit-prereq -j32"
      
      Reviewers: sdong, yhchiang, kradhakrishnan, IslamAbdelRahman, rven, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D53253
  15. 30 Jan, 2016 1 commit
    • V
      Add options.base_background_compactions as a number of compaction threads for low compaction debt · 3b2a1ddd
      Venkatesh Radhakrishnan committed
      Summary:
      If options.base_background_compactions is given, we try to schedule a number of compactions not exceeding this number. Only when the number of L0 files increases to a certain number, or pending compaction bytes exceed a certain threshold, do we schedule compactions based on options.max_background_compactions.
      
      The watermarks are calculated based on slowdown thresholds.
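
The selection rule can be sketched as below. The struct, the function name, and both watermark fields are assumptions made for illustration; the commit only says the watermarks are derived from the slowdown thresholds and does not give the formula.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bundle of the two limits plus the watermarks that switch
// between them. How the watermarks are computed is not modeled here.
struct CompactionLimitsSketch {
  int base_background_compactions;
  int max_background_compactions;
  int l0_file_watermark;
  uint64_t pending_bytes_watermark;
};

// Low compaction debt: stay at the base limit.
// High debt (either watermark reached): allow up to the max limit.
inline int AllowedBackgroundCompactions(const CompactionLimitsSketch& lim,
                                        int l0_files,
                                        uint64_t pending_compaction_bytes) {
  assert(lim.base_background_compactions <= lim.max_background_compactions);
  const bool high_debt =
      l0_files >= lim.l0_file_watermark ||
      pending_compaction_bytes >= lim.pending_bytes_watermark;
  return high_debt ? lim.max_background_compactions
                   : lim.base_background_compactions;
}
```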
      
      Test Plan:
      Add new test cases in column_family_test.
      Adding more unit tests.
      
      Reviewers: IslamAbdelRahman, yhchiang, kradhakrishnan, rven, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D53409
  16. 30 Dec, 2015 1 commit
  17. 24 Dec, 2015 1 commit
    • S
      When slowdown is triggered, reduce the write rate · b9f77ba1
      sdong committed
      Summary: It's usually hard for users to set a value of options.delayed_write_rate. With this diff, after slowdown condition triggers, we greedily reduce write rate if estimated pending compaction bytes increase. If estimated compaction pending bytes drop, we increase the write rate.
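
A rough sketch of such a greedy adjustment is shown here. The function name and the step sizes (cut by one fifth, raise by one quarter) are assumptions chosen for illustration, not the values used in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Returns the next delayed write rate (bytes/sec) given how the estimated
// pending compaction bytes moved since the last check. Step sizes are
// illustrative assumptions.
inline uint64_t AdjustDelayedWriteRate(uint64_t rate,
                                       uint64_t prev_pending_bytes,
                                       uint64_t cur_pending_bytes,
                                       uint64_t max_rate) {
  assert(max_rate > 0);
  if (cur_pending_bytes > prev_pending_bytes) {
    // Compaction is falling behind: slow writers down.
    return rate - rate / 5;
  }
  if (cur_pending_bytes < prev_pending_bytes) {
    // Compaction is catching up: speed writers up, capped at max_rate.
    if (rate >= max_rate) return max_rate;
    const uint64_t step = std::max<uint64_t>(rate / 4, 1);
    if (max_rate - rate <= step) return max_rate;
    return rate + step;
  }
  return rate;  // no change in compaction debt
}
```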
      
      Test Plan:
      Add a unit test
      Test with db_bench setting:
      TEST_TMPDIR=/dev/shm/ ./db_bench --benchmarks=fillrandom -num=10000000 --soft_pending_compaction_bytes_limit=1000000000 --hard_pending_compaction_bytes_limit=3000000000 --delayed_write_rate=100000000
      
      and make sure without the commit, write stop will happen, but with the commit, it will not happen.
      
      Reviewers: igor, anthony, rven, yhchiang, kradhakrishnan, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D52131
  18. 18 Dec, 2015 1 commit
  19. 15 Dec, 2015 1 commit
    • V
      Running manual compactions in parallel with other automatic or manual... · 030215bf
      Venkatesh Radhakrishnan committed
      Running manual compactions in parallel with other automatic or manual compactions in restricted cases
      
      Summary:
      This diff provides a framework for doing manual
      compactions in parallel with other compactions. We now have a deque of manual compactions. We also pass manual compactions as an argument from RunManualCompactions down to
      BackgroundCompactions, so that RunManualCompactions can be reentrant.
      Parallelism is controlled by the two routines
      ConflictingManualCompaction to allow/disallow new parallel/manual
      compactions based on already existing ManualCompactions. In this diff, by default manual compactions still have to run exclusive of other compactions. However, by setting the compaction option, exclusive_manual_compaction to false, it is possible to run other compactions in parallel with a manual compaction. However, we are still restricted to one manual compaction per column family at a time. All of these restrictions will be relaxed in future diffs.
      I will be adding more tests later.
      
      Test Plan: Rocksdb regression + new tests + valgrind
      
      Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D47973
  20. 12 Dec, 2015 1 commit
    • A
      Use SST files for Transaction conflict detection · 3bfd3d39
      agiardullo committed
      Summary:
      Currently, transactions can fail even if there is no actual write conflict.  This is due to relying on only the memtables to check for write-conflicts.  Users have to tune memtable settings to try to avoid this, but it's hard to figure out exactly how to tune these settings.
      
      With this diff, TransactionDB will use both memtables and SST files to determine if there are any write conflicts.  This relies on the fact that BlockBasedTable stores sequence numbers for all writes that happen after any open snapshot.  Also, D50295 is needed to prevent SingleDelete from disappearing writes (the TODOs in this test code will be fixed once the other diff is approved and merged).
      
      Note that Optimistic transactions will still rely on tuning memtable settings as we do not want to read from SST while on the write thread.  Also, memtable settings can still be used to reduce how often TransactionDB needs to read SST files.
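
The check described above can be sketched with a simplified model in which each key maps to the sequence number of its latest write. `DbSketch` and `HasWriteConflict` are hypothetical names, and the model ignores many details of the real lookup path; it only shows when the memtables suffice and when SST files have to be consulted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified view of a DB for conflict checking.
struct DbSketch {
  // key -> sequence number of the latest write held in the memtables
  std::map<std::string, uint64_t> memtable;
  // the memtables hold every write with a sequence number >= this value
  uint64_t earliest_memtable_seq = 0;
  // key -> sequence number of the latest write held in SST files
  std::map<std::string, uint64_t> sst;
};

// Returns true if `key` was written after `snapshot_seq`.
// *read_sst reports whether the SST files had to be consulted.
inline bool HasWriteConflict(const DbSketch& db, const std::string& key,
                             uint64_t snapshot_seq, bool* read_sst) {
  assert(read_sst != nullptr);
  *read_sst = false;
  auto mem_it = db.memtable.find(key);
  if (mem_it != db.memtable.end()) {
    return mem_it->second > snapshot_seq;
  }
  // Key absent from the memtables. If they cover everything written since
  // the snapshot, there cannot have been a later write to this key.
  if (db.earliest_memtable_seq <= snapshot_seq) {
    return false;
  }
  // The memtables do not reach back far enough: fall back to SST files.
  *read_sst = true;
  auto sst_it = db.sst.find(key);
  return sst_it != db.sst.end() && sst_it->second > snapshot_seq;
}
```

The second branch is where memtable tuning still helps in this model: the further back the memtables reach, the less often the SST fallback is taken.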
      
      Test Plan: unit tests, db bench
      
      Reviewers: rven, yhchiang, kradhakrishnan, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D50475
  21. 09 Dec, 2015 3 commits
  22. 08 Dec, 2015 3 commits
    • A
      Support marking snapshots for write-conflict checking · ec704aaf
      agiardullo committed
      Summary:
      D50475 enables using SST files for transaction write-conflict checking.  In order for this to work, we need to make sure not to compact out SingleDeletes when there is an earlier transaction snapshot(D50295).  If there is a long-held snapshot, this could reduce the benefit of the SingleDelete optimization.
      
      This diff allows Transactions to mark snapshots as being used for write-conflict checking.  Then, during compaction, we will be able to optimize SingleDeletes better in the future.
      
      This diff adds a flag to SnapshotImpl which is used by Transactions.  This diff also passes the earliest write-conflict snapshot's sequence number to CompactionIterator.  This diff does not actually change Compaction (after this diff is pushed, D50295 will be able to use this information).
      
      Test Plan: no behavior change, ran existing tests
      
      Reviewers: rven, kradhakrishnan, yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51183
    • S
      Revert "Fix a race condition in persisting options" · f307036b
      sdong committed
      This reverts commit 2fa3ed51. It breaks RocksDB lite build
    • Y
      Fix a race condition in persisting options · 2fa3ed51
      Yueh-Hsuan Chiang committed
      Summary:
      This patch fixes a race condition in persisting options which will cause a crash when:
      
      * Thread A obtains cf options and starts to persist options based on those cf options.
      * Thread B kicks in, finishes DropColumnFamily, and deletes cf_handle.
      * Thread A wakes up, tries to finish persisting the options, and crashes.
      
      Test Plan: Add a test in column_family_test that can reproduce the crash
      
      Reviewers: anthony, IslamAbdelRahman, rven, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51609
  23. 04 Dec, 2015 1 commit
    • A
      added public api to schedule flush/compaction, code to prevent race with db::open · e8180f99
      Alex Yang committed
      Summary:
      Fixes T8781168.
      
      Added a new function EnableAutoCompactions in db.h to be publicly
      available.  This allows compaction to be re-enabled after disabling it via
      SetOptions.
      
      Refactored code to set the dbptr earlier on in TransactionDB::Open and DB::Open
      Temporarily disable auto_compaction in TransactionDB::Open until dbptr is set to
      prevent race condition.
      
      Test Plan:
      Ran make all check
      
      verified fix on myrocks side:
      was able to reproduce the seg fault with
      ../tools/mysqltest.sh --mem --force rocksdb.drop_table
      
      method was to manually sleep the thread after DB::Open but before TransactionDB ptr was
      assigned in transaction_db_impl.cc:
        DB::Open(db_options, dbname, column_families_copy, handles, &db);
        clock_t goal = (60000 * 10) + clock();
        while (goal > clock());
        ...dbptr(aka rdb) gets assigned below
      
      verified my changes fixed the issue.
      
      Also added unit test 'ToggleAutoCompaction' in transaction_test.cc
      
      Reviewers: hermanlee4, anthony
      
      Reviewed By: anthony
      
      Subscribers: alex, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51147
  24. 02 Dec, 2015 1 commit
    • S
      DBTest.DynamicCompactionOptions: More deterministic and readable · f9103d9a
      sdong committed
      Summary: DBTest.DynamicCompactionOptions sometimes fails the assert but I can't repro it locally. Make it more deterministic and readable and see whether the problem is still there.
      
      Test Plan: Run the test and make sure it passes
      
      Reviewers: kradhakrishnan, yhchiang, igor, rven, IslamAbdelRahman, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51309
  25. 11 Nov, 2015 1 commit
    • Y
      Enable RocksDB to persist Options file. · e114f0ab
      Yueh-Hsuan Chiang committed
      Summary:
      This patch allows rocksdb to persist options into a file on
      DB::Open, SetOptions, and Create / Drop ColumnFamily.
      Options files are created under the same directory as the rocksdb
      instance.
      
      In addition, this patch also adds a fail_if_missing_options_file in DBOptions
      that makes any function call return non-ok status when it is not able to
      persist options properly.
      
        // If true, then DB::Open / CreateColumnFamily / DropColumnFamily
        // / SetOptions will fail if options file is not detected or properly
        // persisted.
        //
        // DEFAULT: false
        bool fail_if_missing_options_file;
      
      Options file names are formatted as OPTIONS-<number>, and RocksDB
      will always keep the latest two options files.
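
The retention rule can be sketched as a pure function over a directory listing. `ParseOptionsFileNumber` and `ObsoleteOptionsFiles` are hypothetical helpers written for this illustration; only the `OPTIONS-<number>` naming and the "keep the latest two" rule come from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Parses "OPTIONS-<number>"; returns false for any other file name.
inline bool ParseOptionsFileNumber(const std::string& name, uint64_t* number) {
  assert(number != nullptr);
  static const std::string kPrefix = "OPTIONS-";
  if (name.compare(0, kPrefix.size(), kPrefix) != 0) return false;
  const std::string digits = name.substr(kPrefix.size());
  // Cap the length so the conversion below cannot overflow.
  if (digits.empty() || digits.size() > 18) return false;
  for (char c : digits) {
    if (c < '0' || c > '9') return false;
  }
  *number = std::stoull(digits);
  return true;
}

// Returns the options files that may be deleted, newest first, keeping the
// `keep` most recent ones (two, per the description above).
inline std::vector<std::string> ObsoleteOptionsFiles(
    const std::vector<std::string>& dir_listing, std::size_t keep = 2) {
  std::vector<std::pair<uint64_t, std::string>> found;
  for (const std::string& name : dir_listing) {
    uint64_t n = 0;
    if (ParseOptionsFileNumber(name, &n)) found.emplace_back(n, name);
  }
  std::sort(found.rbegin(), found.rend());  // descending by file number
  std::vector<std::string> obsolete;
  for (std::size_t i = keep; i < found.size(); ++i) {
    obsolete.push_back(found[i].second);
  }
  return obsolete;
}
```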
      
      Test Plan:
      Add options_file_test.
      
      options_test
      column_family_test
      
      Reviewers: igor, IslamAbdelRahman, sdong, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D48285
  26. 04 Nov, 2015 2 commits
    • Y
      Add Memory Insight support to utilities · 7d7ee2b6
      Yueh-Hsuan Chiang committed
      Summary:
      This patch introduces utilities/memory, which currently includes
      GetApproximateMemoryUsageByType that reports different types of
      rocksdb memory usage given a list of input DBs.
      
      The API also takes care of the case where Cache could be shared
      across multiple column families / multiple db instances.
      
      Currently, it reports memory usage of memtable, table-readers
      and cache.
      
      Test Plan: utilities/memory/memory_test.cc
      
      Reviewers: igor, anthony, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D49257
    • Y
      Add GetAggregatedIntProperty(): returns the aggregated value from all CFs · 3ecbab00
      Yueh-Hsuan Chiang committed
      Summary:
      This patch adds GetAggregatedIntProperty() that returns the aggregated
      value from all CFs
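
Conceptually the aggregation is a sum over column families. The standalone sketch below models each column family as a plain map from property name to value; `CfPropertiesSketch` and `GetAggregatedIntPropertySketch` are hypothetical, and the choice to fail when any column family lacks the property is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: one map of integer properties per column family.
using CfPropertiesSketch = std::map<std::string, uint64_t>;

// Sums `property` over all column families. Returns false (leaving *value
// untouched) if any column family does not report the property.
inline bool GetAggregatedIntPropertySketch(
    const std::vector<CfPropertiesSketch>& cfs, const std::string& property,
    uint64_t* value) {
  assert(value != nullptr);
  uint64_t sum = 0;
  for (const CfPropertiesSketch& cf : cfs) {
    auto it = cf.find(property);
    if (it == cf.end()) return false;
    sum += it->second;
  }
  *value = sum;
  return true;
}
```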
      
      Test Plan: Added a test in db_test
      
      Reviewers: igor, sdong, anthony, IslamAbdelRahman, rven
      
      Reviewed By: rven
      
      Subscribers: rven, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D49497
  27. 20 Oct, 2015 1 commit
  28. 19 Oct, 2015 1 commit
    • S
      db_impl: recycle log files · 66637615
      Sage Weil committed
      If log recycling is enabled, put old WAL files on a recycle queue instead of
      deleting them.  When we need a new log file, take a recycled file off the
      list if one is available.
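
The recycle queue can be sketched as follows. `LogRecyclerSketch` and its methods are hypothetical; the cap on queued files is an assumption of this sketch, with a cap of 0 modeling recycling being disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical model of a WAL recycle queue.
class LogRecyclerSketch {
 public:
  explicit LogRecyclerSketch(std::size_t max_recycled)
      : max_recycled_(max_recycled) {}

  // Called when a WAL is no longer needed. Returns true if the file was
  // queued for reuse, false if the caller should delete it instead.
  bool Retire(uint64_t log_number) {
    assert(log_number != 0);  // 0 is reserved to mean "nothing to reuse"
    if (recycle_queue_.size() >= max_recycled_) return false;
    recycle_queue_.push_back(log_number);
    return true;
  }

  // Called when a new WAL is needed. Returns the number of an old file to
  // reuse, or 0 if a fresh file has to be created.
  uint64_t TakeRecycled() {
    if (recycle_queue_.empty()) return 0;
    const uint64_t log_number = recycle_queue_.front();
    recycle_queue_.pop_front();
    return log_number;
  }

 private:
  std::size_t max_recycled_;
  std::deque<uint64_t> recycle_queue_;
};
```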
      Signed-off-by: Sage Weil <sage@redhat.com>
  29. 18 Oct, 2015 1 commit
  30. 17 Oct, 2015 1 commit
  31. 14 Oct, 2015 1 commit
    • S
      Separate InternalIterator from Iterator · 35ad531b
      sdong committed
      Summary:
      Separate a new class InternalIterator from class Iterator, when the look-up is done internally, which also means they operate on key with sequence ID and type.
      
      This change will enable potential future optimizations but for now InternalIterator's functions are still the same as Iterator's.
      At the same time, separate the cleanup function to a separate class and let both of InternalIterator and Iterator inherit from it.
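
The resulting class layout might look roughly like the sketch below. All three class names carry a `Sketch` suffix because they are illustrative stand-ins, and the cleanup callback uses `std::function` for brevity, which is a simplification rather than the signature used in RocksDB.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Shared base: owns cleanup callbacks and runs them on destruction.
class CleanableSketch {
 public:
  CleanableSketch() = default;
  CleanableSketch(const CleanableSketch&) = delete;
  CleanableSketch& operator=(const CleanableSketch&) = delete;
  virtual ~CleanableSketch() {
    for (auto& fn : cleanups_) fn();
  }

  void RegisterCleanup(std::function<void()> fn) {
    assert(static_cast<bool>(fn));
    cleanups_.push_back(std::move(fn));
  }

 private:
  std::vector<std::function<void()>> cleanups_;
};

// User-facing iterator: exposes plain user keys.
class IteratorSketch : public CleanableSketch {
 public:
  virtual bool Valid() const = 0;
  virtual std::string key() const = 0;
};

// Internal iterator: same interface for now, but its keys carry the
// sequence ID and type in addition to the user key.
class InternalIteratorSketch : public CleanableSketch {
 public:
  virtual bool Valid() const = 0;
  virtual std::string key() const = 0;
};
```

Keeping the two iterator types unrelated except through the shared base means an internal iterator cannot be handed to user code by accident.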
      
      Test Plan: Run all existing tests.
      
      Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48549