1. 12 Dec, 2015 1 commit
    • Use SST files for Transaction conflict detection · 3bfd3d39
      Committed by agiardullo
      Summary:
      Currently, transactions can fail even if there is no actual write conflict.  This is due to relying on only the memtables to check for write-conflicts.  Users have to tune memtable settings to try to avoid this, but it's hard to figure out exactly how to tune these settings.
      
      With this diff, TransactionDB will use both memtables and SST files to determine if there are any write conflicts.  This relies on the fact that BlockBasedTable stores sequence numbers for all writes that happen after any open snapshot.  Also, D50295 is needed to prevent SingleDelete from making writes disappear (the TODOs in this test code will be fixed once the other diff is approved and merged).
      
      Note that Optimistic transactions will still rely on tuning memtable settings as we do not want to read from SST while on the write thread.  Also, memtable settings can still be used to reduce how often TransactionDB needs to read SST files.
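
The check described above can be sketched with a toy model. This is not RocksDB's code: `ToyStore`, `LatestSeq`, and `HasWriteConflict` are hypothetical names, and each tier is reduced to a map from key to the sequence number of its latest write. The memtable is consulted first, and the SST tier is read only when the memtable has no record of the key.

```cpp
#include <cstdint>
#include <map>
#include <string>

using SequenceNumber = std::uint64_t;

// Toy model: each tier maps a key to the sequence number of its latest write.
struct ToyStore {
  std::map<std::string, SequenceNumber> memtable;
  std::map<std::string, SequenceNumber> sst;

  // Finds the latest sequence number for `key`, checking the memtable first
  // and falling back to the SST tier. Returns false if the key is unknown.
  bool LatestSeq(const std::string& key, SequenceNumber* seq) const {
    auto mem_it = memtable.find(key);
    if (mem_it != memtable.end()) {
      *seq = mem_it->second;
      return true;
    }
    auto sst_it = sst.find(key);
    if (sst_it != sst.end()) {
      *seq = sst_it->second;
      return true;
    }
    return false;
  }
};

// A transaction whose snapshot was taken at `snapshot_seq` conflicts on `key`
// if the key was written after that snapshot.
bool HasWriteConflict(const ToyStore& store, const std::string& key,
                      SequenceNumber snapshot_seq) {
  SequenceNumber latest = 0;
  return store.LatestSeq(key, &latest) && latest > snapshot_seq;
}
```

In this model a key that was flushed out of the memtable is still found in the SST tier, so the check no longer has to fail just because the memtable history is too short.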
      
      Test Plan: unit tests, db bench
      
      Reviewers: rven, yhchiang, kradhakrishnan, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D50475
  2. 11 Dec, 2015 1 commit
    • Change SingleDelete to support conflict checking · 9e446290
      Committed by agiardullo
      Summary: For Transactions, we want to start using the SST files to do write conflict checking.  To do this, we need to make sure that compaction never removes all writes if an earlier snapshot exists.  So I had to change the way we process SingleDeletes to sometimes leave a SingleDelete behind when we encounter a Put followed by a SingleDelete.  See the comments in this diff for a more detailed explanation.
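
A simplified sketch of the decision described above, assuming a single snapshot and a Put that is immediately followed by a SingleDelete for the same key. The enum, the function name, and the exact rules are illustrative assumptions, not the logic in the diff; the comments in the diff itself are the authoritative explanation.

```cpp
#include <cstdint>

using SequenceNumber = std::uint64_t;

enum class Outcome { kDropBoth, kKeepSingleDeleteOnly, kKeepBoth };

// Decides what compaction keeps for a Put (at put_seq) followed by a
// SingleDelete (at single_delete_seq > put_seq) of the same key.
Outcome CompactPutThenSingleDelete(SequenceNumber put_seq,
                                   SequenceNumber single_delete_seq,
                                   bool has_snapshot,
                                   SequenceNumber snapshot_seq) {
  if (!has_snapshot || snapshot_seq >= single_delete_seq) {
    // No snapshot predates the delete, so nobody needs either record.
    return Outcome::kDropBoth;
  }
  if (snapshot_seq >= put_seq) {
    // The snapshot can still read the Put, so both records must stay.
    return Outcome::kKeepBoth;
  }
  // The snapshot predates both writes. Leave the SingleDelete behind so a
  // conflict check can still see that the key was written after the snapshot.
  return Outcome::kKeepSingleDeleteOnly;
}
```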
      
      Test Plan: added more unit tests
      
      Reviewers: rven, igor, kradhakrishnan, IslamAbdelRahman, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D50295
  3. 09 Dec, 2015 3 commits
  4. 08 Dec, 2015 3 commits
    • Support marking snapshots for write-conflict checking · ec704aaf
      Committed by agiardullo
      Summary:
      D50475 enables using SST files for transaction write-conflict checking.  In order for this to work, we need to make sure not to compact out SingleDeletes when there is an earlier transaction snapshot (D50295).  If there is a long-held snapshot, this could reduce the benefit of the SingleDelete optimization.
      
      This diff allows Transactions to mark snapshots as being used for write-conflict checking.  Then, during compaction, we will be able to optimize SingleDeletes better in the future.
      
      This diff adds a flag to SnapshotImpl which is used by Transactions.  This diff also passes the earliest write-conflict snapshot's sequence number to CompactionIterator.  This diff does not actually change Compaction (after this diff is pushed, D50295 will be able to use this information).
      
      Test Plan: no behavior change, ran existing tests
      
      Reviewers: rven, kradhakrishnan, yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51183
    • Revert "Fix a race condition in persisting options" · f307036b
      Committed by sdong
      This reverts commit 2fa3ed51. It breaks the RocksDB Lite build.
    • Fix a race condition in persisting options · 2fa3ed51
      Committed by Yueh-Hsuan Chiang
      Summary:
      This patch fixes a race condition in persisting options that causes a crash when:
      
      * Thread A obtains the cf options and starts to persist options based on them.
      * Thread B kicks in, finishes DropColumnFamily, and deletes cf_handle.
      * Thread A wakes up, tries to finish persisting the options, and crashes.
      
      Test Plan: Add a test in column_family_test that can reproduce the crash
      
      Reviewers: anthony, IslamAbdelRahman, rven, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51609
  5. 04 Dec, 2015 1 commit
    • added public api to schedule flush/compaction, code to prevent race with db::open · e8180f99
      Committed by Alex Yang
      Summary:
      Fixes T8781168.
      
      Added a new function EnableAutoCompactions in db.h to be publicly
      available.  This allows compaction to be re-enabled after disabling it via
      SetOptions.
      
      Refactored code to set the dbptr earlier on in TransactionDB::Open and DB::Open.
      Temporarily disable auto_compaction in TransactionDB::Open until dbptr is set to
      prevent a race condition.
      
      Test Plan:
      Ran make all check
      
      verified fix on myrocks side:
      was able to reproduce the seg fault with
      ../tools/mysqltest.sh --mem --force rocksdb.drop_table
      
      method was to manually sleep the thread after DB::Open but before TransactionDB ptr was
      assigned in transaction_db_impl.cc:
        DB::Open(db_options, dbname, column_families_copy, handles, &db);
        clock_t goal = (60000 * 10) + clock();
        while (goal > clock());
        ...dbptr(aka rdb) gets assigned below
      
      verified my changes fixed the issue.
      
      Also added unit test 'ToggleAutoCompaction' in transaction_test.cc
      
      Reviewers: hermanlee4, anthony
      
      Reviewed By: anthony
      
      Subscribers: alex, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51147
  6. 01 Dec, 2015 1 commit
    • DB to only flush the column family with the largest memtable while... · db320b1b
      Committed by sdong
      DB to only flush the column family with the largest memtable while option.db_write_buffer_size is hit
      
      Summary: When option.db_write_buffer_size is hit, we currently flush all column families. Change this to flush only the column family with the largest active memtable instead. This avoids creating too many small files in some cases.
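
A minimal sketch of the selection rule, assuming the limit is compared against the sum of active memtable sizes. `ColumnFamilyState` and `PickColumnFamilyToFlush` are hypothetical names, and the real accounting behind db_write_buffer_size may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-column-family view: a name and its active memtable size.
struct ColumnFamilyState {
  std::string name;
  std::size_t active_memtable_bytes;
};

// Returns the column family to flush once the shared limit is reached, or
// nullptr while total memtable usage is still below the limit.
const ColumnFamilyState* PickColumnFamilyToFlush(
    const std::vector<ColumnFamilyState>& cfs, std::size_t limit_bytes) {
  std::size_t total = 0;
  const ColumnFamilyState* largest = nullptr;
  for (const auto& cf : cfs) {
    total += cf.active_memtable_bytes;
    if (largest == nullptr ||
        cf.active_memtable_bytes > largest->active_memtable_bytes) {
      largest = &cf;
    }
  }
  return total >= limit_bytes ? largest : nullptr;
}
```

Flushing only the largest memtable frees the most memory per flush, while small memtables keep accumulating instead of being written out as tiny files.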
      
      Test Plan: Modify test DBTest.SharedWriteBuffer to work with the updated behavior
      
      Reviewers: kradhakrishnan, yhchiang, rven, anthony, IslamAbdelRahman, igor
      
      Reviewed By: igor
      
      Subscribers: march, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51291
  7. 17 Nov, 2015 1 commit
  8. 14 Nov, 2015 1 commit
  9. 13 Nov, 2015 1 commit
    • Don't merge WriteBatch-es if WAL is disabled · 6ce42dd0
      Committed by Nathan Bronson
      Summary:
      There's no need for WriteImpl to flatten the write batch group
      into a single WriteBatch if the WAL is disabled.  This diff moves the
      flattening into the WAL step, and skips flattening entirely if it isn't
      needed.  It's good for about 5% speedup on a multi-threaded workload
      with no WAL.
      
      This diff also adds clarifying comments about the chance for partial
      failure of WriteBatchInternal::InsertInto, and always sets bg_error_ if
      the memtable state diverges from the logged state or if a WriteBatch
      succeeds only partially.
      
      Benchmark for speedup:
        db_bench -benchmarks=fillrandom -threads=16 -batch_size=1 -memtablerep=skip_list -value_size=0 --num=200000 -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000
      
      Test Plan: asserts + make check
      
      Reviewers: sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D50583
  10. 11 Nov, 2015 1 commit
    • Enable RocksDB to persist Options file. · e114f0ab
      Committed by Yueh-Hsuan Chiang
      Summary:
      This patch allows rocksdb to persist options into a file on
      DB::Open, SetOptions, and Create / Drop ColumnFamily.
      Options files are created under the same directory as the rocksdb
      instance.
      
      In addition, this patch also adds a fail_if_missing_options_file in DBOptions
      that makes any function call return non-ok status when it is not able to
      persist options properly.
      
        // If true, then DB::Open / CreateColumnFamily / DropColumnFamily
        // / SetOptions will fail if options file is not detected or properly
        // persisted.
        //
        // DEFAULT: false
        bool fail_if_missing_options_file;
      
      Options file names are formatted as OPTIONS-<number>, and RocksDB
      will always keep the latest two options files.
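
The naming and retention rule can be sketched as follows. This is an illustration only: `ParseOptionsFileName` and `ObsoleteOptionsFiles` are hypothetical helpers, and the sketch assumes that a larger number means a newer file.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Parses "OPTIONS-<number>". Returns false if the name has another shape.
bool ParseOptionsFileName(const std::string& name, std::uint64_t* number) {
  const std::string prefix = "OPTIONS-";
  if (name.size() <= prefix.size() ||
      name.compare(0, prefix.size(), prefix) != 0) {
    return false;
  }
  std::uint64_t value = 0;
  for (std::size_t i = prefix.size(); i < name.size(); ++i) {
    if (name[i] < '0' || name[i] > '9') return false;
    value = value * 10 + static_cast<std::uint64_t>(name[i] - '0');
  }
  *number = value;
  return true;
}

// Returns the options files that fall outside the two newest ones.
std::vector<std::string> ObsoleteOptionsFiles(
    const std::vector<std::string>& names) {
  std::vector<std::pair<std::uint64_t, std::string> > found;
  for (const auto& name : names) {
    std::uint64_t number = 0;
    if (ParseOptionsFileName(name, &number)) {
      found.push_back(std::make_pair(number, name));
    }
  }
  std::sort(found.begin(), found.end());  // oldest (smallest number) first
  std::vector<std::string> obsolete;
  for (std::size_t i = 0; i + 2 < found.size(); ++i) {
    obsolete.push_back(found[i].second);
  }
  return obsolete;
}
```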
      
      Test Plan:
      Add options_file_test.
      
      options_test
      column_family_test
      
      Reviewers: igor, IslamAbdelRahman, sdong, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D48285
  11. 06 Nov, 2015 1 commit
    • Prefix-based iterating only shows keys in prefix · 9d50afc3
      Committed by Venkatesh Radhakrishnan
      Summary:
      MyRocks testing found an issue: when iterating past the end of a
      prefix, wrong results were sometimes seen for keys outside the
      prefix. We now tighten the range of keys seen with a new
      read option called prefix_seen_at_start. This remembers the starting
      prefix. On each Next, the current key's prefix is compared with it,
      and if the prefix differs, valid is set to false.
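
The behavior can be illustrated with a toy scan over a sorted map. This is a sketch under stated assumptions rather than the iterator code in the diff: `ScanSamePrefix` is a hypothetical helper, the prefix is a fixed number of leading bytes, and the remembered prefix is taken from the seek target.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Seeks to `target` and collects keys until the prefix changes, mimicking an
// iterator that becomes invalid as soon as it leaves the starting prefix.
std::vector<std::string> ScanSamePrefix(
    const std::map<std::string, std::string>& db, const std::string& target,
    std::size_t prefix_len) {
  std::vector<std::string> keys;
  const std::string start_prefix = target.substr(0, prefix_len);
  for (auto it = db.lower_bound(target); it != db.end(); ++it) {
    if (it->first.compare(0, prefix_len, start_prefix) != 0) {
      break;  // different prefix: the scan stops here
    }
    keys.push_back(it->first);
  }
  return keys;
}
```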
      
      Test Plan: PrefixTest.PrefixValid
      
      Reviewers: IslamAbdelRahman, sdong, yhchiang, anthony
      
      Reviewed By: anthony
      
      Subscribers: spetrunia, hermanlee4, yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D50211
  12. 04 Nov, 2015 1 commit
  13. 31 Oct, 2015 1 commit
    • Update DB::AddFile() to have less restrictions · ff4499e2
      Committed by Islam AbdelRahman
      Summary:
      Update DB::AddFile() restrictions to be
        - Key range in the loaded table file doesn't overlap with existing keys or tombstones in the DB.
        - No other writes happen during the AddFile call.
      
      The updated AddFile() will verify that the file's key range doesn't overlap with any keys or tombstones in the DB, and then add the file to L0.
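
The overlap restriction amounts to a range check, sketched here against a sorted set that stands in for every key and tombstone already in the DB. `OverlapsExistingKeys` is a hypothetical helper, and plain string ordering is assumed in place of the DB's comparator.

```cpp
#include <set>
#include <string>

// Returns true if any existing key or tombstone lies inside the file's
// inclusive key range [smallest, largest].
bool OverlapsExistingKeys(const std::set<std::string>& existing,
                          const std::string& smallest,
                          const std::string& largest) {
  // First existing key that is >= smallest; it overlaps if it is <= largest.
  auto it = existing.lower_bound(smallest);
  return it != existing.end() && *it <= largest;
}
```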
      
      Test Plan: unit tests
      
      Reviewers: igor, rven, anthony, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: adsharma, ameyag, dhruba
      
      Differential Revision: https://reviews.facebook.net/D49233
  14. 30 Oct, 2015 2 commits
    • Clean and expose CreateLoggerFromOptions · 2872e0c8
      Committed by Islam AbdelRahman
      Summary:
      CreateLoggerFromOptions has some parameters, such as db_log_dir and env, that are redundant since they already exist in DBOptions.
      
      This patch removes the redundant parameters and exposes CreateLoggerFromOptions to users.
      
      Test Plan: make check
      
      Reviewers: igor, anthony, yhchiang, rven, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, hermanlee4
      
      Differential Revision: https://reviews.facebook.net/D49713
    • "make format" in some recent commits · 296c3a1f
      Committed by sdong
      Summary: Run "make format" for some recent commits.
      
      Test Plan: Build and run tests
      
      Reviewers: IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D49707
  15. 29 Oct, 2015 1 commit
  16. 27 Oct, 2015 1 commit
  17. 20 Oct, 2015 2 commits
  18. 19 Oct, 2015 5 commits
  19. 18 Oct, 2015 1 commit
  20. 17 Oct, 2015 3 commits
  21. 14 Oct, 2015 3 commits
    • Make db_test_util compile under ROCKSDB_LITE · f55d3009
      Committed by Islam AbdelRahman
      Summary: db_test_util is used in multiple test files but it doesn't compile under ROCKSDB_LITE.
      
      Test Plan:
      make check
      make static_lib
      OPT=-DROCKSDB_LITE make db_wal_test
      
      Reviewers: igor, yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D48579
    • Separate InternalIterator from Iterator · 35ad531b
      Committed by sdong
      Summary:
      Separate a new class, InternalIterator, from class Iterator for the cases where the look-up is done internally, which also means it operates on keys with sequence ID and type.
      
      This change will enable potential future optimizations, but for now InternalIterator's functions are still the same as Iterator's.
      At the same time, move the cleanup function into a separate class and let both InternalIterator and Iterator inherit from it.
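
The resulting class layout might look roughly like the sketch below. `CleanupBase` and `EmptyInternalIterator` are hypothetical names, and the interfaces are cut down to two methods; the real classes have more members.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical shared base: owns cleanup callbacks and runs them on destruction.
class CleanupBase {
 public:
  virtual ~CleanupBase() {
    for (auto& fn : cleanups_) fn();
  }
  void RegisterCleanup(std::function<void()> fn) {
    cleanups_.push_back(std::move(fn));
  }

 private:
  std::vector<std::function<void()> > cleanups_;
};

// User-facing iterator: operates on plain user keys.
class Iterator : public CleanupBase {
 public:
  virtual bool Valid() const = 0;
  virtual void Next() = 0;
};

// Internal iterator: same functions for now, but its keys carry a sequence
// ID and a type in addition to the user key.
class InternalIterator : public CleanupBase {
 public:
  virtual bool Valid() const = 0;
  virtual void Next() = 0;
};

// Trivial concrete iterator, included only so the sketch can be exercised.
class EmptyInternalIterator : public InternalIterator {
 public:
  bool Valid() const override { return false; }
  void Next() override {}
};
```

Because the two iterator types no longer share an interface, an internal iterator cannot be handed to user code by accident, while both still get the same cleanup behavior from the common base.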
      
      Test Plan: Run all existing tests.
      
      Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48549
    • Put wal_filter under #ifndef ROCKSDB_LITE · cc4d13e0
      Committed by Praveen Rao
  22. 13 Oct, 2015 2 commits
  23. 10 Oct, 2015 2 commits
    • Passing table properties to compaction callback · 3d07b815
      Committed by Alexey Maykov
      Summary: It would be nice to have access to table properties in compaction callbacks. In the MyRocks project, this will make it possible to update optimizer statistics online.
      
      Test Plan: ran the unit test. Ran myrocks with the new way of collecting stats.
      
      Reviewers: igor, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D48267
    • Pass column family ID to table property collector · 776bd8d5
      Committed by sdong
      Summary: Pass column family ID through TablePropertiesCollectorFactory::CreateTablePropertiesCollector() so that users can identify which column family this file is for and handle it differently.
      
      Test Plan: Add unit test scenarios in tests related to table properties collectors to verify the information passed in is correct.
      
      Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: yoshinorim, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48411
  24. 07 Oct, 2015 1 commit
    • Support for LevelDB SST with .ldb suffix · 02675026
      Committed by dyniusz
      Summary:
      Handle SST files with both ".sst" and ".ldb" suffixes.
      This enables users to migrate from LevelDB to RocksDB.
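
A minimal sketch of suffix handling, assuming table files are recognized purely by file name. `HasSuffix` and `IsTableFileName` are hypothetical helpers, not functions from the diff.

```cpp
#include <string>

// Returns true if `s` ends with `suffix`.
bool HasSuffix(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Accepts both the RocksDB (".sst") and LevelDB (".ldb") table file suffixes.
bool IsTableFileName(const std::string& name) {
  return HasSuffix(name, ".sst") || HasSuffix(name, ".ldb");
}
```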
      
      Test Plan:
      Added unit test with DB operating on SSTs with both naming schemes.
      See db/dc_test.cc:SSTsWithLdbSuffixHandling for details.
      
      Reviewers: yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D48003