1. 12 Dec, 2015 1 commit
    • A
      Use SST files for Transaction conflict detection · 3bfd3d39
      agiardullo committed
      Summary:
      Currently, transactions can fail even if there is no actual write conflict.  This is due to relying on only the memtables to check for write-conflicts.  Users have to tune memtable settings to try to avoid this, but it's hard to figure out exactly how to tune these settings.
      
      With this diff, TransactionDB will use both memtables and SST files to determine if there are any write conflicts.  This relies on the fact that BlockBasedTable stores sequence numbers for all writes that happen after any open snapshot.  Also, D50295 is needed to prevent SingleDelete from making writes disappear (the TODOs in this test code will be fixed once the other diff is approved and merged).
      
      Note that Optimistic transactions will still rely on tuning memtable settings as we do not want to read from SST while on the write thread.  Also, memtable settings can still be used to reduce how often TransactionDB needs to read SST files.
      
      Test Plan: unit tests, db bench
      
      Reviewers: rven, yhchiang, kradhakrishnan, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D50475
      3bfd3d39
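The conflict check described in this commit can be illustrated with a toy model. This is a minimal sketch and not RocksDB code: `ToyStore`, `earliest_seq`, `CheckMemtableOnly`, and `CheckWithSst` are hypothetical names, and the sketch assumes that the SST side retains the latest sequence number for every key written after the snapshot, as the summary states.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Toy model, not RocksDB code. A store maps each key to the sequence
// number of its most recent write.
struct ToyStore {
  std::map<std::string, uint64_t> latest_seq;
  // Hypothetical field: the store holds every write with seq >= earliest_seq.
  uint64_t earliest_seq = 0;
};

enum class Check { kOk, kConflict, kUnknown };

// Memtable-only check: if the memtable history does not reach back to the
// snapshot and the key is absent, the result is unknown, so the caller
// would have to fail the transaction conservatively.
Check CheckMemtableOnly(const ToyStore& mem, const std::string& key,
                        uint64_t snapshot_seq) {
  auto it = mem.latest_seq.find(key);
  if (it != mem.latest_seq.end()) {
    return it->second > snapshot_seq ? Check::kConflict : Check::kOk;
  }
  // Writes after the snapshot have seq >= snapshot_seq + 1.
  return mem.earliest_seq <= snapshot_seq + 1 ? Check::kOk : Check::kUnknown;
}

// Check that falls back to the SST side when the memtable cannot decide.
Check CheckWithSst(const ToyStore& mem, const ToyStore& sst,
                   const std::string& key, uint64_t snapshot_seq) {
  Check c = CheckMemtableOnly(mem, key, snapshot_seq);
  if (c != Check::kUnknown) return c;
  auto it = sst.latest_seq.find(key);
  if (it != sst.latest_seq.end() && it->second > snapshot_seq) {
    return Check::kConflict;
  }
  return Check::kOk;
}
```

In this model, a key that was last written before the snapshot and has since been flushed yields `kUnknown` from the memtable-only check but `kOk` once the SST side is consulted, which is the false failure the commit aims to remove.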
  2. 11 Dec, 2015 3 commits
    • Y
      Fixed the valgrind error in ColumnFamilyTest::CreateAndDropRace · f0a8e5a2
      Yueh-Hsuan Chiang committed
      Summary: Fixed the valgrind error in ColumnFamilyTest::CreateAndDropRace
      
      Test Plan: valgrind --error-exitcode=2 --leak-check=full ./column_family_test
      
      Reviewers: kradhakrishnan, rven, anthony, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51795
      f0a8e5a2
    • A
      Change SingleDelete to support conflict checking · 9e446290
      agiardullo committed
      Summary: For Transactions, we want to start using the SST files to do write conflict checking.  To do this, we need to make sure that compaction never removes all writes if an earlier snapshot exists.  So I had to change the way we process SingleDeletes to sometimes leave a SingleDelete behind when we encounter a Put followed by a SingleDelete.  See the comments in this diff for a more detailed explanation.
      
      Test Plan: added more unit tests
      
      Reviewers: rven, igor, kradhakrishnan, IslamAbdelRahman, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D50295
      9e446290
    • C
      fix typos in comments · c30b4995
      charsyam committed
      c30b4995
  3. 10 Dec, 2015 2 commits
    • S
      Deprecate options.soft_rate_limit and add options.soft_pending_compaction_bytes_limit · 56e77f09
      sdong committed
      Summary: Deprecate options.soft_rate_limit, which is hard to tune, in favor of options.soft_pending_compaction_bytes_limit, which triggers a slowdown when the estimated pending compaction bytes exceed the threshold. The hope is to make tuning more straightforward.
      
      Test Plan: Modify DBTest.SoftLimit to cover options.soft_pending_compaction_bytes_limit instead; run all unit tests.
      
      Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, igor, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51117
      56e77f09
    • S
      A new compaction picking priority that optimizes for write amplification for random updates. · d6e1035a
      sdong committed
      Summary: Introduce a compaction picking priority that picks the files containing the oldest rows to compact. This mode slightly improves write amplification for random update cases.
      
      Test Plan: Add a unit test and run it in valgrind too.
      
      Reviewers: yhchiang, anthony, IslamAbdelRahman, rven, kradhakrishnan, MarkCallaghan, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51459
      d6e1035a
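As a rough illustration of an "oldest rows first" picking order, the sketch below sorts candidate files by sequence number. It is a toy model under stated assumptions: `ToyFile` and `OrderOldestFirst` are hypothetical names, and using the smallest sequence number in a file as the measure of age is an assumption of this sketch, not something the commit message specifies.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Toy model, not RocksDB code: a file described by its name and the
// smallest sequence number it contains (assumed proxy for "oldest rows").
struct ToyFile {
  std::string name;
  uint64_t smallest_seq;
};

// Returns the files ordered so that the one holding the oldest rows comes
// first. stable_sort keeps the input order for files of equal age.
std::vector<ToyFile> OrderOldestFirst(std::vector<ToyFile> files) {
  std::stable_sort(files.begin(), files.end(),
                   [](const ToyFile& a, const ToyFile& b) {
                     return a.smallest_seq < b.smallest_seq;
                   });
  return files;
}
```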
  4. 09 Dec, 2015 5 commits
  5. 08 Dec, 2015 8 commits
    • A
      Support marking snapshots for write-conflict checking · ec704aaf
      agiardullo committed
      Summary:
      D50475 enables using SST files for transaction write-conflict checking.  In order for this to work, we need to make sure not to compact out SingleDeletes when there is an earlier transaction snapshot (D50295).  If there is a long-held snapshot, this could reduce the benefit of the SingleDelete optimization.
      
      This diff allows Transactions to mark snapshots as being used for write-conflict checking.  Then, during compaction, we will be able to optimize SingleDeletes better in the future.
      
      This diff adds a flag to SnapshotImpl which is used by Transactions.  This diff also passes the earliest write-conflict snapshot's sequence number to CompactionIterator.  This diff does not actually change Compaction (after this diff is pushed, D50295 will be able to use this information).
      
      Test Plan: no behavior change, ran existing tests
      
      Reviewers: rven, kradhakrishnan, yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51183
      ec704aaf
    • S
      Fix occasional failure of DBTest.DynamicCompactionOptions · 770dea93
      sdong committed
      Summary: DBTest.DynamicCompactionOptions occasionally fails during valgrind runs. We send a sleeping task to block the compaction thread pool, but we don't wait for it to start running.
      
      Test Plan: Run the test multiple times in an environment which can cause failure.
      
      Reviewers: rven, kradhakrishnan, igor, IslamAbdelRahman, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51687
      770dea93
    • S
      Split histogram per OperationType in db_bench · ebc2d490
      SherlockNoMad committed
      ebc2d490
    • S
      Revert "Fix a race condition in persisting options" · f307036b
      sdong committed
      This reverts commit 2fa3ed51. It breaks the RocksDB lite build.
      f307036b
    • Y
      Fix a race condition in persisting options · 2fa3ed51
      Yueh-Hsuan Chiang committed
      Summary:
      This patch fixes a race condition in persisting options which causes a crash when:
      
      * Thread A obtains the cf options and starts to persist options based on them.
      * Thread B kicks in, finishes DropColumnFamily, and deletes cf_handle.
      * Thread A wakes up, tries to finish persisting the options, and crashes.
      
      Test Plan: Add a test in column_family_test that can reproduce the crash
      
      Reviewers: anthony, IslamAbdelRahman, rven, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51609
      2fa3ed51
    • V
      Fix valgrind failures in 3 tests in db_compaction_test due to new skiplist changes · f276c3a8
      Venkatesh Radhakrishnan committed
      Summary:
      Several tests in db_compaction_test are failing with aborts in
      valgrind. These are LevelCompactionThirdPath, LevelCompactionPathUse and
      CompressLevelCompaction. We now use the SpecialSkipListFactory to make
      them more deterministic.
      
      Test Plan: valgrind
      
      Reviewers: anthony, yhchiang, kradhakrishnan, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51663
      f276c3a8
    • S
      Fix undeterministic failure of ColumnFamilyTest.DifferentWriteBufferSizes · 291088ae
      sdong committed
      Summary: After the skip list optimization, ColumnFamilyTest.DifferentWriteBufferSizes can occasionally fail on the flush triggering of column family 3. Insert more data into it to make sure the flush triggers.
      
      Test Plan: Run it multiple times with jemalloc both on and off and see that it always passes. (Without this commit, the run with jemalloc fails about one time in two.)
      
      Reviewers: rven, yhchiang, IslamAbdelRahman, anthony, kradhakrishnan, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51645
      291088ae
    • S
      EstimatedNumKeys Counter Inaccurate · 355fa943
      SherlockNoMad committed
      355fa943
  6. 05 Dec, 2015 2 commits
    • I
      Fix db_universal_compaction_test · a9ca9107
      Islam AbdelRahman committed
      Summary:
      db_universal_compaction_test is still failing because of
      UniversalCompactionNumLevels/DBTestUniversalCompaction.UniversalCompactionSecondPathRatio/0
      
      https://travis-ci.org/facebook/rocksdb/jobs/94949919
      
      Use the same approach that fixed the other tests to fix this test.
      
      Test Plan: Run ./db_universal_compaction_test on mac and make sure all the tests pass
      
      Reviewers: kradhakrishnan, yhchiang, rven, anthony, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D51591
      a9ca9107
    • K
      Build break fix. · d3bb572d
      krad committed
      Summary: The skip list can no longer estimate memory consistently
      across allocators and hence triggers flushes at different times. This breaks
      certain unit tests.
      
      The fix is to trigger flush by key count instead of by size.
      
      Test Plan: Ran test on dev box and mac (where it used to fail)
      
      Reviewers: sdong
      
      CC: leveldb@
      
      Task ID: #9273334
      
      Blame Rev:
      d3bb572d
  7. 04 Dec, 2015 1 commit
    • A
      added public api to schedule flush/compaction, code to prevent race with db::open · e8180f99
      Alex Yang committed
      Summary:
      Fixes T8781168.
      
      Added a new function EnableAutoCompactions in db.h to be publicly
      available.  This allows compaction to be re-enabled after disabling it via
      SetOptions.
      
      Refactored code to set the dbptr earlier on in TransactionDB::Open and DB::Open
      Temporarily disable auto_compaction in TransactionDB::Open until dbptr is set to
      prevent race condition.
      
      Test Plan:
      Ran make all check
      
      verified fix on myrocks side:
      was able to reproduce the seg fault with
      ../tools/mysqltest.sh --mem --force rocksdb.drop_table
      
      method was to manually sleep the thread after DB::Open but before TransactionDB ptr was
      assigned in transaction_db_impl.cc:
        DB::Open(db_options, dbname, column_families_copy, handles, &db);
        clock_t goal = (60000 * 10) + clock();
        while (goal > clock());
        ...dbptr(aka rdb) gets assigned below
      
      verified my changes fixed the issue.
      
      Also added unit test 'ToggleAutoCompaction' in transaction_test.cc
      
      Reviewers: hermanlee4, anthony
      
      Reviewed By: anthony
      
      Subscribers: alex, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51147
      e8180f99
  8. 03 Dec, 2015 3 commits
  9. 02 Dec, 2015 3 commits
    • S
      Relax verification condition of DBTest.SuggestCompactRangeTest · bcd7bd12
      sdong committed
      Summary: The verification condition of DBTest.SuggestCompactRangeTest is too strict. Depending on key distribution, we might have more small files in the last level. Do not check the number of files in the last level.
      
      Test Plan: Run DBTest.SuggestCompactRangeTest with both of jemalloc on and off.
      
      Reviewers: rven, IslamAbdelRahman, yhchiang, kradhakrishnan, igor, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51501
      bcd7bd12
    • S
      DBTest.DynamicCompactionOptions: More deterministic and readable · f9103d9a
      sdong committed
      Summary: DBTest.DynamicCompactionOptions sometimes fails the assert but I can't repro it locally. Make it more deterministic and readable and see whether the problem is still there.
      
      Test Plan: Run the test and make sure it passes
      
      Reviewers: kradhakrishnan, yhchiang, igor, rven, IslamAbdelRahman, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51309
      f9103d9a
    • S
      Fix DBCompactionTestWithParam.CompactionTrigger in non-jemalloc build. · 0ad68518
      sdong committed
      Summary: DBCompactionTestWithParam.CompactionTrigger fails in non-jemalloc build, after the skip list memtable change. Fix it by making the memtable flush trigger on the number of entries.
      
      Test Plan: Run the test using both jemalloc and non-jemalloc builds.
      
      Reviewers: anthony, IslamAbdelRahman, rven, kradhakrishnan, igor, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51471
      0ad68518
  10. 01 Dec, 2015 4 commits
    • S
      Revert previous behavior of internal_key_skipped_count · 459c7fba
      sdong committed
      Summary: With recent commit 33e0c938, the DB iterator no longer increments the perf context counter internal_key_skipped_count when blindly issuing an internal Next(). Now increment the counter by one when issuing this Next().
      
      Test Plan: Run all existing tests
      
      Reviewers: rven, yhchiang, IslamAbdelRahman, kradhakrishnan, igor, anthony
      
      Reviewed By: anthony
      
      Subscribers: yoshinorim, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51465
      459c7fba
    • A
      Fix CLANG build · 481f9edb
      agiardullo committed
      Summary: fix clang build
      
      Test Plan: build
      
      Reviewers: IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51453
      481f9edb
    • S
      Fix DBTest.SuggestCompactRangeTest for disable jemalloc case · ef8ed368
      sdong committed
      Summary: DBTest.SuggestCompactRangeTest fails when jemalloc is disabled, including in ASAN and valgrind builds. It is caused by the improvement of the skip list, which allocates nodes of different sizes for new records. Fix it by using a special memtable that triggers a flush by number of entries. That way the behavior will be consistent for all allocators.
      
      Test Plan: Run the test with both DISABLE_JEMALLOC=1 and 0
      
      Reviewers: anthony, rven, yhchiang, kradhakrishnan, igor, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51423
      ef8ed368
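The idea of triggering a flush by entry count rather than by allocated bytes can be sketched as follows. This is a minimal illustration and not the test utility used in RocksDB: `CountingFlushTrigger` and its methods are hypothetical names. The point is that an entry counter gives the same answer regardless of how the allocator sizes each node.

```cpp
#include <cstddef>

// Toy model, not RocksDB code: decides when to flush based purely on how
// many entries were inserted, so the decision does not depend on how many
// bytes the allocator hands out per node.
class CountingFlushTrigger {
 public:
  explicit CountingFlushTrigger(std::size_t max_entries)
      : max_entries_(max_entries) {}

  // Records one insert; returns true once the entry budget is reached.
  bool OnInsert() { return ++num_entries_ >= max_entries_; }

  // Called after a flush to start counting for the next memtable.
  void Reset() { num_entries_ = 0; }

 private:
  std::size_t max_entries_;
  std::size_t num_entries_ = 0;
};
```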
    • S
      DB to only flush the column family with the largest memtable while... · db320b1b
      sdong committed
      DB to only flush the column family with the largest memtable while option.db_write_buffer_size is hit
      
      Summary: When option.db_write_buffer_size is hit, we currently flush all column families. Move to flush the column family with the largest active memtable instead. In this way, we can avoid too many small files in some cases.
      
      Test Plan: Modify test DBTest.SharedWriteBuffer to work with the updated behavior
      
      Reviewers: kradhakrishnan, yhchiang, rven, anthony, IslamAbdelRahman, igor
      
      Reviewed By: igor
      
      Subscribers: march, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51291
      db320b1b
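The selection rule described in this commit can be sketched in a few lines. This is a toy model, not the RocksDB implementation: `ShouldFlushForDbLimit` and `PickLargestMemtable` are hypothetical names, and treating "hit" as "total size is at least the limit" is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Toy model, not RocksDB code. Each element is the active memtable size
// (in bytes) of one column family.

// Assumed trigger: the shared limit is hit when the total reaches it.
bool ShouldFlushForDbLimit(const std::vector<std::size_t>& memtable_sizes,
                           std::size_t db_write_buffer_size) {
  const std::size_t total =
      std::accumulate(memtable_sizes.begin(), memtable_sizes.end(),
                      static_cast<std::size_t>(0));
  return total >= db_write_buffer_size;
}

// Returns the index of the column family with the largest active memtable.
// For an empty input this returns memtable_sizes.size() (that is, 0).
std::size_t PickLargestMemtable(const std::vector<std::size_t>& memtable_sizes) {
  auto it = std::max_element(memtable_sizes.begin(), memtable_sizes.end());
  return static_cast<std::size_t>(std::distance(memtable_sizes.begin(), it));
}
```

Flushing only the picked column family leaves the smaller memtables in place, which is how the change avoids producing many small files.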
  11. 25 Nov, 2015 4 commits
    • S
      Reduce extra key comparision in DBIter::Next() · 33e0c938
      sdong committed
      Summary: Currently DBIter::Next() always compares the current key with itself first, which is unnecessary if the last key is not a merge key. I made the change and didn't see db_iter_test fail. I want to hear whether people have any idea what I am missing.
      
      Test Plan: Run all unit tests
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48279
      33e0c938
    • N
      InlineSkipList part 3/3 - new skiplist type that colocates key and node · 9a9d4759
      Nathan Bronson committed
      Summary:
      This diff completes the creation of InlineSkipList<Cmp>, which is like
      SkipList<const char*, Cmp> but it always allocates the key contiguously
      with the node.  This allows us to remove the pointer from the node
      to the key.  As a result the memory usage of the skip list is reduced
      (by 1 to sizeof(void*) bytes depending on the padding required to align
      the key storage), cache locality is improved, and we halve the number
      of calls to the allocator.
      
      For skip lists whose keys are freshly-allocated const char*,
      InlineSkipList is strictly preferable to SkipList.  This diff doesn't
      replace SkipList, however, because some of the use cases of SkipList in
      RocksDB are either character sequences that are not allocated at the
      same time as the skip list node allocation (for example
      hash_linklist_rep) or have different key types (for example
      write_batch_with_index).  Taking advantage of inline allocation for
      those cases is left to future work.
      
      The perf win is biggest for small values.  For single-threaded CPU-bound
      (32M fillrandom operations with no WAL log) with 16 byte keys and 0 byte
      values, the db_bench perf goes from ~310k ops/sec to ~410k ops/sec.  For
      large values the improvement is less pronounced, but seems to be between
      5% and 10% on the same configuration.
      
      Test Plan: make check
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D51123
      9a9d4759
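The colocation idea in this commit, where the key is allocated contiguously with the node so that no key pointer is needed, can be shown with a simplified single-level node. This is a minimal sketch and not the InlineSkipList layout: `InlineNode`, `NewInlineNode`, and `FreeInlineNode` are hypothetical names, the node has only one `next` link, and the key is stored as a NUL-terminated string for simplicity.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Toy model, not RocksDB code. The key bytes live directly after the node
// in the same allocation, so the node needs no pointer to its key.
struct InlineNode {
  InlineNode* next;  // single level only, for brevity

  // The key starts at the first byte past the node header.
  const char* Key() const { return reinterpret_cast<const char*>(this + 1); }
};

// One allocation holds both the node and a NUL-terminated copy of the key.
// Returns nullptr if the allocation fails.
InlineNode* NewInlineNode(const char* key, std::size_t len) {
  void* raw = std::malloc(sizeof(InlineNode) + len + 1);
  if (raw == nullptr) return nullptr;
  InlineNode* node = new (raw) InlineNode{nullptr};
  char* dst = reinterpret_cast<char*>(node + 1);
  std::memcpy(dst, key, len);
  dst[len] = '\0';
  return node;
}

// InlineNode is trivially destructible, so releasing the block is enough.
void FreeInlineNode(InlineNode* node) { std::free(node); }
```

Compared with a node that stores a `const char*` to a separately allocated key, this layout makes one allocator call instead of two and keeps the key next to the links, which matches the locality and allocation benefits the summary describes.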
    • N
      InlineSkipList - part 2/3 · 52017295
      Nathan Bronson committed
      Summary:
      This diff is 2/3 in a sequence that introduces a skip list optimized
      for a key that is a freshly-allocated const char*.  The change is broken
      into pieces to make it easier to review.  This piece removes the Key
      template type, introduces the AllocateKey interface, and changes the
      unit test from using uint64_t as the Key type to using pointers to an 8
      byte blob.
      
      Test Plan: unit test
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D51285
      52017295
    • N
      InlineSkipList - part 1/3 · 78812ec6
      Nathan Bronson committed
      Summary:
      This diff is 1/3 in a sequence that introduces a skip list optimized for
      a key that is a freshly-allocated const char*.  The diff is broken into
      pieces to make it easier to review.  This piece only introduces the new
      type by copying the existing SkipList, with mechanical naming changes
      and reformatting.
      
      Test Plan: new unit test
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D51279
      78812ec6
  12. 24 Nov, 2015 1 commit
    • V
      Enable C4267 warning · 41b32c60
      Vasili Svirski committed
      * Fix conversions from 'size_t' to 'type' by adding static_cast
      
      Tested:
      * built the solution on Windows and Linux locally
      * ran tests
      * CI system build was successful
      41b32c60
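MSVC warning C4267 flags implicit conversions from `size_t` to a narrower type. The sketch below shows the kind of explicit conversion this commit describes. `LengthAsU32` is a hypothetical helper written for illustration, and the range assertion is an addition of this sketch rather than something the commit message mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper: converts a string length to a 32-bit value with an
// explicit cast, so the narrowing is visible in the code instead of being
// an implicit conversion that triggers C4267.
uint32_t LengthAsU32(const std::string& s) {
  const std::size_t n = s.size();
  // Guard against silent truncation on platforms where size_t is 64 bits.
  assert(n <= std::numeric_limits<uint32_t>::max());
  return static_cast<uint32_t>(n);
}
```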
  13. 21 Nov, 2015 3 commits