  1. 10 Feb, 2016 · 2 commits
  2. 06 Feb, 2016 · 1 commit
    • Improve perf of Pessimistic Transaction expirations (and optimistic transactions) · 6f71d3b6
      reid horuff committed
      Summary:
      Copied from task 8196669:
      
      1) Optimistic transactions do not support batching writes from different threads.
      2) Pessimistic transactions do not support batching writes if an expiration time is set.
      
      In these 2 cases, we currently do not do any write batching in DBImpl::WriteImpl() because there is a WriteCallback that could decide at the last minute to abort the write.  But we could support batching write operations with callbacks if we make sure to process the callbacks correctly.
      
      To do this, we would first need to modify write_thread.cc to stop preventing writes with callbacks from being batched together.  Then we would need to change DBImpl::WriteImpl() to call all WriteCallbacks in a batch, only write the batches that succeed, and correctly set the state of each batch's WriteThread::Writer.
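
      A minimal C++ sketch of the batching rule described above, assuming simplified hypothetical types (`Writer`, `WriteGroup`) rather than RocksDB's real `WriteThread::Writer` and `WriteCallback` classes: the group leader runs every callback, applies only the batches whose callback succeeded, and records a per-writer status.

      ```cpp
      #include <cassert>
      #include <functional>
      #include <string>
      #include <vector>

      // Hypothetical stand-in for WriteThread::Writer (not the RocksDB type).
      struct Writer {
        std::vector<std::string> batch;  // the writes this thread wants applied
        std::function<bool()> callback;  // empty means no callback; returning false aborts
        bool ok = false;                 // per-writer outcome reported back to its thread
      };

      // The group leader evaluates every callback first, then applies only the
      // batches that passed, so one aborted writer does not block the others.
      void WriteGroup(std::vector<Writer*>& group, std::vector<std::string>& log) {
        std::vector<Writer*> to_apply;
        for (Writer* w : group) {
          w->ok = !w->callback || w->callback();
          if (w->ok) to_apply.push_back(w);
        }
        for (Writer* w : to_apply) {
          log.insert(log.end(), w->batch.begin(), w->batch.end());
        }
      }
      ```

      Because each writer keeps its own `ok` flag, a failing callback aborts only its own batch while the rest of the group still commits.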
      
      Test Plan: Added test WriteWithCallbackTest to write_callback_test.cc which creates multiple client threads and verifies that writes are batched and executed properly.
      
      Reviewers: hermanlee4, anthony, ngbronson
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D52863
  3. 03 Feb, 2016 · 1 commit
    • Eliminate duplicated property constants · 284aa613
      Andrew Kryczka committed
      Summary:
      Before this diff, there were duplicated constants to refer to properties (user-
      facing API had strings and InternalStats had an enum). I noticed these were
      inconsistent in terms of which constants are provided, names of constants, and
      documentation of constants. Overall it seemed annoying/error-prone to maintain
      these duplicated constants.
      
      So, this diff gets rid of InternalStats's constants and replaces them with a map
      keyed on the user-facing constant. The value in that map contains a function
      pointer to get the property value, so we don't need to do string matching while
      holding db->mutex_. This approach has a side benefit of making many small
      handler functions rather than a giant switch-statement.
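
      A minimal C++ sketch of the map-of-handlers design, assuming a hypothetical `StatsSketch` class and made-up property names (not RocksDB's `InternalStats` or its real property strings): the name lookup runs before the mutex is taken, and only the small handler runs while holding it.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <mutex>
      #include <string>
      #include <unordered_map>

      // Hypothetical stand-in for InternalStats; property names are made up.
      class StatsSketch {
       public:
        uint64_t num_entries = 0;
        uint64_t num_files = 0;

        bool GetIntProperty(const std::string& name, uint64_t* value) {
          // String matching happens before the lock is taken...
          const auto& handlers = Handlers();
          auto it = handlers.find(name);
          if (it == handlers.end()) return false;  // unknown property
          // ...and only the small handler runs while holding it.
          std::lock_guard<std::mutex> lock(mu_);
          return (this->*(it->second))(value);
        }

       private:
        using Handler = bool (StatsSketch::*)(uint64_t* value) const;

        // One map keyed on the user-facing name replaces a parallel enum and switch.
        static const std::unordered_map<std::string, Handler>& Handlers() {
          static const std::unordered_map<std::string, Handler> kHandlers = {
              {"sketch.num-entries", &StatsSketch::HandleNumEntries},
              {"sketch.num-files", &StatsSketch::HandleNumFiles},
          };
          return kHandlers;
        }

        bool HandleNumEntries(uint64_t* value) const { *value = num_entries; return true; }
        bool HandleNumFiles(uint64_t* value) const { *value = num_files; return true; }

        std::mutex mu_;
      };
      ```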
      
      Test Plan: db_properties_test passes, running "make commit-prereq -j32"
      
      Reviewers: sdong, yhchiang, kradhakrishnan, IslamAbdelRahman, rven, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D53253
  4. 02 Feb, 2016 · 1 commit
  5. 01 Feb, 2016 · 1 commit
  6. 30 Jan, 2016 · 1 commit
    • Add options.base_background_compactions as a number of compaction threads for low compaction debt · 3b2a1ddd
      Venkatesh Radhakrishnan committed
      Summary:
      If options.base_background_compactions is given, we try to schedule no more than this number of compactions. Only when the number of L0 files grows to a certain count, or pending compaction bytes exceed a certain threshold, do we schedule compactions based on options.max_background_compactions.
      
      The watermarks are calculated based on slowdown thresholds.
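
      A minimal sketch of the scheduling decision, assuming hypothetical watermark fields (`l0_speedup_trigger`, `pending_bytes_speedup`). The message says the real watermarks are derived from the slowdown thresholds but does not give the formula, so the sketch takes them as inputs.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Hypothetical knobs; only the two *_background_compactions names come from the message.
      struct CompactionLimits {
        int base_background_compactions;
        int max_background_compactions;
        int l0_speedup_trigger;          // assumed watermark on the number of L0 files
        uint64_t pending_bytes_speedup;  // assumed watermark on pending compaction bytes
      };

      // Returns how many compactions may run concurrently given the current debt.
      int AllowedCompactions(const CompactionLimits& lim, int l0_files,
                             uint64_t pending_bytes) {
        bool high_debt = l0_files >= lim.l0_speedup_trigger ||
                         pending_bytes >= lim.pending_bytes_speedup;
        return high_debt ? lim.max_background_compactions
                         : lim.base_background_compactions;
      }
      ```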
      
      Test Plan:
      Add new test cases in column_family_test.
      Adding more unit tests.
      
      Reviewers: IslamAbdelRahman, yhchiang, kradhakrishnan, rven, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D53409
  7. 29 Jan, 2016 · 1 commit
  8. 27 Jan, 2016 · 3 commits
  9. 26 Jan, 2016 · 1 commit
  10. 19 Jan, 2016 · 2 commits
    • Make alloca.h optional · 3f12e16f
      David Bernard committed
    • Changes for build on solaris · d78c6b28
      David Bernard committed
      Makefile adjust paths for solaris build
      Makefile enable _GLIBCXX_USE_C99 so that std::to_string is available
      db_compaction_test.cc Initialise a variable to avoid a compilation error
      db_impl.cc Include <alloca.h>
      db_test.cc Include <alloca.h>
      Environment.java recognise solaris environment
      options_builder.cc Make log unambiguous
      geodb_impl.cc Make log and floor unambiguous
  11. 08 Jan, 2016 · 1 commit
    • DeleteFilesInRange: Mark files to be deleted as being compacted before applying change · 7ece10ec
      Venkatesh Radhakrishnan committed
      Summary:
      While running the myrocks regression suite, I found that
      dropping a table soon after inserting rows into it resulted in an
      assertion failure in CheckConsistencyForDeletes, because a file
      that had recently been added or moved could not be found. Marking
      the files to be deleted as being compacted before calling
      LogAndApplyChange fixed the assertion failures.
      
      Test Plan: DBCompactionTest.DeleteFileRange
      
      Reviewers: IslamAbdelRahman, anthony, yhchiang, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, yoshinorim, leveldb
      
      Differential Revision: https://reviews.facebook.net/D52599
  12. 07 Jan, 2016 · 1 commit
    • Optimize GetLatestSequenceForKey · da032495
      Reid Horuff committed
      Summary: DBImpl::GetLatestSequenceForKey() can do memcpys to load a value that will never be used.  This can be optimized by changing all the Get() functions it calls to optionally skip fetching the value (and only fetch the sequence number).
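
      A minimal sketch of the optimization, assuming a hypothetical `GetSketch()` over a `std::map` rather than RocksDB's real `Get()` signatures: a null value pointer tells the lookup to return only the sequence number and skip copying the value.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <map>
      #include <string>

      // Hypothetical entry: a value plus the sequence number that wrote it.
      struct Entry {
        std::string value;
        uint64_t seq;
      };

      // Copies the value only when the caller asks for it (value != nullptr).
      bool GetSketch(const std::map<std::string, Entry>& table, const std::string& key,
                     std::string* value, uint64_t* seq) {
        auto it = table.find(key);
        if (it == table.end()) return false;
        *seq = it->second.seq;
        if (value != nullptr) *value = it->second.value;  // the only copy of the value
        return true;
      }
      ```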
      
      Test Plan: optimistic_transaction_test and transaction_test
      
      Reviewers: anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba, hermanlee4
      
      Differential Revision: https://reviews.facebook.net/D52227
  13. 05 Jan, 2016 · 1 commit
  14. 31 Dec, 2015 · 1 commit
  15. 30 Dec, 2015 · 1 commit
  16. 29 Dec, 2015 · 2 commits
  17. 26 Dec, 2015 · 1 commit
    • support for concurrent adds to memtable · 7d87f027
      Nathan Bronson committed
      Summary:
      This diff adds support for concurrent adds to the skiplist memtable
      implementations.  Memory allocation is made thread-safe by the addition of
      a spinlock, with small per-core buffers to avoid contention.  Concurrent
      memtable writes are made via an additional method and don't impose a
      performance overhead on the non-concurrent case, so parallelism can be
      selected on a per-batch basis.
      
      Write thread synchronization is an increasing bottleneck for higher levels
      of concurrency, so this diff adds --enable_write_thread_adaptive_yield
      (default off).  This feature causes threads joining a write batch
      group to spin for a short time (default 100 usec) using sched_yield,
      rather than going to sleep on a mutex.  If the timing of the yield calls
      indicates that another thread has actually run during the yield then
      spinning is avoided.  This option improves performance for concurrent
      situations even without parallel adds, although it has the potential to
      increase CPU usage (and the heuristic adaptation is not yet mature).
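
      A minimal sketch of spin-then-block waiting in the spirit of --enable_write_thread_adaptive_yield, assuming a hypothetical `AdaptiveWaiterSketch` class. The 100 usec spin budget comes from the message; the 3 usec "slow yield" cutoff is an assumed value for illustration. This is not RocksDB's `WriteThread` code.

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <chrono>
      #include <condition_variable>
      #include <mutex>
      #include <thread>

      // Hypothetical waiter: spin with yield() for a bounded time, then sleep on a condvar.
      class AdaptiveWaiterSketch {
       public:
        void Signal() {
          {
            std::lock_guard<std::mutex> lock(mu_);
            done_.store(true, std::memory_order_release);
          }
          cv_.notify_all();
        }

        // Returns true if the signal was seen while spinning, false if it had to block.
        bool Wait(std::chrono::microseconds max_spin = std::chrono::microseconds(100),
                  std::chrono::microseconds slow_yield = std::chrono::microseconds(3)) {
          using Clock = std::chrono::steady_clock;
          const auto spin_end = Clock::now() + max_spin;
          for (;;) {
            if (done_.load(std::memory_order_acquire)) return true;
            if (Clock::now() >= spin_end) break;
            const auto before = Clock::now();
            std::this_thread::yield();
            // A slow yield suggests another thread actually ran, so stop spinning.
            if (Clock::now() - before > slow_yield) break;
          }
          std::unique_lock<std::mutex> lock(mu_);
          cv_.wait(lock, [this] { return done_.load(std::memory_order_acquire); });
          return false;
        }

       private:
        std::atomic<bool> done_{false};
        std::mutex mu_;
        std::condition_variable cv_;
      };
      ```

      Setting the flag under the mutex in `Signal()` avoids a lost wakeup: the blocking path re-checks the flag under the same lock before sleeping.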
      
      Parallel writes are not currently compatible with
      inplace updates, update callbacks, or delete filtering.
      Enable it with --allow_concurrent_memtable_write (and
      --enable_write_thread_adaptive_yield).  Parallel memtable writes
      are performance neutral when there is no actual parallelism, and in
      my experiments (SSD server-class Linux and varying contention and key
      sizes for fillrandom) they are always a performance win when there is
      more than one thread.
      
      Statistics are updated earlier in the write path, dropping the number
      of DB mutex acquisitions from 2 to 1 for almost all cases.
      
      This diff was motivated and inspired by Yahoo's cLSM work.  It is more
      conservative than cLSM: RocksDB's write batch group leader role is
      preserved (along with all of the existing flush and write throttling
      logic) and concurrent writers are blocked until all memtable insertions
      have completed and the sequence number has been advanced, to preserve
      linearizability.
      
      My test config is "db_bench -benchmarks=fillrandom -threads=$T
      -batch_size=1 -memtablerep=skip_list -value_size=100 --num=1000000/$T
      -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999
      -disable_auto_compactions --max_write_buffer_number=8
      -max_background_flushes=8 --disable_wal --write_buffer_size=160000000
      --block_size=16384 --allow_concurrent_memtable_write" on a two-socket
      Xeon E5-2660 @ 2.2GHz with lots of memory and an SSD.  With 1
      thread I get ~440Kops/sec.  Peak performance for 1 socket (numactl
      -N1) is slightly more than 1Mops/sec, at 16 threads.  Peak performance
      across both sockets happens at 30 threads, and is ~900Kops/sec, although
      with fewer threads there is less performance loss when the system has
      background work.
      
      Test Plan:
      1. concurrent stress tests for InlineSkipList and DynamicBloom
      2. make clean; make check
      3. make clean; DISABLE_JEMALLOC=1 make valgrind_check; valgrind db_bench
      4. make clean; COMPILE_WITH_TSAN=1 make all check; db_bench
      5. make clean; COMPILE_WITH_ASAN=1 make all check; db_bench
      6. make clean; OPT=-DROCKSDB_LITE make check
      7. verify no perf regressions when disabled
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: MarkCallaghan, IslamAbdelRahman, anthony, yhchiang, rven, sdong, guyg8, kradhakrishnan, dhruba
      
      Differential Revision: https://reviews.facebook.net/D50589
  18. 23 Dec, 2015 · 1 commit
  19. 22 Dec, 2015 · 1 commit
    • add call to install superversion and schedule work in enableautocompactions · 33e09c0e
      Alex Yang committed
      Summary:
      This patch fixes https://github.com/facebook/mysql-5.6/issues/121
      
      There is a recent change in rocksdb to disable auto compactions on startup: https://reviews.facebook.net/D51147. However, there is a small timing window where a column family needs to be compacted and schedules a compaction, but the scheduled compaction fails when it checks the disable_auto_compactions setting. The expectation is that once the application is ready, it will call EnableAutoCompactions() to allow new compactions to go through. However, if the column family is stalled because L0 is full and no writes can go through, the column family may never get a new compaction request scheduled. EnableAutoCompaction() should probably schedule a new flush and compaction event when it resets disable_auto_compaction.
      
      Using InstallSuperVersionAndScheduleWork, we call SchedulePendingFlush,
      SchedulePendingCompaction, as well as MaybeScheduleFlushOrCompaction on all the
      column families to avoid the situation above.
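
      A minimal sketch of why re-enabling must also schedule work, assuming a hypothetical `CfStateSketch` type and a plain queue instead of RocksDB's InstallSuperVersionAndScheduleWork machinery: clearing the flag alone is not enough, because a stalled column family may never receive the write that would normally trigger scheduling.

      ```cpp
      #include <cassert>
      #include <deque>
      #include <string>
      #include <vector>

      // Hypothetical column-family state; not RocksDB's ColumnFamilyData.
      struct CfStateSketch {
        std::string name;
        bool disable_auto_compactions = true;
        bool needs_compaction = false;  // e.g. L0 is full and writes are stalled
        bool queued = false;
      };

      // Re-enable auto compactions and immediately re-check every column family.
      void EnableAutoCompactionSketch(std::vector<CfStateSketch>& cfs,
                                      std::deque<CfStateSketch*>& compaction_queue) {
        for (CfStateSketch& cf : cfs) {
          cf.disable_auto_compactions = false;
          if (cf.needs_compaction && !cf.queued) {
            cf.queued = true;
            compaction_queue.push_back(&cf);
          }
        }
      }
      ```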
      
      This is still a first pass for feedback.
      We could also just call SchedulePendingFlush and SchedulePendingCompaction directly.
      
      Test Plan:
      Run on Asan build
      cd _build-5.6-ASan/ && ./mysql-test/mtr --mem --big --testcase-timeout=36000 --suite-timeout=12000 --parallel=16 --suite=rocksdb,rocksdb_rpl,rocksdb_sys_vars --mysqld=--default-storage-engine=rocksdb --mysqld=--skip-innodb --mysqld=--default-tmp-storage-engine=MyISAM --mysqld=--rocksdb rocksdb_rpl.rpl_rocksdb_stress_crash --repeat=1000
      
      Ensure that it no longer hangs during the test.
      
      Reviewers: hermanlee4, yhchiang, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, yhchiang, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51747
  20. 18 Dec, 2015 · 1 commit
  21. 17 Dec, 2015 · 1 commit
    • Introduce ReadOptions::pin_data (support zero copy for keys) · aececc20
      Islam AbdelRahman committed
      Summary:
      This patch updates the Iterator API to introduce new functions that allow users to keep the Slices returned by key() valid as long as the Iterator is not deleted.
      
      ReadOptions::pin_data: If true, keep loaded blocks in memory as long as the iterator is not deleted.
      Iterator::IsKeyPinned(): If true, the Slice returned by key() is valid as long as the iterator is not deleted.
      
      Also adds a new option, BlockBasedTableOptions::use_delta_encoding, to allow users to disable delta encoding if needed.
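
      A minimal sketch of the pinning contract, assuming a hypothetical `PinningIteratorSketch` that returns `const std::string&` in place of a `Slice`. With `pin_data` set, every loaded block is kept alive until the iterator is destroyed, so keys returned earlier stay valid after `Next()`; without it, moving to a new block frees the old one.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <memory>
      #include <string>
      #include <vector>

      // Hypothetical iterator over a "file" of key blocks; assumes every block is non-empty.
      class PinningIteratorSketch {
       public:
        using Block = std::vector<std::string>;

        PinningIteratorSketch(const std::vector<Block>* file, bool pin_data)
            : file_(file), pin_data_(pin_data) {
          LoadBlock(0);
        }

        bool Valid() const { return current_ != nullptr && idx_ < current_->size(); }
        // The reference outlives Next() only when IsKeyPinned() is true.
        const std::string& key() const { return (*current_)[idx_]; }
        bool IsKeyPinned() const { return pin_data_; }

        void Next() {
          if (++idx_ < current_->size()) return;
          LoadBlock(block_ + 1);
        }

       private:
        void LoadBlock(size_t b) {
          block_ = b;
          idx_ = 0;
          if (b >= file_->size()) {
            current_.reset();
            return;
          }
          // Copy the block into memory, standing in for reading it from disk.
          current_ = std::make_shared<Block>((*file_)[b]);
          // With pin_data, every loaded block stays alive until the iterator dies.
          if (pin_data_) pinned_.push_back(current_);
        }

        const std::vector<Block>* file_;
        bool pin_data_;
        size_t block_ = 0;
        size_t idx_ = 0;
        std::shared_ptr<Block> current_;
        std::vector<std::shared_ptr<Block>> pinned_;
      };
      ```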
      
      Benchmark results (using https://phabricator.fb.com/P20083553)
      
      ```
      // $ du -h /home/tec/local/normal.4K.Snappy/db10077
      // 6.1G    /home/tec/local/normal.4K.Snappy/db10077
      
      // $ du -h /home/tec/local/zero.8K.LZ4/db10077
      // 6.4G    /home/tec/local/zero.8K.LZ4/db10077
      
      // Benchmarks for shard db10077
      // _build/opt/rocks/benchmark/rocks_copy_benchmark \
      //      --normal_db_path="/home/tec/local/normal.4K.Snappy/db10077" \
      //      --zero_db_path="/home/tec/local/zero.8K.LZ4/db10077"
      
      // First run
      // ============================================================================
      // rocks/benchmark/RocksCopyBenchmark.cpp          relative  time/iter  iters/s
      // ============================================================================
      // BM_StringCopy                                                 1.73s  576.97m
      // BM_StringPiece                                   103.74%      1.67s  598.55m
      // ============================================================================
      // Match rate : 1000000 / 1000000
      
      // Second run
      // ============================================================================
      // rocks/benchmark/RocksCopyBenchmark.cpp          relative  time/iter  iters/s
      // ============================================================================
      // BM_StringCopy                                              611.99ms     1.63
      // BM_StringPiece                                   203.76%   300.35ms     3.33
      // ============================================================================
      // Match rate : 1000000 / 1000000
      ```
      
      Test Plan: Unit tests
      
      Reviewers: sdong, igor, anthony, yhchiang, rven
      
      Reviewed By: rven
      
      Subscribers: dhruba, lovro, adsharma
      
      Differential Revision: https://reviews.facebook.net/D48999
  22. 16 Dec, 2015 · 1 commit
    • Fix minor bugs in delete operator, snprintf, and size_t usage · 97265f5f
      Gunnar Kudrjavets committed
      Summary:
      List of changes:
      
      1) Fix the snprintf() usage in cases where wrong variable was used to determine the output buffer size.
      
      2) Remove unnecessary checks before calling delete operator.
      
      3) Increase code correctness by using size_t type when getting vector's size.
      
      4) Unify the coding style by removing the `using namespace std` at the top of the file, to conform to the majority usage.
      
      5) Fix various lint errors pointed out by 'arc lint'.
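
      A short C++ illustration of items 1) to 3), assuming hypothetical helpers `FormatSizes()` and `Cleanup()`; it shows the patterns only, not code from the diff.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdio>
      #include <string>
      #include <vector>

      // Hypothetical helper: formats each size followed by ';'.
      std::string FormatSizes(const std::vector<int>& sizes) {
        char buf[16];
        std::string out;
        // 3) size_t matches vector::size(), avoiding signed/unsigned comparisons.
        for (size_t i = 0; i < sizes.size(); ++i) {
          // 1) Pass the size of the buffer actually being written to.
          std::snprintf(buf, sizeof(buf), "%d;", sizes[i]);
          out += buf;
        }
        return out;
      }

      // Hypothetical helper for 2): deleting a null pointer is a no-op,
      // so no `if (p != nullptr)` guard is needed.
      void Cleanup(int* p) { delete p; }
      ```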
      
      Test Plan:
      Code review and build:
      
      git diff
      make clean
      make -j 32 commit-prereq
      arc lint
      
      Reviewers: kradhakrishnan, sdong, rven, anthony, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51849
  23. 15 Dec, 2015 · 1 commit
    • Running manual compactions in parallel with other automatic or manual... · 030215bf
      Venkatesh Radhakrishnan committed
      Running manual compactions in parallel with other automatic or manual compactions in restricted cases
      
      Summary:
      This diff provides a framework for doing manual
      compactions in parallel with other compactions. We now have a deque of manual compactions. We also pass manual compactions as an argument from RunManualCompactions down to
      BackgroundCompactions, so that RunManualCompactions can be reentrant.
      Parallelism is controlled by the two routines
      ConflictingManualCompaction to allow/disallow new parallel/manual
      compactions based on already existing ManualCompactions. In this diff, manual compactions still run exclusively of other compactions by default. However, by setting the compaction option exclusive_manual_compaction to false, it is possible to run other compactions in parallel with a manual compaction. We are still restricted to one manual compaction per column family at a time. All of these restrictions will be relaxed in future diffs.
      I will be adding more tests later.
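
      A minimal sketch of a conflict check over a deque of pending manual compactions, assuming a hypothetical `ConflictsWithRunning()` helper (the message names ConflictingManualCompaction but does not describe its logic). It models only the two restrictions stated above: an exclusive manual compaction runs alone, and at most one manual compaction runs per column family.

      ```cpp
      #include <cassert>
      #include <deque>

      // Hypothetical record for a queued manual compaction.
      struct ManualCompactionSketch {
        int cf_id;
        bool exclusive;  // mirrors exclusive_manual_compaction, true by default
      };

      // True if `candidate` may not start alongside the manual compactions in `running`.
      // Only manual-vs-manual conflicts are modeled here.
      bool ConflictsWithRunning(const std::deque<ManualCompactionSketch>& running,
                                const ManualCompactionSketch& candidate) {
        for (const ManualCompactionSketch& r : running) {
          if (r.exclusive || candidate.exclusive) return true;  // exclusive runs alone
          if (r.cf_id == candidate.cf_id) return true;          // one per column family
        }
        return false;
      }
      ```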
      
      Test Plan: Rocksdb regression + new tests + valgrind
      
      Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D47973
  24. 12 Dec, 2015 · 2 commits
    • Enable MS compiler warning c4244. · 236fe21c
      Dmitri Smirnov committed
        Mostly due to differences in the sizes of int and long
        on 64-bit systems between the MS compiler and GNU.
    • Use SST files for Transaction conflict detection · 3bfd3d39
      agiardullo committed
      Summary:
      Currently, transactions can fail even if there is no actual write conflict.  This is due to relying on only the memtables to check for write-conflicts.  Users have to tune memtable settings to try to avoid this, but it's hard to figure out exactly how to tune these settings.
      
      With this diff, TransactionDB will use both memtables and SST files to determine if there are any write conflicts.  This relies on the fact that BlockBasedTable stores sequence numbers for all writes that happen after any open snapshot.  Also, D50295 is needed to prevent SingleDelete from making writes disappear (the TODOs in this test code will be fixed once the other diff is approved and merged).
      
      Note that Optimistic transactions will still rely on tuning memtable settings as we do not want to read from SST while on the write thread.  Also, memtable settings can still be used to reduce how often TransactionDB needs to read SST files.
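
      A minimal sketch of the check, assuming hypothetical `SeqSource` and `CheckWriteConflict()` names: a write conflicts if the key's latest sequence number is newer than the transaction's snapshot. When the memtables no longer cover the whole window since the snapshot, an SST lookup decides; without it (modeling the memtable-only optimistic path) the answer is unknown.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <map>
      #include <string>

      enum class ConflictCheck { kOk, kConflict, kTryAgain };

      // Hypothetical data source: latest sequence number per key, plus the
      // earliest sequence number this source still holds history for.
      struct SeqSource {
        std::map<std::string, uint64_t> latest_seq;
        uint64_t earliest_seq = 0;
      };

      // sst == nullptr models checking memtables only.
      ConflictCheck CheckWriteConflict(const SeqSource& memtables, const SeqSource* sst,
                                       const std::string& key, uint64_t snapshot_seq) {
        auto it = memtables.latest_seq.find(key);
        if (it != memtables.latest_seq.end()) {
          return it->second > snapshot_seq ? ConflictCheck::kConflict : ConflictCheck::kOk;
        }
        // Memtables hold every write since the snapshot, so absence means no conflict.
        if (memtables.earliest_seq <= snapshot_seq) return ConflictCheck::kOk;
        // Memtable history is too short; without SST lookups the answer is unknown.
        if (sst == nullptr) return ConflictCheck::kTryAgain;
        auto s = sst->latest_seq.find(key);
        if (s != sst->latest_seq.end() && s->second > snapshot_seq) {
          return ConflictCheck::kConflict;
        }
        return ConflictCheck::kOk;
      }
      ```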
      
      Test Plan: unit tests, db bench
      
      Reviewers: rven, yhchiang, kradhakrishnan, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D50475
  25. 11 Dec, 2015 · 1 commit
    • Change SingleDelete to support conflict checking · 9e446290
      agiardullo committed
      Summary: For Transactions, we want to start using the SST files to do write conflict checking.  To do this, we need to make sure that compaction never removes all writes if an earlier snapshot exists.  So I had to change the way we process SingleDeletes to sometimes leave a SingleDelete behind when we encounter a Put followed by a SingleDelete.  See the comments in this diff for a more detailed explanation.
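
      A simplified model of the rule for a Put followed by a SingleDelete on the same key, assuming a hypothetical `CompactPutThenSingleDelete()` helper; the actual compaction logic handles more cases. A snapshot with sequence number s sees writes with sequence numbers at most s.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      enum class PairAction { kKeepBoth, kKeepSingleDeleteOnly, kDropBoth };

      // A Put at put_seq followed by a SingleDelete at sd_seq (put_seq < sd_seq)
      // for the same key; `snapshots` holds live snapshot sequence numbers.
      PairAction CompactPutThenSingleDelete(uint64_t put_seq, uint64_t sd_seq,
                                            const std::vector<uint64_t>& snapshots) {
        bool snapshot_sees_put = false;  // put_seq <= s < sd_seq: the Put must survive
        bool older_snapshot = false;     // s < put_seq: a later write must stay detectable
        for (uint64_t s : snapshots) {
          if (s >= put_seq && s < sd_seq) snapshot_sees_put = true;
          if (s < put_seq) older_snapshot = true;
        }
        if (snapshot_sees_put) return PairAction::kKeepBoth;
        // Dropping both would erase all evidence of a write after the older
        // snapshot, so leave the SingleDelete behind for conflict checking.
        if (older_snapshot) return PairAction::kKeepSingleDeleteOnly;
        return PairAction::kDropBoth;
      }
      ```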
      
      Test Plan: added more unit tests
      
      Reviewers: rven, igor, kradhakrishnan, IslamAbdelRahman, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D50295
  26. 09 Dec, 2015 · 3 commits
  27. 08 Dec, 2015 · 3 commits
    • Support marking snapshots for write-conflict checking · ec704aaf
      agiardullo committed
      Summary:
      D50475 enables using SST files for transaction write-conflict checking.  In order for this to work, we need to make sure not to compact out SingleDeletes when there is an earlier transaction snapshot(D50295).  If there is a long-held snapshot, this could reduce the benefit of the SingleDelete optimization.
      
      This diff allows Transactions to mark snapshots as being used for write-conflict checking.  Then, during compaction, we will be able to optimize SingleDeletes better in the future.
      
      This diff adds a flag to SnapshotImpl which is used by Transactions.  This diff also passes the earliest write-conflict snapshot's sequence number to CompactionIterator.  This diff does not actually change Compaction (after this diff is pushed, D50295 will be able to use this information).
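
      A minimal sketch of computing the sequence number handed to compaction, assuming a hypothetical `SnapshotSketch` whose boolean flag stands in for the new field on SnapshotImpl.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <limits>
      #include <vector>

      // Hypothetical snapshot record; the flag marks snapshots used for write-conflict checking.
      struct SnapshotSketch {
        uint64_t seq;
        bool is_write_conflict_boundary;
      };

      // Earliest sequence number among flagged snapshots, or the maximum
      // uint64_t value when no snapshot is flagged.
      uint64_t EarliestWriteConflictSnapshot(const std::vector<SnapshotSketch>& snapshots) {
        uint64_t earliest = std::numeric_limits<uint64_t>::max();
        for (const SnapshotSketch& s : snapshots) {
          if (s.is_write_conflict_boundary && s.seq < earliest) earliest = s.seq;
        }
        return earliest;
      }
      ```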
      
      Test Plan: no behavior change, ran existing tests
      
      Reviewers: rven, kradhakrishnan, yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51183
    • Revert "Fix a race condition in persisting options" · f307036b
      sdong committed
      This reverts commit 2fa3ed51. It breaks the RocksDB lite build.
    • Fix a race condition in persisting options · 2fa3ed51
      Yueh-Hsuan Chiang committed
      Summary:
      This patch fixes a race condition in persisting options which causes a crash when:
      
      * Thread A obtains cf options and starts to persist options based on those cf options.
      * Thread B kicks in, finishes DropColumnFamily and deletes cf_handle.
      * Thread A wakes up, tries to finish persisting options, and crashes.
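
      One general way to close this kind of window is shared ownership: thread A takes a reference under a lock, so a concurrent drop cannot free what it is still reading. The sketch assumes hypothetical `CfRegistrySketch` and `PersistOptions()` names; the message does not describe how the diff itself fixed the race (and the change was later reverted).

      ```cpp
      #include <cassert>
      #include <map>
      #include <memory>
      #include <mutex>
      #include <string>
      #include <utility>

      // Hypothetical per-column-family options; not RocksDB's ColumnFamilyOptions.
      struct CfOptionsSketch {
        std::string name;
        int write_buffer_mb;
      };

      class CfRegistrySketch {
       public:
        void Add(int id, CfOptionsSketch opts) {
          std::lock_guard<std::mutex> lock(mu_);
          cfs_[id] = std::make_shared<CfOptionsSketch>(std::move(opts));
        }
        // Dropping only releases the registry's reference; readers keep theirs.
        void Drop(int id) {
          std::lock_guard<std::mutex> lock(mu_);
          cfs_.erase(id);
        }
        // Shared ownership is taken under the lock, so the object outlives a concurrent Drop().
        std::shared_ptr<const CfOptionsSketch> Acquire(int id) {
          std::lock_guard<std::mutex> lock(mu_);
          auto it = cfs_.find(id);
          if (it == cfs_.end()) return nullptr;
          return it->second;
        }

       private:
        std::mutex mu_;
        std::map<int, std::shared_ptr<CfOptionsSketch>> cfs_;
      };

      // "Thread A": serializes options from a copy it co-owns.
      std::string PersistOptions(const CfOptionsSketch& opts) {
        return opts.name + ":" + std::to_string(opts.write_buffer_mb);
      }
      ```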
      
      Test Plan: Add a test in column_family_test that can reproduce the crash
      
      Reviewers: anthony, IslamAbdelRahman, rven, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51609
  28. 04 Dec, 2015 · 1 commit
    • added public api to schedule flush/compaction, code to prevent race with db::open · e8180f99
      Alex Yang committed
      Summary:
      Fixes T8781168.
      
      Added a new function EnableAutoCompactions in db.h to be publicly
      available.  This allows compaction to be re-enabled after disabling it via
      SetOptions.
      
      Refactored code to set the dbptr earlier on in TransactionDB::Open and DB::Open
      Temporarily disable auto_compaction in TransactionDB::Open until dbptr is set to
      prevent race condition.
      
      Test Plan:
      Ran make all check
      
      verified fix on myrocks side:
      was able to reproduce the seg fault with
      ../tools/mysqltest.sh --mem --force rocksdb.drop_table
      
      method was to manually sleep the thread after DB::Open but before TransactionDB ptr was
      assigned in transaction_db_impl.cc:
        DB::Open(db_options, dbname, column_families_copy, handles, &db);
        clock_t goal = (60000 * 10) + clock();
        while (goal > clock());
        ...dbptr(aka rdb) gets assigned below
      
      verified my changes fixed the issue.
      
      Also added unit test 'ToggleAutoCompaction' in transaction_test.cc
      
      Reviewers: hermanlee4, anthony
      
      Reviewed By: anthony
      
      Subscribers: alex, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51147
  29. 01 Dec, 2015 · 1 commit
    • DB to only flush the column family with the largest memtable while... · db320b1b
      sdong committed
      DB to only flush the column family with the largest memtable while option.db_write_buffer_size is hit
      
      Summary: When option.db_write_buffer_size is hit, we currently flush all column families. Instead, flush only the column family with the largest active memtable. This way, we can avoid too many small files in some cases.
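
      A minimal sketch of the new victim selection, assuming a hypothetical `PickColumnFamilyToFlush()` that receives each column family's active memtable size. Once the shared db_write_buffer_size budget is reached, only the column family with the largest memtable is flushed.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <vector>

      // memtable_bytes[i] is the active memtable size of column family i.
      // Returns the index of the column family to flush, or -1 if the budget is not hit.
      int PickColumnFamilyToFlush(const std::vector<size_t>& memtable_bytes,
                                  size_t db_write_buffer_size) {
        size_t total = 0;
        for (size_t bytes : memtable_bytes) total += bytes;
        if (memtable_bytes.empty() || total < db_write_buffer_size) return -1;
        size_t largest = 0;
        for (size_t i = 1; i < memtable_bytes.size(); ++i) {
          if (memtable_bytes[i] > memtable_bytes[largest]) largest = i;
        }
        return static_cast<int>(largest);
      }
      ```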
      
      Test Plan: Modify test DBTest.SharedWriteBuffer to work with the updated behavior
      
      Reviewers: kradhakrishnan, yhchiang, rven, anthony, IslamAbdelRahman, igor
      
      Reviewed By: igor
      
      Subscribers: march, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51291
  30. 17 Nov, 2015 · 1 commit