1. 26 December 2015, 2 commits
    • support for concurrent adds to memtable · 7d87f027
      Committed by Nathan Bronson
      Summary:
      This diff adds support for concurrent adds to the skiplist memtable
      implementations.  Memory allocation is made thread-safe by the addition of
      a spinlock, with small per-core buffers to avoid contention.  Concurrent
      memtable writes are made via an additional method and don't impose a
      performance overhead on the non-concurrent case, so parallelism can be
      selected on a per-batch basis.
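
      The allocation scheme can be sketched roughly as below. This is purely illustrative (the class and all of its members are hypothetical, not RocksDB's actual concurrent arena): a spinlock-guarded per-core shard that refills from a shared, mutex-protected arena only when it runs dry.

      ```cpp
      // Illustrative sketch only: thread-safe allocation via a spinlock plus
      // small per-core buffers, so most allocations hit a lightly contended shard.
      #include <atomic>
      #include <cstddef>
      #include <functional>
      #include <memory>
      #include <mutex>
      #include <thread>
      #include <vector>

      class ShardedArenaSketch {
       public:
        char* Allocate(size_t bytes) {
          Shard& s = shards_[ShardIndex()];
          SpinGuard guard(s.locked);
          if (s.avail < bytes) {
            Refill(s, bytes);  // slow path: take the shared mutex, grab a block
          }
          char* result = s.cur;
          s.cur += bytes;
          s.avail -= bytes;
          return result;
        }

       private:
        struct Shard {
          std::atomic<bool> locked{false};
          char* cur = nullptr;
          size_t avail = 0;
        };
        struct SpinGuard {
          explicit SpinGuard(std::atomic<bool>& f) : flag(f) {
            while (flag.exchange(true, std::memory_order_acquire)) {
            }
          }
          ~SpinGuard() { flag.store(false, std::memory_order_release); }
          std::atomic<bool>& flag;
        };

        size_t ShardIndex() const {
          return std::hash<std::thread::id>()(std::this_thread::get_id()) % kShards;
        }
        void Refill(Shard& s, size_t bytes) {
          std::lock_guard<std::mutex> lock(shared_mutex_);
          const size_t block = bytes > kBlockSize ? bytes : kBlockSize;
          blocks_.emplace_back(new char[block]);
          s.cur = blocks_.back().get();
          s.avail = block;
        }

        static constexpr size_t kShards = 16;
        static constexpr size_t kBlockSize = 4096;
        Shard shards_[kShards];
        std::mutex shared_mutex_;
        std::vector<std::unique_ptr<char[]>> blocks_;
      };
      ```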
      
      Write thread synchronization is an increasing bottleneck for higher levels
      of concurrency, so this diff adds --enable_write_thread_adaptive_yield
      (default off).  This feature causes threads joining a write batch
      group to spin for a short time (default 100 usec) using sched_yield,
      rather than going to sleep on a mutex.  If the timing of the yield calls
      indicates that another thread has actually run during the yield then
      spinning is avoided.  This option improves performance for concurrent
      situations even without parallel adds, although it has the potential to
      increase CPU usage (and the heuristic adaptation is not yet mature).
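
      A minimal sketch of the spin-then-block idea follows; the constants and the "did another thread actually run?" test are simplified stand-ins for the real heuristic.

      ```cpp
      // Minimal sketch, not the actual RocksDB yield logic: spin with
      // sched_yield() for a bounded window, but stop early if a yield clearly
      // let another thread run.
      #include <sched.h>

      #include <chrono>
      #include <functional>

      bool SpinThenGiveUp(const std::function<bool()>& done,
                          std::chrono::microseconds max_spin =
                              std::chrono::microseconds(100)) {
        using Clock = std::chrono::steady_clock;
        const auto deadline = Clock::now() + max_spin;
        while (Clock::now() < deadline) {
          if (done()) {
            return true;  // the batch-group leader finished our write
          }
          const auto before = Clock::now();
          sched_yield();
          if (Clock::now() - before > std::chrono::microseconds(3)) {
            break;  // another runnable thread got the CPU; stop spinning
          }
        }
        return done();  // caller falls back to sleeping on a mutex/condvar
      }
      ```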
      
      Parallel writes are not currently compatible with
      inplace updates, update callbacks, or delete filtering.
      Enable it with --allow_concurrent_memtable_write (and
      --enable_write_thread_adaptive_yield).  Parallel memtable writes
      are performance neutral when there is no actual parallelism, and in
      my experiments (SSD server-class Linux and varying contention and key
      sizes for fillrandom) they are always a performance win when there is
      more than one thread.
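
      From the C++ API, enabling the feature might look roughly like the sketch below, assuming the DBOptions fields mirror the db_bench flags named above.

      ```cpp
      #include <string>

      #include "rocksdb/db.h"
      #include "rocksdb/memtablerep.h"
      #include "rocksdb/options.h"

      rocksdb::DB* OpenWithConcurrentMemtableWrites(const std::string& path) {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.allow_concurrent_memtable_write = true;     // opt in per DB
        options.enable_write_thread_adaptive_yield = true;  // spin before sleeping
        // Concurrent inserts are only supported by the skiplist memtable.
        options.memtable_factory.reset(new rocksdb::SkipListFactory());

        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, path, &db);
        return s.ok() ? db : nullptr;
      }
      ```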
      
      Statistics are updated earlier in the write path, dropping the number
      of DB mutex acquisitions from 2 to 1 for almost all cases.
      
      This diff was motivated and inspired by Yahoo's cLSM work.  It is more
      conservative than cLSM: RocksDB's write batch group leader role is
      preserved (along with all of the existing flush and write throttling
      logic) and concurrent writers are blocked until all memtable insertions
      have completed and the sequence number has been advanced, to preserve
      linearizability.
      
      My test config is "db_bench -benchmarks=fillrandom -threads=$T
      -batch_size=1 -memtablerep=skip_list -value_size=100 --num=1000000/$T
      -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999
      -disable_auto_compactions --max_write_buffer_number=8
      -max_background_flushes=8 --disable_wal --write_buffer_size=160000000
      --block_size=16384 --allow_concurrent_memtable_write" on a two-socket
      Xeon E5-2660 @ 2.2 GHz with lots of memory and an SSD.  With 1
      thread I get ~440Kops/sec.  Peak performance for 1 socket (numactl
      -N1) is slightly more than 1Mops/sec, at 16 threads.  Peak performance
      across both sockets happens at 30 threads, and is ~900Kops/sec, although
      with fewer threads there is less performance loss when the system has
      background work.
      
      Test Plan:
      1. concurrent stress tests for InlineSkipList and DynamicBloom
      2. make clean; make check
      3. make clean; DISABLE_JEMALLOC=1 make valgrind_check; valgrind db_bench
      4. make clean; COMPILE_WITH_TSAN=1 make all check; db_bench
      5. make clean; COMPILE_WITH_ASAN=1 make all check; db_bench
      6. make clean; OPT=-DROCKSDB_LITE make check
      7. verify no perf regressions when disabled
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: MarkCallaghan, IslamAbdelRahman, anthony, yhchiang, rven, sdong, guyg8, kradhakrishnan, dhruba
      
      Differential Revision: https://reviews.facebook.net/D50589
    • DBTest.HardLimit use special memtable · 5b2587b5
      Committed by sdong
      Summary: DBTest.HardLimit fails in the AppVeyor build. Use a special memtable to make the test behavior depend less on the platform.
      
      Test Plan: Run the test with JEMALLOC both on and off.
      
      Reviewers: yhchiang, kradhakrishnan, rven, anthony, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D52317
  2. 24 December 2015, 13 commits
  3. 23 December 2015, 5 commits
  4. 22 December 2015, 6 commits
    • Fix computation of size of last sub-compaction · 728f944f
      Committed by Zhipeng Jia
    • Merge pull request #863 from zhangyybuaa/fix_hdfs_error · 8ac7fb83
      Committed by Igor Canadi
      Fix build error with HDFS
    • Merge pull request #894 from zhipeng-jia/develop · e53e8219
      Committed by Igor Canadi
      Sorting std::vector instead of using std::set
    • Sorting std::vector instead of using std::set · e0abec15
      Committed by Zhipeng Jia
    • Add call to install superversion and schedule work in EnableAutoCompactions · 33e09c0e
      Committed by Alex Yang
      Summary:
      This patch fixes https://github.com/facebook/mysql-5.6/issues/121
      
      There is a recent change in RocksDB to disable auto compactions on startup: https://reviews.facebook.net/D51147. However, there is a small timing window where a column family needs to be compacted and schedules a compaction, but the scheduled compaction fails when it checks the disable_auto_compactions setting. The expectation is that once the application is ready, it will call EnableAutoCompactions() to allow new compactions to go through. However, if the column family is stalled because L0 is full and no writes can go through, it is possible the column family will never get a new compaction request scheduled. EnableAutoCompaction() should probably schedule a new flush and compaction event when it resets disable_auto_compactions.
      
      Using InstallSuperVersionAndScheduleWork, we call SchedulePendingFlush,
      SchedulePendingCompaction, as well as MaybeScheduleFlushOrCompaction on all the
      column families to avoid the situation above.
      
      This is still a first pass for feedback.
      We could also just call SchedulePendingFlush and SchedulePendingCompaction directly; a rough sketch of the resulting call path is below.
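
      A hedged sketch of the call path from the public API; the helper function is mine, and the routing described in the comments paraphrases the summary above rather than quoting the patch.

      ```cpp
      #include <vector>

      #include "rocksdb/db.h"

      // Re-enable auto compactions once the application has finished startup.
      // With this fix the call also schedules pending flush/compaction work, so
      // a column family stalled on a full L0 does not stay stuck waiting for a
      // write to arrive.
      rocksdb::Status ReenableAutoCompactions(
          rocksdb::DB* db,
          const std::vector<rocksdb::ColumnFamilyHandle*>& handles) {
        // Roughly equivalent to SetOptions(handle, {{"disable_auto_compactions",
        // "false"}}) per column family; the patch routes that through
        // InstallSuperVersionAndScheduleWork, which calls SchedulePendingFlush,
        // SchedulePendingCompaction, and MaybeScheduleFlushOrCompaction.
        return db->EnableAutoCompaction(handles);
      }
      ```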
      
      Test Plan:
      Run on Asan build
      cd _build-5.6-ASan/ && ./mysql-test/mtr --mem --big --testcase-timeout=36000 --suite-timeout=12000 --parallel=16 --suite=rocksdb,rocksdb_rpl,rocksdb_sys_vars --mysqld=--default-storage-engine=rocksdb --mysqld=--skip-innodb --mysqld=--default-tmp-storage-engine=MyISAM --mysqld=--rocksdb rocksdb_rpl.rpl_rocksdb_stress_crash --repeat=1000
      
      Ensure that it no longer hangs during the test.
      
      Reviewers: hermanlee4, yhchiang, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, yhchiang, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51747
    • Merge pull request #893 from zhipeng-jia/develop · 22c6b50e
      Committed by Siying Dong
      Fix clang warning regarding implicit conversion
  5. 21 December 2015, 1 commit
  6. 19 December 2015, 3 commits
  7. 18 December 2015, 7 commits
    • Fix use-after-free in db_bench · a4838239
      Committed by Nathan Bronson
      Test Plan: valgrind db_bench
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D52101
    • Merge pull request #890 from zhipeng-jia/develop · bf8ffc1d
      Committed by Igor Canadi
      Fix typo: sr to picking_sr
    • Fix typo: sr to picking_sr · 131f7ddf
      Committed by Zhipeng Jia
    • db_bench: --soft_pending_compaction_bytes_limit should set options.soft_pending_compaction_bytes_limit · c37729a6
      Committed by sdong
      
      Summary: Fix a bug where options.soft_pending_compaction_bytes_limit was not actually set by --soft_pending_compaction_bytes_limit.
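
      A hedged sketch of the kind of wiring the fix restores in db_bench; the helper function and its placement are illustrative assumptions, and only the flag and the option field come from the summary.

      ```cpp
      #include <gflags/gflags.h>

      #include "rocksdb/options.h"

      // The gflag is defined in db_bench; its exact type here is an assumption.
      DECLARE_uint64(soft_pending_compaction_bytes_limit);

      rocksdb::Options ApplyCompactionPressureFlags(rocksdb::Options options) {
        // Previously the flag was parsed but never copied into Options, so the
        // limit silently stayed at its default value.
        options.soft_pending_compaction_bytes_limit =
            FLAGS_soft_pending_compaction_bytes_limit;
        return options;
      }
      ```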
      
      Test Plan: Run db_bench with this parameter and make sure the parameter is set correctly.
      
      Reviewers: anthony, kradhakrishnan, yhchiang, IslamAbdelRahman, igor, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D52125
    • Add SignalAll after removing item from manual_compaction deque · 7b12ae97
      Committed by Venkatesh Radhakrishnan
      Summary:
      When there are waiting manual compactions, we need to signal
      them after removing the current manual compaction from the deque.
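
      A generic, hedged illustration of the pattern using standard-library stand-ins rather than RocksDB's actual members:

      ```cpp
      #include <condition_variable>
      #include <deque>
      #include <mutex>

      // Once a finished request leaves the deque, wake all waiters so queued
      // manual compactions re-check whether they can now run.
      struct ManualCompactionQueue {
        std::mutex mu;
        std::condition_variable cv;
        std::deque<int> pending;  // placeholder for ManualCompaction entries

        void FinishFront() {
          std::lock_guard<std::mutex> lock(mu);
          if (!pending.empty()) {
            pending.pop_front();  // the compaction that just completed
          }
          cv.notify_all();  // without this signal, waiting requests could hang
        }
      };
      ```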
      
      Test Plan: ColumnFamilyTest.SameCFManualManualCompaction
      
      Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D52119
    • Slow down when writing to the last write buffer · d72b3177
      Committed by sdong
      Summary: Today, if inserting into the memtable is much faster than writing to files, there is no mechanism users can rely on to avoid a write stop once options.max_write_buffer_number is reached. With this commit, if more than four write buffers are configured, we slow writes down to options.delayed_write_rate once we reach the last one.
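
      A hedged sketch of the trigger condition; the threshold and names are illustrative paraphrases of the summary, not the real write-controller code.

      ```cpp
      // Returns true when writes should be delayed to options.delayed_write_rate
      // instead of being stopped outright. Thresholds are illustrative.
      bool ShouldSlowDownForMemtables(int num_unflushed_memtables,
                                      int max_write_buffer_number) {
        const bool enough_buffers_configured = max_write_buffer_number > 4;
        const bool filling_last_buffer =
            num_unflushed_memtables >= max_write_buffer_number - 1;
        return enough_buffers_configured && filling_last_buffer;
      }
      ```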
      
      Test Plan:
      1. Add a new unit test.
      2. Run db_bench with
      
      ./db_bench --benchmarks=fillrandom --num=10000000 --max_background_flushes=6 --batch_size=32 -max_write_buffer_number=4 --delayed_write_rate=500000 --statistics
      
      based on hard drive and see stopping is avoided with the commit.
      
      Reviewers: yhchiang, IslamAbdelRahman, anthony, rven, kradhakrishnan, igor
      
      Reviewed By: igor
      
      Subscribers: MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D52047
    • Add documentation for unschedFunction · 6b2a3ac9
      Committed by Venkatesh Radhakrishnan
      Summary:
      Documenting the unschedFunction parameter to Schedule as
      requested by Michael Kolupaev.
      
      Test Plan: build, unit test
      
      Reviewers: sdong, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: kolmike, dhruba
      
      Differential Revision: https://reviews.facebook.net/D52089
  8. 17 December 2015, 3 commits
    • ZSTD to use CompressionOptions.level · 167fb919
      Committed by sdong
      Summary: ZSTD currently hard-codes compression level 1. Change it to use the level from CompressionOptions.
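
      A hedged sketch of the shape of the change, not RocksDB's actual compression helper; it assumes only the public ZSTD API and CompressionOptions::level.

      ```cpp
      #include <zstd.h>

      #include <string>

      #include "rocksdb/options.h"

      // Compress `input` with ZSTD at the configured level instead of a literal 1.
      bool ZstdCompressSketch(const char* input, size_t length,
                              const rocksdb::CompressionOptions& opts,
                              std::string* output) {
        const size_t bound = ZSTD_compressBound(length);
        output->resize(bound);
        const size_t outlen =
            ZSTD_compress(&(*output)[0], bound, input, length,
                          opts.level /* was hard-coded to 1 */);
        if (ZSTD_isError(outlen)) {
          return false;
        }
        output->resize(outlen);
        return true;
      }
      ```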
      
      Test Plan: Run it with a hacked sst_dump and show ZSTD compressed sizes at different levels.
      
      Reviewers: rven, anthony, yhchiang, kradhakrishnan, igor, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: yoshinorim, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D52041
    • Bump version to 4.4 · 32ff05e9
      Committed by Islam AbdelRahman
      Summary: Bump version to 4.4
      
      Test Plan: none
      
      Reviewers: sdong, rven, yhchiang, anthony, kradhakrishnan
      
      Reviewed By: kradhakrishnan
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D52035
    • Introduce ReadOptions::pin_data (support zero copy for keys) · aececc20
      Committed by Islam AbdelRahman
      Summary:
      This patch updates the Iterator API to introduce new functions that allow users to keep the Slices returned by key() valid as long as the Iterator is not deleted.

      ReadOptions::pin_data : If true, keep loaded blocks in memory as long as the iterator is not deleted.
      Iterator::IsKeyPinned() : If true, the Slice returned by key() is valid as long as the iterator is not deleted.

      Also add a new option, BlockBasedTableOptions::use_delta_encoding, to allow users to disable delta encoding if needed.
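
      A hedged usage sketch; the helper and its prefix logic are mine, while ReadOptions::pin_data and Iterator::IsKeyPinned() are the APIs introduced above.

      ```cpp
      #include <memory>
      #include <vector>

      #include "rocksdb/db.h"
      #include "rocksdb/slice.h"

      // Iterate a key range while holding zero-copy references to pinned keys.
      size_t CountKeysWithPrefix(rocksdb::DB* db, const rocksdb::Slice& prefix) {
        rocksdb::ReadOptions ro;
        ro.pin_data = true;  // keep data blocks pinned while the iterator lives

        std::unique_ptr<rocksdb::Iterator> iter(db->NewIterator(ro));
        std::vector<rocksdb::Slice> pinned_keys;  // references into pinned blocks
        size_t count = 0;
        for (iter->Seek(prefix); iter->Valid() && iter->key().starts_with(prefix);
             iter->Next()) {
          ++count;
          if (iter->IsKeyPinned()) {
            pinned_keys.push_back(iter->key());  // valid until `iter` is destroyed
          }
        }
        // `pinned_keys` may be inspected here, but must not outlive `iter`.
        return count;
      }
      ```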
      
      Benchmark results (using https://phabricator.fb.com/P20083553)
      
      ```
      // $ du -h /home/tec/local/normal.4K.Snappy/db10077
      // 6.1G    /home/tec/local/normal.4K.Snappy/db10077
      
      // $ du -h /home/tec/local/zero.8K.LZ4/db10077
      // 6.4G    /home/tec/local/zero.8K.LZ4/db10077
      
      // Benchmarks for shard db10077
      // _build/opt/rocks/benchmark/rocks_copy_benchmark \
      //      --normal_db_path="/home/tec/local/normal.4K.Snappy/db10077" \
      //      --zero_db_path="/home/tec/local/zero.8K.LZ4/db10077"
      
      // First run
      // ============================================================================
      // rocks/benchmark/RocksCopyBenchmark.cpp          relative  time/iter  iters/s
      // ============================================================================
      // BM_StringCopy                                                 1.73s  576.97m
      // BM_StringPiece                                   103.74%      1.67s  598.55m
      // ============================================================================
      // Match rate : 1000000 / 1000000
      
      // Second run
      // ============================================================================
      // rocks/benchmark/RocksCopyBenchmark.cpp          relative  time/iter  iters/s
      // ============================================================================
      // BM_StringCopy                                              611.99ms     1.63
      // BM_StringPiece                                   203.76%   300.35ms     3.33
      // ============================================================================
      // Match rate : 1000000 / 1000000
      ```
      
      Test Plan: Unit tests
      
      Reviewers: sdong, igor, anthony, yhchiang, rven
      
      Reviewed By: rven
      
      Subscribers: dhruba, lovro, adsharma
      
      Differential Revision: https://reviews.facebook.net/D48999