1. 02 3月, 2016 1 次提交
  2. 03 2月, 2016 1 次提交
    • A
      Eliminate duplicated property constants · 284aa613
      Andrew Kryczka 提交于
      Summary:
      Before this diff, there were duplicated constants to refer to properties (user-
      facing API had strings and InternalStats had an enum). I noticed these were
      inconsistent in terms of which constants are provided, names of constants, and
      documentation of constants. Overall it seemed annoying/error-prone to maintain
      these duplicated constants.
      
      So, this diff gets rid of InternalStats's constants and replaces them with a map
      keyed on the user-facing constant. The value in that map contains a function
      pointer to get the property value, so we don't need to do string matching while
      holding db->mutex_. This approach has a side benefit of making many small
      handler functions rather than a giant switch-statement.
      
      Test Plan: db_properties_test passes, running "make commit-prereq -j32"
      
      Reviewers: sdong, yhchiang, kradhakrishnan, IslamAbdelRahman, rven, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D53253
      284aa613
  3. 26 1月, 2016 1 次提交
  4. 22 1月, 2016 2 次提交
  5. 26 12月, 2015 1 次提交
    • N
      support for concurrent adds to memtable · 7d87f027
      Nathan Bronson 提交于
      Summary:
      This diff adds support for concurrent adds to the skiplist memtable
      implementations.  Memory allocation is made thread-safe by the addition of
      a spinlock, with small per-core buffers to avoid contention.  Concurrent
      memtable writes are made via an additional method and don't impose a
      performance overhead on the non-concurrent case, so parallelism can be
      selected on a per-batch basis.
      
      Write thread synchronization is an increasing bottleneck for higher levels
      of concurrency, so this diff adds --enable_write_thread_adaptive_yield
      (default off).  This feature causes threads joining a write batch
      group to spin for a short time (default 100 usec) using sched_yield,
      rather than going to sleep on a mutex.  If the timing of the yield calls
      indicates that another thread has actually run during the yield then
      spinning is avoided.  This option improves performance for concurrent
      situations even without parallel adds, although it has the potential to
      increase CPU usage (and the heuristic adaptation is not yet mature).
      
      Parallel writes are not currently compatible with
      inplace updates, update callbacks, or delete filtering.
      Enable it with --allow_concurrent_memtable_write (and
      --enable_write_thread_adaptive_yield).  Parallel memtable writes
      are performance neutral when there is no actual parallelism, and in
      my experiments (SSD server-class Linux and varying contention and key
      sizes for fillrandom) they are always a performance win when there is
      more than one thread.
      
      Statistics are updated earlier in the write path, dropping the number
      of DB mutex acquisitions from 2 to 1 for almost all cases.
      
      This diff was motivated and inspired by Yahoo's cLSM work.  It is more
      conservative than cLSM: RocksDB's write batch group leader role is
      preserved (along with all of the existing flush and write throttling
      logic) and concurrent writers are blocked until all memtable insertions
      have completed and the sequence number has been advanced, to preserve
      linearizability.
      
      My test config is "db_bench -benchmarks=fillrandom -threads=$T
      -batch_size=1 -memtablerep=skip_list -value_size=100 --num=1000000/$T
      -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999
      -disable_auto_compactions --max_write_buffer_number=8
      -max_background_flushes=8 --disable_wal --write_buffer_size=160000000
      --block_size=16384 --allow_concurrent_memtable_write" on a two-socket
      Xeon E5-2660 @ 2.2Ghz with lots of memory and an SSD hard drive.  With 1
      thread I get ~440Kops/sec.  Peak performance for 1 socket (numactl
      -N1) is slightly more than 1Mops/sec, at 16 threads.  Peak performance
      across both sockets happens at 30 threads, and is ~900Kops/sec, although
      with fewer threads there is less performance loss when the system has
      background work.
      
      Test Plan:
      1. concurrent stress tests for InlineSkipList and DynamicBloom
      2. make clean; make check
      3. make clean; DISABLE_JEMALLOC=1 make valgrind_check; valgrind db_bench
      4. make clean; COMPILE_WITH_TSAN=1 make all check; db_bench
      5. make clean; COMPILE_WITH_ASAN=1 make all check; db_bench
      6. make clean; OPT=-DROCKSDB_LITE make check
      7. verify no perf regressions when disabled
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: MarkCallaghan, IslamAbdelRahman, anthony, yhchiang, rven, sdong, guyg8, kradhakrishnan, dhruba
      
      Differential Revision: https://reviews.facebook.net/D50589
      7d87f027
  6. 18 12月, 2015 1 次提交
    • S
      Slowdown when writing to the last write buffer · d72b3177
      sdong 提交于
      Summary: Now if inserting to mem table is much faster than writing to files, there is no mechanism users can rely on to avoid stopping for reaching options.max_write_buffer_number. With the commit, if there are more than four maximum write buffers configured, we slow down to the rate of options.delayed_write_rate while we reach the last one.
      
      Test Plan:
      1. Add a new unit test.
      2. Run db_bench with
      
      ./db_bench --benchmarks=fillrandom --num=10000000 --max_background_flushes=6 --batch_size=32 -max_write_buffer_number=4 --delayed_write_rate=500000 --statistics
      
      based on hard drive and see stopping is avoided with the commit.
      
      Reviewers: yhchiang, IslamAbdelRahman, anthony, rven, kradhakrishnan, igor
      
      Reviewed By: igor
      
      Subscribers: MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D52047
      d72b3177
  7. 12 12月, 2015 1 次提交
  8. 10 12月, 2015 1 次提交
    • S
      Deprecate options.soft_rate_limit and add options.soft_pending_compaction_bytes_limit · 56e77f09
      sdong 提交于
      Summary: Deprecate options.soft_rate_limit, which is hard to tune, with options.soft_pending_compaction_bytes_limit, which would trigger the slowdown if estimated pending compaction bytes exceeds the threshold. The hope is to make it more striaght-forward to tune.
      
      Test Plan: Modify DBTest.SoftLimit to cover options.soft_pending_compaction_bytes_limit instead; run all unit tests.
      
      Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, igor, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51117
      56e77f09
  9. 17 10月, 2015 1 次提交
  10. 18 9月, 2015 1 次提交
    • A
      Support for SingleDelete() · 014fd55a
      Andres Noetzli 提交于
      Summary:
      This patch fixes #7460559. It introduces SingleDelete as a new database
      operation. This operation can be used to delete keys that were never
      overwritten (no put following another put of the same key). If an overwritten
      key is single deleted the behavior is undefined. Single deletion of a
      non-existent key has no effect but multiple consecutive single deletions are
      not allowed (see limitations).
      
      In contrast to the conventional Delete() operation, the deletion entry is
      removed along with the value when the two are lined up in a compaction. Note:
      The semantics are similar to @igor's prototype that allowed to have this
      behavior on the granularity of a column family (
      https://reviews.facebook.net/D42093 ). This new patch, however, is more
      aggressive when it comes to removing tombstones: It removes the SingleDelete
      together with the value whenever there is no snapshot between them while the
      older patch only did this when the sequence number of the deletion was older
      than the earliest snapshot.
      
      Most of the complex additions are in the Compaction Iterator, all other changes
      should be relatively straightforward. The patch also includes basic support for
      single deletions in db_stress and db_bench.
      
      Limitations:
      - Not compatible with cuckoo hash tables
      - Single deletions cannot be used in combination with merges and normal
        deletions on the same key (other keys are not affected by this)
      - Consecutive single deletions are currently not allowed (and older version of
        this patch supported this so it could be resurrected if needed)
      
      Test Plan: make all check
      
      Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor
      
      Reviewed By: igor
      
      Subscribers: maykov, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43179
      014fd55a
  11. 15 9月, 2015 2 次提交
    • S
      Add options.hard_pending_compaction_bytes_limit to stop writes if compaction lagging behind · 5de807ac
      sdong 提交于
      Summary: Add an option to stop writes if compaction lefts behind. If estimated pending compaction bytes is more than threshold specified by options.hard_pending_compaction_bytes_liimt, writes will stop until compactions are cleared to under the threshold.
      
      Test Plan: Add unit test DBTest.HardLimit
      
      Reviewers: rven, kradhakrishnan, anthony, IslamAbdelRahman, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D45999
      5de807ac
    • A
      Add counters for L0 stall while L0-L1 compaction is taking place · 03ddce9a
      Ari Ekmekji 提交于
      Summary:
      Although there are currently counters to keep track of the
      stall caused by having too many L0 files, there is no distinction as
      to whether when that stall occurs either (A) L0-L1 compaction is taking
      place to try and mitigate it, or (B) no L0-L1 compaction has been scheduled
      at the moment. This diff adds a counter for (A) so that the nature of L0
      stalls can be better understood.
      
      Test Plan: make all && make check
      
      Reviewers: sdong, igor, anthony, noetzli, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, dhruba
      
      Differential Revision: https://reviews.facebook.net/D46749
      03ddce9a
  12. 03 9月, 2015 1 次提交
    • A
      Unified maps with Comparator for sorting, other cleanup · 3c9cef1e
      Andres Noetzli 提交于
      Summary:
      This diff is a collection of cleanups that were initially part of D43179.
      Additionally it adds a unified way of defining key-value maps that use a
      Comparator for sorting (this was previously implemented in four different
      places).
      
      Test Plan: make clean check all
      
      Reviewers: rven, anthony, yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45993
      3c9cef1e
  13. 26 8月, 2015 1 次提交
    • Y
      Expose per-level aggregated table properties via GetProperty() · 6996de87
      Yueh-Hsuan Chiang 提交于
      Summary:
      This patch adds "rocksdb.aggregated-table-properties"
      and "rocksdb.aggregated-table-properties-at-levelN", the former
      returns the aggreated table properties of a column family,
      while the later returns the aggregated table properties
      of the specified level N.
      
      Test Plan: Added tests in db_test
      
      Reviewers: igor, sdong, IslamAbdelRahman, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45087
      6996de87
  14. 25 8月, 2015 1 次提交
  15. 21 8月, 2015 2 次提交
    • S
      Add a counter about estimated pending compaction bytes · 07d2d341
      sdong 提交于
      Summary:
      Add a counter of estimated bytes the DB needs to compact for all the compactions to finish. Expose it as a DB Property.
      In the future, we can use threshold of this counter to replace soft rate limit and hard rate limit. A single threshold of estimated compaction debt in bytes will be easier for users to reason about when should slow down and stopping than more abstract soft and hard rate limits.
      
      Test Plan: Add unit tests
      
      Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44205
      07d2d341
    • I
      Total SST files size DB Property · 027ca5b2
      Islam AbdelRahman 提交于
      Summary: Add a new DB property that calculate the total size of files used by all RocksDB Versions
      
      Test Plan: Unittests for the new property
      
      Reviewers: igor, yhchiang, anthony, rven, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44799
      027ca5b2
  16. 20 8月, 2015 1 次提交
    • Y
      Introduce GetIntProperty("rocksdb.size-all-mem-tables") · df79eafc
      Yueh-Hsuan Chiang 提交于
      Summary:
      Currently, GetIntProperty("rocksdb.cur-size-all-mem-tables") only returns
      the memory usage by those memtables which have not yet been flushed.
      
      This patch introduces GetIntProperty("rocksdb.size-all-mem-tables"),
      which includes the memory usage by all the memtables, includes those
      have been flushed but pinned by iterators.
      
      Test Plan: Added a test in db_test
      
      Reviewers: igor, anthony, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D44229
      df79eafc
  17. 15 8月, 2015 1 次提交
    • S
      Measure file read latency histogram per level · 72613657
      sdong 提交于
      Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled.
      
      Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected
      
      Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44193
      72613657
  18. 12 8月, 2015 1 次提交
    • A
      Removing duplicate code in db_bench/db_stress, fixing typos · 4249f159
      Andres Notzli 提交于
      Summary:
      While working on single delete support for db_bench, I realized that
      db_bench/db_stress contain a bunch of duplicate code related to
      copmression and found some typos. This patch removes duplicate code,
      typos and a redundant #ifndef in internal_stats.cc.
      
      Test Plan: make db_stress && make db_bench && ./db_bench --benchmarks=compress,uncompress
      
      Reviewers: yhchiang, sdong, rven, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43965
      4249f159
  19. 22 7月, 2015 1 次提交
    • A
      Report live data size estimate · 06aebca5
      Andres Notzli 提交于
      Summary:
      Fixes T6548822. Added a new function for estimating the size of the live data
      as proposed in the task. The value can be accessed through the property
      rocksdb.estimate-live-data-size.
      
      Test Plan:
      There are two unit tests in version_set_test and a simple test in db_test.
      make version_set_test && ./version_set_test;
      make db_test && ./db_test gtest_filter=GetProperty
      
      Reviewers: rven, igor, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41493
      06aebca5
  20. 18 6月, 2015 1 次提交
    • Y
      Fixed a bug of CompactionStats in multi-level universal compaction case · bb1c74ce
      Yueh-Hsuan Chiang 提交于
      Summary:
      Universal compaction can involves in multiple levels.  However,
      the current implementation of bytes_readn and bytes_readnp1
      (and some other stats with postfix `n` and `np1`) assumes compaction
      can only have two levels.
      
      This patch fixes this bug and redefines bytes_readn and bytes_readnp1:
      * bytes_readnp1: the number of bytes read in the compaction output level.
      * bytes_readn: the total number of bytes read minus bytes_readnp1
      
      Test Plan: Add a test in compaction_job_stats_test
      
      Reviewers: igor, sdong, rven, anthony, kradhakrishnan, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40239
      bb1c74ce
  21. 29 5月, 2015 1 次提交
    • A
      Support saving history in memtable_list · c8153510
      agiardullo 提交于
      Summary:
      For transactions, we are using the memtables to validate that there are no write conflicts.  But after flushing, we don't have any memtables, and transactions could fail to commit.  So we want to someone keep around some extra history to use for conflict checking.  In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
      
      After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure).  It seems like the best place for this is abstracted inside the memtable_list.  I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.
      
      This diff adds a new parameter to control how much memtable history to keep around after flushing.  However, it sounds like people aren't too fond of adding new parameters.  So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers.  This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit.  (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached).  So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
      
      However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
      
      Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests.  Added testing in memtablelist_test and planning on adding more testing here.
      
      Reviewers: sdong, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D37443
      c8153510
  22. 23 5月, 2015 1 次提交
  23. 20 5月, 2015 1 次提交
    • M
      Add --wal_bytes_per_sync for db_bench and more IO stats · 944043d6
      Mark Callaghan 提交于
      Summary:
      See https://gist.github.com/mdcallag/89ebb2b8cbd331854865 for the IO stats.
      I added "Cumulative compaction:" and "Interval compaction:" lines. The IO rates
      can be confusing. Rates fro per-level stats lines, Wr(MB/s) & Rd(MB/s), are computed
      using the duration of the compaction job. If the job reads 10MB, writes 9MB and the job
      (IO & merging) takes 1 second then the rates are 10MB/s for read and 9MB/s for writes.
      The IO rates in the Cumulative compaction line uses the total uptime. The IO rates in the
      Interval compaction line uses the interval uptime. So these Cumalative & Interval
      compaction IO rates cannot be compared to the per-level IO rates. But both forms of
      the rates are useful for debugging perf.
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D38667
      944043d6
  24. 02 4月, 2015 1 次提交
  25. 31 3月, 2015 1 次提交
    • M
      Make the benchmark scripts configurable and add tests · 99ec2412
      Mark Callaghan 提交于
      Summary:
      This makes run_flash_bench.sh configurable. Previously it was hardwired for 1B keys and tests
      ran for 12 hours each. That kept me from using it. This makes it configuable, adds more tests,
      makes the duration per-test configurable and refactors the test scripts.
      
      Adds the seekrandomwhilemerging test to db_bench which is the same as seekrandomwhilewriting except
      the writer thread does Merge rather than Put.
      
      Forces the stall-time column in compaction IO stats to use a fixed format (H:M:S) which makes
      it easier to scrape and parse. Also adds an option to AppendHumanMicros to force a fixed format.
      Sometimes automation and humans want different format.
      
      Calls thread->stats.AddBytes(bytes); in db_bench for more tests to get the MB/sec summary
      stats in the output at test end.
      
      Adds the average ingest rate to compaction IO stats. Output now looks like:
      https://gist.github.com/mdcallag/2bd64d18be1b93adc494
      
      More information on the benchmark output is at https://gist.github.com/mdcallag/db43a58bd5ac624f01e1
      
      For benchmark.sh changes default RocksDB configuration to reduce stalls:
      * min_level_to_compress from 2 to 3
      * hard_rate_limit from 2 to 3
      * max_grandparent_overlap_factor and max_bytes_for_level_multiplier from 10 to 8
      * L0 file count triggers from 4,8,12 to 4,12,20 for (start,stall,stop)
      
      Task ID: #6596829
      
      Blame Rev:
      
      Test Plan:
      run tools/run_flash_bench.sh
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D36075
      99ec2412
  26. 28 3月, 2015 1 次提交
  27. 19 3月, 2015 1 次提交
  28. 17 3月, 2015 1 次提交
  29. 15 3月, 2015 2 次提交
    • I
      Make RecordIn/RecordOut human readable · c6967a1a
      Igor Canadi 提交于
      Summary: I had hard time understanding these big numbers. Here's how the output looks like now: https://gist.github.com/igorcanadi/4c39c17685049584a992
      
      Test Plan: db_bench
      
      Reviewers: sdong, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D35073
      c6967a1a
    • M
      Stop printing per-level stall times. · c8da6703
      Mark Callaghan 提交于
      Summary:
      Per-level stall times are the suggested stall time, not the actual stall time so this change stops printing them
      both in the per-level output lines and in the summary. Also changed output for total stall time to include units
      in all cases. The new output looks like:
      Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt)    RecordIn   RecordDrop
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0     4/1          7   0.8      0.0     0.0      0.0       0.6      0.6       0.0   0.0      0.0     12.9        50       352    0.141        882            0            0
        L1     5/0          9   0.9      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000          0            0            0
        L2    54/0         99   1.0      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000          0            0            0
        L3   289/0        527   0.5      0.0     0.0      0.0       0.0      0.0       0.5   0.0      0.0      0.0         0         0    0.000          0            0            0
       Sum   352/1        642   0.0      0.0     0.0      0.0       0.6      0.6       1.7   1.0      0.0     12.9        50       352    0.141        882            0            0
       Int     0/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     15.5         0         3    0.118          7            0            0
      Flush(GB): accumulative 0.627, interval 0.005
      Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 882 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard
      
      Task ID: #6493861
      
      Blame Rev:
      
      Test Plan:
      run db_bench, look at output
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D35085
      c8da6703
  30. 13 3月, 2015 1 次提交
  31. 04 3月, 2015 1 次提交
    • Y
      Fix a bug in stall time counter. Improve its output format. · 694988b6
      Yueh-Hsuan Chiang 提交于
      Summary: Fix a bug in stall time counter.  Improve its output format.
      
      Test Plan:
      export ROCKSDB_TESTS=Timeout
      ./db_test
      
      ./db_bench --benchmarks=fillrandom --stats_interval=10000 --statistics=true --stats_per_interval=1 --num=1000000 --threads=4 --level0_stop_writes_trigger=3 --level0_slowdown_writes_trigger=2
      
      sample output:
          Uptime(secs): 35.8 total, 0.0 interval
          Cumulative writes: 359590 writes, 359589 keys, 183047 batches, 2.0 writes per batch, 0.04 GB user ingest, stall seconds: 1786.008 ms
          Cumulative WAL: 359591 writes, 183046 syncs, 1.96 writes per sync, 0.04 GB written
          Interval writes: 253 writes, 253 keys, 128 batches, 2.0 writes per batch, 0.0 MB user ingest, stall time: 0 us
          Interval WAL: 253 writes, 128 syncs, 1.96 writes per sync, 0.00 MB written
      
      Reviewers: MarkCallaghan, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D34275
      694988b6
  32. 03 3月, 2015 1 次提交
    • I
      options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size... · db037393
      Igor Canadi 提交于
      options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically.
      
      Summary:
      When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases.
      
      In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier.
      
      Test Plan: New unit tests and pass tests suites including valgrind.
      
      Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo
      
      Reviewed By: ikabiljo
      
      Subscribers: yoshinorim, ikabiljo, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D31437
      db037393
  33. 20 2月, 2015 1 次提交
  34. 10 1月, 2015 1 次提交
    • S
      DB Stats Dump to print total stall time · 9132e52e
      sdong 提交于
      Summary:
      Add printing of stall time in DB Stats:
      
      Sample outputs:
      
      ** DB Stats **
      Uptime(secs): 53.2 total, 1.7 interval
      Cumulative writes: 625940 writes, 625939 keys, 625940 batches, 1.0 writes per batch, 0.49 GB user ingest, stall micros: 50691070
      Cumulative WAL: 625940 writes, 625939 syncs, 1.00 writes per sync, 0.49 GB written
      Interval writes: 10859 writes, 10859 keys, 10859 batches, 1.0 writes per batch, 8.7 MB user ingest, stall micros: 1692319
      Interval WAL: 10859 writes, 10859 syncs, 1.00 writes per sync, 0.01 MB written
      
      Test Plan:
      make all check
      verify printing using db_bench
      
      Reviewers: igor, yhchiang, rven, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D31239
      9132e52e
  35. 06 12月, 2014 1 次提交
  36. 04 12月, 2014 1 次提交
    • M
      Add Moved(GB) to Compaction IO stats · 32a0a038
      Mark Callaghan 提交于
      Summary:
      Adds counter for bytes moved (files pushed down a level rather than compacted) to compaction
      IO stats as Moved(GB). From the output removed these infrequently used columns: RW-Amp, Rn(cnt), Rnp1(cnt),
      Wnp1(cnt), Wnew(cnt).
      Example old output:
      Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s)  Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt)  Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms) RecordIn RecordDrop
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0     0/0          0   0.0      0.0     0.0      0.0    2130.8   2130.8    0.0   0.0      0.0    109.1        0         0         0         0      20002     25068    0.798      28.75     182059    0.16       0          0
        L1   142/0        509   1.0   4618.5  2036.5   2582.0    4602.1   2020.2    4.5   2.3     88.5     88.1    24220    701246   1215528    514282      53466      4229   12.643       0.00          0    0.002032745988  300688729
      
      Example new output:
      Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms)     RecordIn   RecordDrop
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0     7/0         13   1.8      0.0     0.0      0.0       0.6      0.6       0.0   0.0      0.0     14.7        44       353    0.124       0.03        626    0.05            0            0
        L1     9/0         16   1.6      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000       0.00          0    0.00            0            0
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      make check, run db_bench --fillseq --stats_per_interval --stats_interval and look at output
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D29787
      32a0a038