1. 25 Aug, 2015 1 commit
  2. 21 Aug, 2015 2 commits
    • S
      Add a counter about estimated pending compaction bytes · 07d2d341
      sdong committed
      Summary:
      Add a counter of estimated bytes the DB needs to compact for all the compactions to finish. Expose it as a DB Property.
      In the future, we can use a threshold on this counter to replace the soft rate limit and hard rate limit. A single threshold of estimated compaction debt in bytes will make it easier for users to reason about when writes should slow down or stop than the more abstract soft and hard rate limits.
      
      Test Plan: Add unit tests
      
      Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44205
    • I
      Total SST files size DB Property · 027ca5b2
      Islam AbdelRahman committed
      Summary: Add a new DB property that calculates the total size of files used by all RocksDB Versions
      
      Test Plan: Unittests for the new property
      
      Reviewers: igor, yhchiang, anthony, rven, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44799
  3. 20 Aug, 2015 1 commit
    • Y
      Introduce GetIntProperty("rocksdb.size-all-mem-tables") · df79eafc
      Yueh-Hsuan Chiang committed
      Summary:
      Currently, GetIntProperty("rocksdb.cur-size-all-mem-tables") only returns
      the memory usage by those memtables which have not yet been flushed.
      
      This patch introduces GetIntProperty("rocksdb.size-all-mem-tables"),
      which reports the memory usage of all the memtables, including those
      that have been flushed but are still pinned by iterators.
      
      Test Plan: Added a test in db_test
      
      Reviewers: igor, anthony, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D44229
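The difference between the two properties described in this commit can be illustrated with a small Python sketch. The `Memtable` record and its fields are hypothetical stand-ins rather than RocksDB types. Only the two property names come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Memtable:
    """Hypothetical stand-in for a memtable; not a RocksDB type."""
    size_bytes: int
    flushed: bool              # already written out to an SST file
    pinned_by_iterator: bool   # still referenced by a live iterator


def cur_size_all_mem_tables(memtables):
    # Models "rocksdb.cur-size-all-mem-tables": unflushed memtables only.
    return sum(m.size_bytes for m in memtables if not m.flushed)


def size_all_mem_tables(memtables):
    # Models "rocksdb.size-all-mem-tables": unflushed memtables plus
    # flushed ones that an iterator still keeps in memory.
    return sum(
        m.size_bytes
        for m in memtables
        if not m.flushed or m.pinned_by_iterator
    )
```

With one 64-byte unflushed memtable and one 32-byte flushed but pinned memtable, the first function reports 64 and the second 96 in this model.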
  4. 15 Aug, 2015 1 commit
    • S
      Measure file read latency histogram per level · 72613657
      sdong committed
      Summary: In internal stats, record a read latency histogram when statistics are enabled. It can then be retrieved from DB::GetProperty() with the "rocksdb.dbstats" property.
      
      Test Plan: Manually run db_bench, print out "rocksdb.dbstats" by hand and make sure it prints out as expected
      
      Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44193
  5. 12 Aug, 2015 1 commit
    • A
      Removing duplicate code in db_bench/db_stress, fixing typos · 4249f159
      Andres Notzli committed
      Summary:
      While working on single delete support for db_bench, I realized that
      db_bench/db_stress contain a bunch of duplicate code related to
      compression and found some typos. This patch removes duplicate code,
      typos and a redundant #ifndef in internal_stats.cc.
      
      Test Plan: make db_stress && make db_bench && ./db_bench --benchmarks=compress,uncompress
      
      Reviewers: yhchiang, sdong, rven, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43965
  6. 22 Jul, 2015 1 commit
    • A
      Report live data size estimate · 06aebca5
      Andres Notzli committed
      Summary:
      Fixes T6548822. Added a new function for estimating the size of the live data
      as proposed in the task. The value can be accessed through the property
      rocksdb.estimate-live-data-size.
      
      Test Plan:
      There are two unit tests in version_set_test and a simple test in db_test.
      make version_set_test && ./version_set_test;
      make db_test && ./db_test gtest_filter=GetProperty
      
      Reviewers: rven, igor, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41493
  7. 18 Jun, 2015 1 commit
    • Y
      Fixed a bug of CompactionStats in multi-level universal compaction case · bb1c74ce
      Yueh-Hsuan Chiang committed
      Summary:
      Universal compaction can involve multiple levels.  However,
      the current implementation of bytes_readn and bytes_readnp1
      (and some other stats with the suffixes `n` and `np1`) assumes a compaction
      can only have two levels.
      
      This patch fixes this bug and redefines bytes_readn and bytes_readnp1:
      * bytes_readnp1: the number of bytes read in the compaction output level.
      * bytes_readn: the total number of bytes read minus bytes_readnp1
      
      Test Plan: Add a test in compaction_job_stats_test
      
      Reviewers: igor, sdong, rven, anthony, kradhakrishnan, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40239
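The redefinition in this commit can be sketched in a few lines of Python. The function name and the dictionary input are hypothetical. Only the two definitions (bytes read in the output level, and total bytes read minus that amount) come from the commit message.

```python
def split_bytes_read(bytes_read_per_level, output_level):
    """Return (bytes_readn, bytes_readnp1) for one compaction.

    bytes_read_per_level maps a level number to the bytes read from
    that level; any number of input levels is allowed.
    """
    total = sum(bytes_read_per_level.values())
    # bytes_readnp1: bytes read in the compaction output level.
    bytes_readnp1 = bytes_read_per_level.get(output_level, 0)
    # bytes_readn: everything else that was read.
    bytes_readn = total - bytes_readnp1
    return bytes_readn, bytes_readnp1
```

For a compaction that reads 10 bytes from L0, 20 from L1 and 70 from the output level L4, this gives `bytes_readn = 30` and `bytes_readnp1 = 70`, and the result stays well defined when more than two levels are involved.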
  8. 29 May, 2015 1 commit
    • A
      Support saving history in memtable_list · c8153510
      agiardullo committed
      Summary:
      For transactions, we are using the memtables to validate that there are no write conflicts.  But after flushing, we don't have any memtables, and transactions could fail to commit.  So we want to somehow keep around some extra history to use for conflict checking.  In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
      
      After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure).  It seems like the best place for this is abstracted inside the memtable_list.  I decided to create a separate list in MemtableListVersion as using the same list complicated the flush/InstallMemtableFlushResults logic too much.
      
      This diff adds a new parameter to control how much memtable history to keep around after flushing.  However, it sounds like people aren't too fond of adding new parameters.  So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers.  This should not change the maximum amount of memory used, but make it more likely we're using closer to the limit.  (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached).  So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
      
      However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
      
      Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests.  Added testing in memtablelist_test and planning on adding more testing here.
      
      Reviewers: sdong, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D37443
  9. 23 May, 2015 1 commit
  10. 20 May, 2015 1 commit
    • M
      Add --wal_bytes_per_sync for db_bench and more IO stats · 944043d6
      Mark Callaghan committed
      Summary:
      See https://gist.github.com/mdcallag/89ebb2b8cbd331854865 for the IO stats.
      I added "Cumulative compaction:" and "Interval compaction:" lines. The IO rates
      can be confusing. Rates for the per-level stats lines, Wr(MB/s) & Rd(MB/s), are computed
      using the duration of the compaction job. If the job reads 10MB, writes 9MB and the job
      (IO & merging) takes 1 second then the rates are 10MB/s for read and 9MB/s for writes.
      The IO rates in the Cumulative compaction line use the total uptime. The IO rates in the
      Interval compaction line use the interval uptime. So these Cumulative & Interval
      compaction IO rates cannot be compared to the per-level IO rates. But both forms of
      the rates are useful for debugging perf.
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D38667
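The two kinds of IO rate described in this commit differ only in the denominator, as the following Python sketch shows. The function name is hypothetical. The 10MB read, 9MB written and 1-second job figures come from the commit message, while the 20-second uptime is an assumed value used only for illustration.

```python
def rate_mb_per_sec(megabytes, seconds):
    """Megabytes divided by elapsed seconds; 0.0 if no time has elapsed."""
    return megabytes / seconds if seconds > 0 else 0.0


# Per-level lines divide by the duration of the compaction job.
per_level_read = rate_mb_per_sec(10, 1)    # 10.0 MB/s
per_level_write = rate_mb_per_sec(9, 1)    # 9.0 MB/s

# The cumulative line divides the same bytes by the total uptime
# (assumed here to be 20 seconds).
cumulative_read = rate_mb_per_sec(10, 20)  # 0.5 MB/s
```

The same 10MB of reads yields 10 MB/s on the per-level line but 0.5 MB/s on the cumulative line, which is why the two forms cannot be compared directly.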
  11. 02 Apr, 2015 1 commit
  12. 31 Mar, 2015 1 commit
    • M
      Make the benchmark scripts configurable and add tests · 99ec2412
      Mark Callaghan committed
      Summary:
      This makes run_flash_bench.sh configurable. Previously it was hardwired for 1B keys and tests
      ran for 12 hours each. That kept me from using it. This makes it configurable, adds more tests,
      makes the duration per-test configurable and refactors the test scripts.
      
      Adds the seekrandomwhilemerging test to db_bench which is the same as seekrandomwhilewriting except
      the writer thread does Merge rather than Put.
      
      Forces the stall-time column in compaction IO stats to use a fixed format (H:M:S) which makes
      it easier to scrape and parse. Also adds an option to AppendHumanMicros to force a fixed format.
      Sometimes automation and humans want different formats.
      
      Calls thread->stats.AddBytes(bytes); in db_bench for more tests to get the MB/sec summary
      stats in the output at test end.
      
      Adds the average ingest rate to compaction IO stats. Output now looks like:
      https://gist.github.com/mdcallag/2bd64d18be1b93adc494
      
      More information on the benchmark output is at https://gist.github.com/mdcallag/db43a58bd5ac624f01e1
      
      For benchmark.sh, changes the default RocksDB configuration to reduce stalls:
      * min_level_to_compress from 2 to 3
      * hard_rate_limit from 2 to 3
      * max_grandparent_overlap_factor and max_bytes_for_level_multiplier from 10 to 8
      * L0 file count triggers from 4,8,12 to 4,12,20 for (start,stall,stop)
      
      Task ID: #6596829
      
      Blame Rev:
      
      Test Plan:
      run tools/run_flash_bench.sh
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D36075
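A fixed H:M:S format like the one this commit describes for the stall-time column could look roughly like the Python sketch below. The function name and the exact layout (zero-padded fields with millisecond precision) are assumptions made for illustration. The commit message only states that the format is fixed and uses H:M:S.

```python
def format_micros_fixed(micros):
    """Format a duration in microseconds as HH:MM:SS.mmm.

    The layout is an assumption; the point is that every value has
    the same shape, which makes the column easy to scrape and parse.
    """
    total_millis = micros // 1000
    hours, rem = divmod(total_millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"
```

A human-oriented format would instead switch units depending on magnitude (microseconds, milliseconds, seconds and so on), which reads well but is harder to parse in automation.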
  13. 28 Mar, 2015 1 commit
  14. 19 Mar, 2015 1 commit
  15. 17 Mar, 2015 1 commit
  16. 15 Mar, 2015 2 commits
    • I
      Make RecordIn/RecordOut human readable · c6967a1a
      Igor Canadi committed
      Summary: I had a hard time understanding these big numbers. Here's what the output looks like now: https://gist.github.com/igorcanadi/4c39c17685049584a992
      
      Test Plan: db_bench
      
      Reviewers: sdong, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D35073
    • M
      Stop printing per-level stall times. · c8da6703
      Mark Callaghan committed
      Summary:
      Per-level stall times are the suggested stall time, not the actual stall time so this change stops printing them
      both in the per-level output lines and in the summary. Also changed output for total stall time to include units
      in all cases. The new output looks like:
      Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt)    RecordIn   RecordDrop
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0     4/1          7   0.8      0.0     0.0      0.0       0.6      0.6       0.0   0.0      0.0     12.9        50       352    0.141        882            0            0
        L1     5/0          9   0.9      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000          0            0            0
        L2    54/0         99   1.0      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000          0            0            0
        L3   289/0        527   0.5      0.0     0.0      0.0       0.0      0.0       0.5   0.0      0.0      0.0         0         0    0.000          0            0            0
       Sum   352/1        642   0.0      0.0     0.0      0.0       0.6      0.6       1.7   1.0      0.0     12.9        50       352    0.141        882            0            0
       Int     0/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     15.5         0         3    0.118          7            0            0
      Flush(GB): accumulative 0.627, interval 0.005
      Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 882 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard
      
      Task ID: #6493861
      
      Blame Rev:
      
      Test Plan:
      run db_bench, look at output
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D35085
  17. 13 Mar, 2015 1 commit
  18. 04 Mar, 2015 1 commit
    • Y
      Fix a bug in stall time counter. Improve its output format. · 694988b6
      Yueh-Hsuan Chiang committed
      Summary: Fix a bug in stall time counter.  Improve its output format.
      
      Test Plan:
      export ROCKSDB_TESTS=Timeout
      ./db_test
      
      ./db_bench --benchmarks=fillrandom --stats_interval=10000 --statistics=true --stats_per_interval=1 --num=1000000 --threads=4 --level0_stop_writes_trigger=3 --level0_slowdown_writes_trigger=2
      
      sample output:
          Uptime(secs): 35.8 total, 0.0 interval
          Cumulative writes: 359590 writes, 359589 keys, 183047 batches, 2.0 writes per batch, 0.04 GB user ingest, stall seconds: 1786.008 ms
          Cumulative WAL: 359591 writes, 183046 syncs, 1.96 writes per sync, 0.04 GB written
          Interval writes: 253 writes, 253 keys, 128 batches, 2.0 writes per batch, 0.0 MB user ingest, stall time: 0 us
          Interval WAL: 253 writes, 128 syncs, 1.96 writes per sync, 0.00 MB written
      
      Reviewers: MarkCallaghan, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D34275
  19. 03 Mar, 2015 1 commit
    • I
      options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size... · db037393
      Igor Canadi committed
      options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically.
      
      Summary:
      When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases.
      
      In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier.
      
      Test Plan: New unit tests and pass tests suites including valgrind.
      
      Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo
      
      Reviewed By: ikabiljo
      
      Subscribers: yoshinorim, ikabiljo, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D31437
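One way to picture the range stated in this commit is the Python sketch below. It is not RocksDB's actual algorithm. It assumes the level base is derived by repeatedly dividing the size of the largest level by the multiplier, and the function and parameter names are hypothetical.

```python
def pick_level_base(last_level_bytes, max_base, multiplier):
    """Return (base_bytes, divisions) for a dynamically chosen level base.

    Divides the largest level's size by the multiplier until the result
    no longer exceeds max_base. If at least one division happens, the
    result lies in (max_base / multiplier, max_base].
    """
    size = float(last_level_bytes)
    divisions = 0
    while size > max_base:
        size /= multiplier
        divisions += 1
    return size, divisions
```

With a largest level of 300000 units, `max_base = 256` and `multiplier = 10`, the sketch divides four times and settles on a base of 30, which lies inside (25.6, 256]. Every level above the largest one is then exactly a factor of 10 smaller than the next, which is the regular shape the commit aims for.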
  20. 20 Feb, 2015 1 commit
  21. 10 Jan, 2015 1 commit
    • S
      DB Stats Dump to print total stall time · 9132e52e
      sdong committed
      Summary:
      Add printing of stall time in DB Stats:
      
      Sample outputs:
      
      ** DB Stats **
      Uptime(secs): 53.2 total, 1.7 interval
      Cumulative writes: 625940 writes, 625939 keys, 625940 batches, 1.0 writes per batch, 0.49 GB user ingest, stall micros: 50691070
      Cumulative WAL: 625940 writes, 625939 syncs, 1.00 writes per sync, 0.49 GB written
      Interval writes: 10859 writes, 10859 keys, 10859 batches, 1.0 writes per batch, 8.7 MB user ingest, stall micros: 1692319
      Interval WAL: 10859 writes, 10859 syncs, 1.00 writes per sync, 0.01 MB written
      
      Test Plan:
      make all check
      verify printing using db_bench
      
      Reviewers: igor, yhchiang, rven, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D31239
  22. 06 Dec, 2014 1 commit
  23. 04 Dec, 2014 1 commit
    • M
      Add Moved(GB) to Compaction IO stats · 32a0a038
      Mark Callaghan committed
      Summary:
      Adds counter for bytes moved (files pushed down a level rather than compacted) to compaction
      IO stats as Moved(GB). From the output removed these infrequently used columns: RW-Amp, Rn(cnt), Rnp1(cnt),
      Wnp1(cnt), Wnew(cnt).
      Example old output:
      Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s)  Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt)  Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms) RecordIn RecordDrop
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0     0/0          0   0.0      0.0     0.0      0.0    2130.8   2130.8    0.0   0.0      0.0    109.1        0         0         0         0      20002     25068    0.798      28.75     182059    0.16       0          0
        L1   142/0        509   1.0   4618.5  2036.5   2582.0    4602.1   2020.2    4.5   2.3     88.5     88.1    24220    701246   1215528    514282      53466      4229   12.643       0.00          0    0.002032745988  300688729
      
      Example new output:
      Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms)     RecordIn   RecordDrop
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0     7/0         13   1.8      0.0     0.0      0.0       0.6      0.6       0.0   0.0      0.0     14.7        44       353    0.124       0.03        626    0.05            0            0
        L1     9/0         16   1.6      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000       0.00          0    0.00            0            0
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      make check, run db_bench --fillseq --stats_per_interval --stats_interval and look at output
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D29787
  24. 26 Nov, 2014 1 commit
  25. 25 Nov, 2014 1 commit
  26. 19 Nov, 2014 1 commit
  27. 14 Nov, 2014 2 commits
    • Y
      Fix SIGSEGV · 4161de92
      Yueh-Hsuan Chiang committed
      Summary: As a short-term fix, let's go back to previous way of calculating NeedsCompaction(). SIGSEGV happens because NeedsCompaction() can happen before super_version (and thus MutableCFOptions) is initialized.
      
      Test Plan: make check
      
      Reviewers: ljin, sdong, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28875
    • Y
      Move NeedsCompaction() from VersionStorageInfo to CompactionPicker · 1d1a64f5
      Yueh-Hsuan Chiang committed
      Summary:
      Move NeedsCompaction() from VersionStorageInfo to CompactionPicker
      to allow different compaction strategies to have their own way to
      determine whether doing compaction is necessary.
      
      When compaction style is set to kCompactionStyleNone, then
      NeedsCompaction() will always return false.
      
      Test Plan:
      export ROCKSDB_TESTS=Compact
      ./db_test
      
      Reviewers: ljin, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28719
  28. 06 Nov, 2014 1 commit
    • S
      Fix RecordIn and RecordDrop stats · 2ea1219e
      sdong committed
      Summary:
      1. fix possible overflow of the two stats by using uint64_t
      2. use a similar source of data to calculate RecordDrop. Previous one is not correct.
      
      Test Plan: See outputs of db_bench settings, and the results look reasonable
      
      Reviewers: MarkCallaghan, ljin, igor
      
      Reviewed By: igor
      
      Subscribers: rven, leveldb, yhchiang, dhruba
      
      Differential Revision: https://reviews.facebook.net/D28155
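The overflow addressed by the first point of this commit can be illustrated with a short Python sketch. The commit message does not say which type the counters used before, so the 32-bit width below is an assumption, and `wrap_unsigned` is a hypothetical helper that mimics fixed-width unsigned storage.

```python
def wrap_unsigned(value, bits):
    """Value as it would be stored in an unsigned counter of `bits` bits."""
    return value & ((1 << bits) - 1)


records_in = 5_000_000_000  # illustrative record count above 2**32

as_32_bit = wrap_unsigned(records_in, 32)  # wraps around to 705032704
as_64_bit = wrap_unsigned(records_in, 64)  # still 5000000000
```

A 64-bit counter holds values up to 2**64 - 1, so record counts in the billions no longer wrap around.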
  29. 05 Nov, 2014 1 commit
  30. 01 Nov, 2014 3 commits
  31. 30 Oct, 2014 1 commit
    • S
      Make CompactionPicker more easily tested · 76d1c28e
      sdong committed
      Summary:
      Make compaction picker easier to test.
      The basic idea is to separate a minimum subcomponent of Version into VersionStorageInfo, which is responsible only for the LSM tree. A stub VersionStorageInfo can then be easily created and passed into the compaction picker so that we can check the outputs.
      
      It now passes most tests. Still two things need to be done:
      (1) deal with the FIFO compaction's file size.
      (2) write an example test to make sure the interface can do the job.
      
      Add a compaction_picker_test to make sure compaction picker codes can be easily unit tested.
      
      Test Plan:
      Pass all unit tests and compaction_picker_test
      
      Reviewers: yhchiang, rven, igor, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D27639
  32. 23 Oct, 2014 1 commit
    • S
      Printing number of keys in DB Stats · d755e53b
      sdong committed
      Summary: It is useful to print out number of keys in DB Stats
      
      Test Plan:
      ./db_bench --benchmarks fillrandom --num 1000000 -threads 16 -batch_size=16
      
      and watch the outputs in LOG files
      
      Reviewers: MarkCallaghan, ljin, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D24513
  33. 18 Oct, 2014 1 commit
  34. 03 Oct, 2014 1 commit
  35. 05 Sep, 2014 1 commit