1. 03 Sep 2014, 1 commit
  2. 02 Sep 2014, 1 commit
    • Don't let flush preempt compaction in certain cases · 7dcadb1d
      Committed by Igor Canadi
      Summary:
      I have an application configured with 16 background threads. Write rates are high. The L0->L1 compaction is very slow and limits the concurrency of the system; while it's happening, the other 15 threads are idle. However, when a flush is needed, the one thread busy with L0->L1 does the flush instead of any of the 15 threads that are just sitting there.
      
      This diff prevents that. If there are threads that are idle, we don't let flush preempt compaction.
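      The scheduling decision described above can be sketched roughly as follows. This is a minimal illustrative model, not RocksDB's actual thread-pool code; the function and parameter names are hypothetical:

      ```cpp
      #include <cassert>

      // Hypothetical sketch: decide whether a thread already running a
      // compaction should preempt its work to run a flush. If any background
      // threads are idle, let one of them pick up the flush instead.
      bool ShouldPreemptCompactionForFlush(int total_bg_threads,
                                           int busy_bg_threads) {
        int idle_threads = total_bg_threads - busy_bg_threads;
        // Only preempt when every thread is busy; otherwise an idle
        // thread will service the flush.
        return idle_threads == 0;
      }

      int main() {
        // 16 threads, 1 busy with L0->L1: don't preempt, 15 idle threads exist.
        assert(!ShouldPreemptCompactionForFlush(16, 1));
        // All 16 busy: the compacting thread must yield to the flush.
        assert(ShouldPreemptCompactionForFlush(16, 16));
        return 0;
      }
      ```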
      
      Test Plan: Will run stress test
      
      Reviewers: ljin, sdong, yhchiang
      
      Reviewed By: sdong, yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D22299
  3. 01 Sep 2014, 1 commit
  4. 31 Aug 2014, 1 commit
  5. 30 Aug 2014, 4 commits
    • limit max bytes that can be read/written per pread/write syscall · 7e9f28cb
      Committed by Lei Jin
      Summary:
      BlockBasedTable sst files can grow very large when universal
      compaction is used. When the index block exceeds 2GB, pread seems to fail,
      returning truncated data and causing a "truncated block" error. I tried to use
      ```
        #define _FILE_OFFSET_BITS 64
      ```
      But the problem still persists. Splitting a big write/read into smaller
      batches seems to solve the problem.
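      The chunking fix can be sketched as a read loop that caps each request. This is an illustrative model with an in-memory buffer standing in for the file; the constant and function names are hypothetical, not RocksDB's:

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <vector>

      // Sketch: instead of issuing one huge pread(), split the request into
      // chunks no larger than kMaxChunk, so each syscall stays under the
      // sizes where truncated reads were observed.
      constexpr size_t kMaxChunk = 1u << 30;  // 1GB per "syscall"

      size_t ChunkedRead(const std::vector<char>& file, size_t offset,
                         size_t n, char* out) {
        size_t total = 0;
        while (total < n && offset + total < file.size()) {
          size_t want = std::min(n - total, kMaxChunk);
          size_t avail = std::min(want, file.size() - (offset + total));
          std::copy(file.begin() + offset + total,
                    file.begin() + offset + total + avail, out + total);
          total += avail;           // a real pread() loop would also
          if (avail < want) break;  // handle short reads and EINTR here
        }
        return total;
      }

      int main() {
        std::vector<char> file(100, 'x');
        std::vector<char> buf(64);
        assert(ChunkedRead(file, 50, 64, buf.data()) == 50);  // hits EOF
        assert(ChunkedRead(file, 0, 64, buf.data()) == 64);
        return 0;
      }
      ```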
      
      Test Plan:
      successfully compacted a case with resulting sst file at ~90G (2.1G
      index block size)
      
      Reviewers: yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D22569
    • Improve Cuckoo Table Reader performance. Inlined hash function and number of buckets a power of two. · d20b8cfa
      Committed by Radheshyam Balasundaram
      
      Summary:
      Use inlined hash functions instead of a function pointer. Make the number of buckets a power of two and use bitwise AND instead of mod.
      After these changes, we get almost a 50% improvement in performance.
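      The power-of-two trick can be sketched as follows (illustrative names, not the actual CuckooTableReader code):

      ```cpp
      #include <cassert>
      #include <cstdint>

      // When the bucket count is a power of two, `hash % num_buckets` can be
      // replaced with a bitwise AND against num_buckets - 1, avoiding an
      // integer division on the lookup path.
      inline uint64_t NextPowerOfTwo(uint64_t n) {
        uint64_t p = 1;
        while (p < n) p <<= 1;
        return p;
      }

      inline uint64_t BucketIndex(uint64_t hash, uint64_t num_buckets_pow2) {
        return hash & (num_buckets_pow2 - 1);  // == hash % num_buckets_pow2
      }

      int main() {
        uint64_t buckets = NextPowerOfTwo(120000000);  // rounds up to 2^27
        assert(buckets == (1ull << 27));
        for (uint64_t h : {0ull, 1ull, 123456789ull, ~0ull}) {
          assert(BucketIndex(h, buckets) == h % buckets);
        }
        return 0;
      }
      ```

      Rounding the bucket count up to a power of two trades some memory (lower utilization) for a cheaper index computation on every probe.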
      
      Results:
      With 120000000 items, utilization is 89.41%, number of hash functions: 2.
      Time taken per op is 0.231us (4.3 Mqps) with batch size of 0
      Time taken per op is 0.229us (4.4 Mqps) with batch size of 0
      Time taken per op is 0.185us (5.4 Mqps) with batch size of 0
      With 120000000 items, utilization is 89.41%, number of hash functions: 2.
      Time taken per op is 0.108us (9.3 Mqps) with batch size of 10
      Time taken per op is 0.100us (10.0 Mqps) with batch size of 10
      Time taken per op is 0.103us (9.7 Mqps) with batch size of 10
      With 120000000 items, utilization is 89.41%, number of hash functions: 2.
      Time taken per op is 0.101us (9.9 Mqps) with batch size of 25
      Time taken per op is 0.098us (10.2 Mqps) with batch size of 25
      Time taken per op is 0.097us (10.3 Mqps) with batch size of 25
      With 120000000 items, utilization is 89.41%, number of hash functions: 2.
      Time taken per op is 0.100us (10.0 Mqps) with batch size of 50
      Time taken per op is 0.097us (10.3 Mqps) with batch size of 50
      Time taken per op is 0.097us (10.3 Mqps) with batch size of 50
      With 120000000 items, utilization is 89.41%, number of hash functions: 2.
      Time taken per op is 0.102us (9.8 Mqps) with batch size of 100
      Time taken per op is 0.098us (10.2 Mqps) with batch size of 100
      Time taken per op is 0.115us (8.7 Mqps) with batch size of 100
      
      With 100000000 items, utilization is 74.51%, number of hash functions: 2.
      Time taken per op is 0.201us (5.0 Mqps) with batch size of 0
      Time taken per op is 0.155us (6.5 Mqps) with batch size of 0
      Time taken per op is 0.152us (6.6 Mqps) with batch size of 0
      With 100000000 items, utilization is 74.51%, number of hash functions: 2.
      Time taken per op is 0.089us (11.3 Mqps) with batch size of 10
      Time taken per op is 0.084us (11.9 Mqps) with batch size of 10
      Time taken per op is 0.086us (11.6 Mqps) with batch size of 10
      With 100000000 items, utilization is 74.51%, number of hash functions: 2.
      Time taken per op is 0.087us (11.5 Mqps) with batch size of 25
      Time taken per op is 0.085us (11.7 Mqps) with batch size of 25
      Time taken per op is 0.093us (10.8 Mqps) with batch size of 25
      With 100000000 items, utilization is 74.51%, number of hash functions: 2.
      Time taken per op is 0.094us (10.6 Mqps) with batch size of 50
      Time taken per op is 0.094us (10.7 Mqps) with batch size of 50
      Time taken per op is 0.093us (10.8 Mqps) with batch size of 50
      With 100000000 items, utilization is 74.51%, number of hash functions: 2.
      Time taken per op is 0.092us (10.9 Mqps) with batch size of 100
      Time taken per op is 0.089us (11.2 Mqps) with batch size of 100
      Time taken per op is 0.088us (11.3 Mqps) with batch size of 100
      
      With 80000000 items, utilization is 59.60%, number of hash functions: 2.
      Time taken per op is 0.154us (6.5 Mqps) with batch size of 0
      Time taken per op is 0.168us (6.0 Mqps) with batch size of 0
      Time taken per op is 0.190us (5.3 Mqps) with batch size of 0
      With 80000000 items, utilization is 59.60%, number of hash functions: 2.
      Time taken per op is 0.081us (12.4 Mqps) with batch size of 10
      Time taken per op is 0.077us (13.0 Mqps) with batch size of 10
      Time taken per op is 0.083us (12.1 Mqps) with batch size of 10
      With 80000000 items, utilization is 59.60%, number of hash functions: 2.
      Time taken per op is 0.077us (13.0 Mqps) with batch size of 25
      Time taken per op is 0.073us (13.7 Mqps) with batch size of 25
      Time taken per op is 0.073us (13.7 Mqps) with batch size of 25
      With 80000000 items, utilization is 59.60%, number of hash functions: 2.
      Time taken per op is 0.076us (13.1 Mqps) with batch size of 50
      Time taken per op is 0.072us (13.8 Mqps) with batch size of 50
      Time taken per op is 0.072us (13.8 Mqps) with batch size of 50
      With 80000000 items, utilization is 59.60%, number of hash functions: 2.
      Time taken per op is 0.077us (13.0 Mqps) with batch size of 100
      Time taken per op is 0.074us (13.6 Mqps) with batch size of 100
      Time taken per op is 0.073us (13.6 Mqps) with batch size of 100
      
      With 70000000 items, utilization is 52.15%, number of hash functions: 2.
      Time taken per op is 0.190us (5.3 Mqps) with batch size of 0
      Time taken per op is 0.186us (5.4 Mqps) with batch size of 0
      Time taken per op is 0.184us (5.4 Mqps) with batch size of 0
      With 70000000 items, utilization is 52.15%, number of hash functions: 2.
      Time taken per op is 0.079us (12.7 Mqps) with batch size of 10
      Time taken per op is 0.070us (14.2 Mqps) with batch size of 10
      Time taken per op is 0.072us (14.0 Mqps) with batch size of 10
      With 70000000 items, utilization is 52.15%, number of hash functions: 2.
      Time taken per op is 0.080us (12.5 Mqps) with batch size of 25
      Time taken per op is 0.072us (14.0 Mqps) with batch size of 25
      Time taken per op is 0.071us (14.1 Mqps) with batch size of 25
      With 70000000 items, utilization is 52.15%, number of hash functions: 2.
      Time taken per op is 0.082us (12.1 Mqps) with batch size of 50
      Time taken per op is 0.071us (14.1 Mqps) with batch size of 50
      Time taken per op is 0.073us (13.6 Mqps) with batch size of 50
      With 70000000 items, utilization is 52.15%, number of hash functions: 2.
      Time taken per op is 0.080us (12.5 Mqps) with batch size of 100
      Time taken per op is 0.077us (13.0 Mqps) with batch size of 100
      Time taken per op is 0.078us (12.8 Mqps) with batch size of 100
      
      Test Plan:
      make check all
      make valgrind_check
      make asan_check
      
      Reviewers: sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D22539
    • ForwardIterator: reset incomplete iterators on Seek() · 0f9c43ea
      Committed by Tomislav Novak
      Summary:
      When reading from kBlockCacheTier, ForwardIterator's internal child iterators
      may end up in the incomplete state (read was unable to complete without doing
      disk I/O). `ForwardIterator::status()` will correctly report that; however, the
      iterator may be stuck in that state until all sub-iterators are rebuilt:
      
        * `NeedToSeekImmutable()` may return false even if some sub-iterators are
          incomplete
        * one of the child iterators may be an empty iterator without any state other
          than the kIncomplete status (created using `NewErrorIterator()`); seeking on
          any such iterator has no effect -- we need to construct it again
      
      Akin to rebuilding iterators after a superversion bump, this diff makes forward
      iterator reset all incomplete child iterators when `Seek()` or `Next()` are
      called.
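      The reset logic can be modeled in miniature like this. It is a toy sketch, not RocksDB's actual iterator classes; all names are illustrative:

      ```cpp
      #include <cassert>
      #include <memory>
      #include <vector>

      // Toy model of the fix: before forwarding a Seek(), rebuild any child
      // iterator whose last read came back kIncomplete, since seeking an
      // error iterator has no effect.
      enum class Status { kOk, kIncomplete };

      struct ChildIter {
        Status status = Status::kOk;
        bool rebuilt = false;
      };

      struct ForwardIter {
        std::vector<std::unique_ptr<ChildIter>> children;
        void Seek() {
          for (auto& c : children) {
            if (c->status == Status::kIncomplete) {
              c = std::make_unique<ChildIter>();  // reconstruct, don't reuse
              c->rebuilt = true;
            }
            // ... then seek the child as usual ...
          }
        }
      };

      int main() {
        ForwardIter it;
        it.children.push_back(std::make_unique<ChildIter>());
        it.children.push_back(std::make_unique<ChildIter>());
        it.children[1]->status = Status::kIncomplete;
        it.Seek();
        assert(!it.children[0]->rebuilt);  // healthy child is kept
        assert(it.children[1]->rebuilt);   // incomplete child is rebuilt
        return 0;
      }
      ```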
      
      Test Plan: TEST_TMPDIR=/dev/shm/rocksdbtest ROCKSDB_TESTS=TailingIterator ./db_test
      
      Reviewers: igor, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: lovro, march, leveldb
      
      Differential Revision: https://reviews.facebook.net/D22575
    • reduce recordTick overhead in compaction loop · 722d80c3
      Committed by Lei Jin
      Summary: It is too expensive to bump the ticker on every key/value pair
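      A common shape for this optimization is to accumulate counts in a local variable inside the loop and record them once per batch. This is a hedged sketch with made-up names, not the actual RocksDB Statistics API:

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Sketch: instead of bumping a (possibly atomic) statistics ticker once
      // per key/value pair, count locally inside the compaction loop and
      // record the total in a single call at the end.
      struct Statistics {
        uint64_t keys_read = 0;
        void RecordTick(uint64_t count) { keys_read += count; }
      };

      uint64_t CompactKeys(Statistics* stats, uint64_t num_keys) {
        uint64_t local_count = 0;
        for (uint64_t i = 0; i < num_keys; ++i) {
          // ... process one key/value pair ...
          ++local_count;  // cheap local increment, no shared-state write
        }
        stats->RecordTick(local_count);  // single ticker bump per batch
        return local_count;
      }

      int main() {
        Statistics stats;
        CompactKeys(&stats, 1000);
        assert(stats.keys_read == 1000);
        return 0;
      }
      ```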
      
      Test Plan: make release
      
      Reviewers: sdong, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D22527
  6. 29 Aug 2014, 8 commits
  7. 28 Aug 2014, 4 commits
  8. 27 Aug 2014, 9 commits
  9. 26 Aug 2014, 5 commits
  10. 24 Aug 2014, 2 commits
  11. 23 Aug 2014, 1 commit
    • Fix concurrency issue in CompactionPicker · 42ea7952
      Committed by Igor Canadi
      Summary:
      I am currently working on a project that uses RocksDB. While debugging some perf issues, I came across an interesting compaction concurrency issue. Namely, I had 15 idle threads and a good compaction to do, but CompactionPicker returned "Compaction nothing to do". Here's how the internal stats looked:
      
          2014/08/22-08:08:04.551982 7fc7fc3f5700 ------- DUMPING STATS -------
          2014/08/22-08:08:04.552000 7fc7fc3f5700
          ** Compaction Stats [default] **
          Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s)  Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt)  Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms)
          ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            L0     7/5        353   1.0      0.0     0.0      0.0       2.3      2.3    0.0   0.0      0.0      9.4        0         0         0         0        247        46    5.359       8.53          1 8526.25
            L1     2/2         86   1.3      2.6     1.9      0.7       2.6      1.9    2.7   1.3     24.3     24.0       39        19        71        52        109        11    9.938       0.00          0    0.00
            L2    26/0        833   1.3      5.7     1.7      4.0       5.2      1.2    6.3   3.0     15.6     14.2       47       112       147        35        373        44    8.468       0.00          0    0.00
            L3    12/0        505   0.1      0.0     0.0      0.0       0.0      0.0    0.0   0.0      0.0      0.0        0         0         0         0          0         0    0.000       0.00          0    0.00
           Sum    47/7       1778   0.0      8.3     3.6      4.6      10.0      5.4    8.1   4.4     11.6     14.1       86       131       218        87        728       101    7.212       8.53          1 8526.25
           Int     0/0          0   0.0      2.4     0.8      1.6       2.7      1.2   11.5   6.1     12.0     13.6       20        43        63        20        203        23    8.845       0.00          0    0.00
          Flush(GB): accumulative 2.266, interval 0.444
          Stalls(secs): 0.000 level0_slowdown, 0.000 level0_numfiles, 8.526 memtable_compaction, 0.000 leveln_slowdown_soft, 0.000 leveln_slowdown_hard
          Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 1 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard
      
          ** DB Stats **
          Uptime(secs): 336.8 total, 60.4 interval
          Cumulative writes: 61584000 writes, 6480589 batches, 9.5 writes per batch, 1.39 GB user ingest
          Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, 0.00 GB written
          Interval writes: 11235257 writes, 1175050 batches, 9.6 writes per batch, 259.9 MB user ingest
          Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, 0.00 MB written
      
      To see what happened, go here: https://github.com/facebook/rocksdb/blob/47b452cfcf9b1487d41f886a98bc0d6f95587e90/db/compaction_picker.cc#L430
      * The for loop started with level 1, because it has the worst score.
      * PickCompactionBySize on L429 returned nullptr because all files were being compacted
      * ExpandWhileOverlapping(c) returned true (because that's what it does when it gets nullptr!?)
       * the for loop broke out, never trying compactions for level 2 :( :(
      
      This bug was present at least since January. I have no idea how we didn't find this sooner.
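      The loop-control bug above can be modeled in a few lines. This is a deliberately simplified sketch (names are hypothetical, not the actual compaction_picker.cc code):

      ```cpp
      #include <cassert>
      #include <vector>

      // Minimal model of the bug: the picker loops over levels; when nothing
      // can be picked for a level (all its files already compacting), the old
      // code effectively broke out of the loop, never considering the
      // remaining levels. The fix is to continue to the next level instead.
      int PickLevel(const std::vector<bool>& level_busy, bool buggy) {
        for (int level = 0; level < static_cast<int>(level_busy.size()); ++level) {
          bool picked = !level_busy[level];  // stands in for PickCompactionBySize
          if (!picked) {
            if (buggy) break;  // old behavior: give up entirely
            continue;          // fixed behavior: try the next level
          }
          return level;
        }
        return -1;  // "Compaction nothing to do"
      }

      int main() {
        std::vector<bool> busy = {true, true, false, false};  // L0, L1 busy
        assert(PickLevel(busy, /*buggy=*/true) == -1);  // nothing to do
        assert(PickLevel(busy, /*buggy=*/false) == 2);  // L2->L3 proceeds
        return 0;
      }
      ```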
      
      Test Plan:
      Unit testing compaction picker is hard. I tested this by running my service and observing L0->L1 and L2->L3 compactions in parallel. However, for long-term, I opened the task #4968469. @yhchiang is currently refactoring CompactionPicker, hopefully the new version will be unit-testable ;)
      
      Here's how my compactions look like after the patch:
      
          2014/08/22-08:50:02.166699 7f3400ffb700 ------- DUMPING STATS -------
          2014/08/22-08:50:02.166722 7f3400ffb700
          ** Compaction Stats [default] **
          Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s)  Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt)  Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms)
          ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            L0     8/5        404   1.5      0.0     0.0      0.0       4.3      4.3    0.0   0.0      0.0      9.6        0         0         0         0        463        88    5.260       0.00          0    0.00
            L1     2/2         60   0.9      4.8     3.9      0.8       4.7      3.9    2.4   1.2     23.9     23.6       80        23       131       108        204        19   10.747       0.00          0    0.00
            L2    23/3        697   1.0     11.6     3.5      8.1      10.9      2.8    6.4   3.1     17.7     16.6       95       242       317        75        669        92    7.268       0.00          0    0.00
            L3    58/14      2207   0.3      6.2     1.6      4.6       5.9      1.3    7.4   3.6     14.6     13.9       43       121       159        38        436        36   12.106       0.00          0    0.00
           Sum    91/24      3368   0.0     22.5     9.1     13.5      25.8     12.4   11.2   6.0     13.0     14.9      218       386       607       221       1772       235    7.538       0.00          0    0.00
           Int     0/0          0   0.0      3.2     0.9      2.3       3.6      1.3   15.3   8.0     12.4     13.7       24        66        89        23        266        27    9.838       0.00          0    0.00
          Flush(GB): accumulative 4.336, interval 0.444
          Stalls(secs): 0.000 level0_slowdown, 0.000 level0_numfiles, 0.000 memtable_compaction, 0.000 leveln_slowdown_soft, 0.000 leveln_slowdown_hard
          Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard
      
          ** DB Stats **
          Uptime(secs): 577.7 total, 60.1 interval
          Cumulative writes: 116960736 writes, 11966220 batches, 9.8 writes per batch, 2.64 GB user ingest
          Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, 0.00 GB written
          Interval writes: 11643735 writes, 1206136 batches, 9.7 writes per batch, 269.2 MB user ingest
          Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, 0.00 MB written
      
      Yay for concurrent L0->L1 and L2->L3 compactions!
      
      Reviewers: sdong, yhchiang, ljin
      
      Reviewed By: yhchiang
      
      Subscribers: yhchiang, leveldb
      
      Differential Revision: https://reviews.facebook.net/D22305
  12. 22 Aug 2014, 2 commits
  13. 21 Aug 2014, 1 commit
    • Implement Prepare method in CuckooTableReader · 08be7f52
      Committed by Radheshyam Balasundaram
      Summary:
      - Implement Prepare method
       - Rewrite performance tests in cuckoo_table_reader_test to write a new file only if one doesn't already exist.
      - Add performance tests for batch lookup along with prefetching.
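      The Prepare/Get batching pattern can be sketched as follows. This is an illustrative model, not the actual CuckooTableReader API; the struct and method names are assumptions:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Sketch: Prepare() computes the bucket for a key and issues a memory
      // prefetch so the cache line is in flight; the later Get() then hits
      // warm memory. Issuing many Prepare() calls before the Gets overlaps
      // memory latency across the whole batch.
      struct Reader {
        std::vector<uint64_t> buckets;
        explicit Reader(size_t n) : buckets(n) {}
        size_t BucketFor(uint64_t key) const { return key % buckets.size(); }
        void Prepare(uint64_t key) const {
      #if defined(__GNUC__)
          __builtin_prefetch(&buckets[BucketFor(key)]);
      #endif
        }
        uint64_t Get(uint64_t key) const { return buckets[BucketFor(key)]; }
      };

      int main() {
        Reader r(1024);
        r.buckets[7] = 42;
        std::vector<uint64_t> batch = {7, 100, 500};
        for (uint64_t k : batch) r.Prepare(k);  // issue prefetches first
        assert(r.Get(7) == 42);                 // then do the actual lookups
        assert(r.Get(100) == 0);
        return 0;
      }
      ```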
      
      Test Plan:
      ./cuckoo_table_reader_test --enable_perf
      Results (we would get better results using an int64 comparator instead of a string comparator; TBD in future diffs):
      With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2.
      Time taken per op is 0.208us (4.8 Mqps) with batch size of 0
      With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2.
      Time taken per op is 0.182us (5.5 Mqps) with batch size of 10
      With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2.
      Time taken per op is 0.161us (6.2 Mqps) with batch size of 25
      With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2.
      Time taken per op is 0.161us (6.2 Mqps) with batch size of 50
      With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2.
      Time taken per op is 0.163us (6.1 Mqps) with batch size of 100
      
      With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3.
      Time taken per op is 0.252us (4.0 Mqps) with batch size of 0
      With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3.
      Time taken per op is 0.192us (5.2 Mqps) with batch size of 10
      With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3.
      Time taken per op is 0.195us (5.1 Mqps) with batch size of 25
      With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3.
      Time taken per op is 0.191us (5.2 Mqps) with batch size of 50
      With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3.
      Time taken per op is 0.194us (5.1 Mqps) with batch size of 100
      
      With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3.
      Time taken per op is 0.228us (4.4 Mqps) with batch size of 0
      With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3.
      Time taken per op is 0.185us (5.4 Mqps) with batch size of 10
      With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3.
      Time taken per op is 0.186us (5.4 Mqps) with batch size of 25
      With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3.
      Time taken per op is 0.189us (5.3 Mqps) with batch size of 50
      With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3.
      Time taken per op is 0.188us (5.3 Mqps) with batch size of 100
      
      With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3.
      Time taken per op is 0.325us (3.1 Mqps) with batch size of 0
      With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3.
      Time taken per op is 0.196us (5.1 Mqps) with batch size of 10
      With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3.
      Time taken per op is 0.199us (5.0 Mqps) with batch size of 25
      With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3.
      Time taken per op is 0.196us (5.1 Mqps) with batch size of 50
      With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3.
      Time taken per op is 0.209us (4.8 Mqps) with batch size of 100
      
      Reviewers: sdong, yhchiang, igor, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D22167