1. 13 Feb 2014, 1 commit
  2. 09 Feb 2014, 1 commit
  3. 08 Feb 2014, 1 commit
    • Readrandom with tailing iterator · 1560bb91
      Igor Canadi committed
      Summary:
      Added an option for the readrandom benchmark to run with a tailing iterator instead of Get. The benefit of a tailing iterator is that it doesn't require locking the DB mutex on access.
      
      I also have some results from running on my machine. The results depend heavily on the number of cache shards. With our current benchmark setting of 4 table cache shards and 6 block cache shards, I don't see much improvement from using the tailing iterator; in that case, we're probably seeing cache mutex contention.
      
      Here are the results for different numbers of shards:
      
          cache shards       tailing iterator        get
             6                      1.38M           1.16M
            10                      1.58M           1.15M
      
      As soon as we get rid of cache mutex contention, we see big improvements from using the tailing iterator vs. an ordinary Get.
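
      A minimal sketch of the tailing-iterator read path (a hypothetical helper, not the actual db_bench code; assumes an open DB handle):

        // Point lookup via a tailing iterator instead of Get().
        #include <memory>
        #include <string>
        #include "rocksdb/db.h"

        bool ReadWithTailingIterator(rocksdb::DB* db, const rocksdb::Slice& key,
                                     std::string* value) {
          rocksdb::ReadOptions read_options;
          read_options.tailing = true;  // tailing iterators avoid locking the DB mutex
          std::unique_ptr<rocksdb::Iterator> iter(db->NewIterator(read_options));
          iter->Seek(key);
          if (iter->Valid() && iter->key() == key) {
            value->assign(iter->value().data(), iter->value().size());
            return true;
          }
          return false;
        }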
      
      Test Plan: ran regression test
      
      Reviewers: dhruba, haobo, ljin, kailiu, sding
      
      Reviewed By: haobo
      
      CC: tnovak
      
      Differential Revision: https://reviews.facebook.net/D15867
  4. 04 Feb 2014, 2 commits
  5. 25 Jan 2014, 3 commits
    • Moving some includes from options.h to forward declarations · 8477255d
      Siying Dong committed
      Summary: By removing some includes from options.h and relying on forward declarations, we can more easily reason about the dependencies.
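
      A small sketch of the idea (illustrative member names, not the exact options.h contents):

        #include <memory>

        namespace rocksdb {

        class Cache;             // forward declaration instead of #include "rocksdb/cache.h"
        class CompactionFilter;  // forward declaration instead of the full header

        struct Options {
          // Pointer-typed members only need the type's name, not its definition,
          // so the heavy headers can move into the .cc files that use them.
          std::shared_ptr<Cache> block_cache;
          const CompactionFilter* compaction_filter = nullptr;
        };

        }  // namespace rocksdb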
      
      Test Plan: make all check
      
      Reviewers: kailiu, haobo, igor, dhruba
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15411
    • Revert "Moving to glibc-fb" · e832e72b
      Igor Canadi committed
      This reverts commit d24961b6.
      
      For some reason, glibc2.17-fb breaks gflags. Reverting for now.
    • Moving to glibc-fb · d24961b6
      Igor Canadi committed
      Summary:
      It looks like we might have some trouble when building the new release with 4.8, since fbcode uses glibc2.17-fb by default while we are using glibc2.17. This was reported by Benjamin Renard in our internal group.
      
      This diff moves our fbcode build to use glibc2.17-fb by default. I got some linker errors when compiling, complaining that `google::SetUsageMessage()` was undefined. After deleting all offending lines, the compile was successful and everything works.
      
      Test Plan:
      Compiled
      Ran ./db_bench ./db_stress ./db_repl_stress
      
      Reviewers: kailiu
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15405
  6. 18 Jan 2014, 1 commit
  7. 11 Jan 2014, 1 commit
    • Improve RocksDB "get" performance by computing merge result in memtable · a09ee106
      Schalk-Willem Kruger committed
      Summary:
      Added an option (max_successive_merges) that can be used to specify the
      maximum number of successive merge operations on a key in the memtable.
      This can be used to improve the performance of "get" operations. If many
      successive merge operations are performed on a key, each "get" on that key
      becomes slower, because the value has to be recomputed by applying all of
      the successive merges. Once the limit is reached, the merge result is
      computed and stored in the memtable instead, bounding the work a later
      "get" must do.
      
      FB Task ID: #3428853
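
      A hedged sketch of the new option in use (a counter-style workload with the uint64add merge operator; merge_operators.h ships in the RocksDB source tree):

        #include "rocksdb/db.h"
        #include "rocksdb/options.h"
        #include "utilities/merge_operators.h"

        int main() {
          rocksdb::Options options;
          options.create_if_missing = true;
          options.merge_operator = rocksdb::MergeOperators::CreateUInt64AddOperator();
          // Once a key accumulates this many successive merge operands in the
          // memtable, the merge result is computed up front, so a later Get()
          // no longer replays the whole chain.
          options.max_successive_merges = 64;

          rocksdb::DB* db = nullptr;
          rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/merge_demo", &db);
          if (!s.ok()) return 1;
          delete db;
          return 0;
        }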
      
      Test Plan:
      make all check
      db_bench --benchmarks=readrandommergerandom
      counter_stress_test
      
      Reviewers: haobo, vamsi, dhruba, sdong
      
      Reviewed By: haobo
      
      CC: zshao
      
      Differential Revision: https://reviews.facebook.net/D14991
  8. 19 Dec 2013, 1 commit
    • Add 'readtocache' test · ca92068b
      Mark Callaghan committed
      Summary:
      For some tests I want to cache the database prior to running other tests in the same invocation
      of db_bench. The readtocache test ignores --threads and --reads, so those can be used by other tests,
      while it still does a full read of --num rows with one thread. It might be invoked like:
        db_bench --benchmarks=readtocache,readrandom --reads 100 --num 10000 --threads 8
      
      Test Plan:
      run db_bench
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14739
  9. 13 Dec 2013, 1 commit
    • Add monitoring for universal compaction and add counters for compaction IO · e9e6b00d
      Mark Callaghan committed
      Summary:
      Adds these counters:
      { WAL_FILE_SYNCED, "rocksdb.wal.synced" }
        number of writes that request a WAL sync
      { WAL_FILE_BYTES, "rocksdb.wal.bytes" }
        number of bytes written to the WAL
      { WRITE_DONE_BY_SELF, "rocksdb.write.self" }
        number of writes processed by the calling thread
      { WRITE_DONE_BY_OTHER, "rocksdb.write.other" }
        number of writes not processed by the calling thread; instead these were
        processed by the current holder of the write lock
      { WRITE_WITH_WAL, "rocksdb.write.wal" }
        number of writes that request WAL logging
      { COMPACT_READ_BYTES, "rocksdb.compact.read.bytes" }
        number of bytes read during compaction
      { COMPACT_WRITE_BYTES, "rocksdb.compact.write.bytes" }
        number of bytes written during compaction
      
      Per-interval stats output was updated with WAL stats and correct stats for universal compaction
      including a correct value for write-amplification. It now looks like:
                                     Compactions
      Level  Files Size(MB) Score Time(sec)  Read(MB) Write(MB)    Rn(MB)  Rnp1(MB)  Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s)      Rn     Rnp1     Wnp1     NewW    Count  Ln-stall Stall-cnt
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        0        7      464  46.4       281      3411      3875      3411         0      3875        2.1      12.1        13.8      621        0      240      240      628       0.0         0
      Uptime(secs): 310.8 total, 2.0 interval
      Writes cumulative: 9999999 total, 9999999 batches, 1.0 per batch, 1.22 ingest GB
      WAL cumulative: 9999999 WAL writes, 9999999 WAL syncs, 1.00 writes per sync, 1.22 GB written
      Compaction IO cumulative (GB): 1.22 new, 3.33 read, 3.78 write, 7.12 read+write
      Compaction IO cumulative (MB/sec): 4.0 new, 11.0 read, 12.5 write, 23.4 read+write
      Amplification cumulative: 4.1 write, 6.8 compaction
      Writes interval: 100000 total, 100000 batches, 1.0 per batch, 12.5 ingest MB
      WAL interval: 100000 WAL writes, 100000 WAL syncs, 1.00 writes per sync, 0.01 MB written
      Compaction IO interval (MB): 12.49 new, 14.98 read, 21.50 write, 36.48 read+write
      Compaction IO interval (MB/sec): 6.4 new, 7.6 read, 11.0 write, 18.6 read+write
      Amplification interval: 101.7 write, 102.9 compaction
      Stalls(secs): 142.924 level0_slowdown, 0.000 level0_numfiles, 0.805 memtable_compaction, 0.000 leveln_slowdown
      Stalls(count): 132461 level0_slowdown, 0 level0_numfiles, 3 memtable_compaction, 0 leveln_slowdown
      
      Task ID: #3329644, #3301695
      
      Test Plan:
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14583
  10. 04 Dec 2013, 2 commits
    • Add compression options to db_bench · 97aa401e
      Mark Callaghan committed
      Summary:
      This adds two compression options to db_bench:
      * universal_compression_size_percent
      * compression_level - sets the zlib compression level
      It also logs compression_size_percent at startup in LOG.
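
      A hedged sketch of the Options fields these flags most likely map to (the exact db_bench wiring is an assumption):

        #include "rocksdb/options.h"

        void ConfigureCompression(rocksdb::Options* options) {
          // --compression_level: the zlib level, passed via CompressionOptions.
          options->compression = rocksdb::kZlibCompression;
          options->compression_opts.level = 6;
          // --universal_compression_size_percent: under universal compaction,
          // compress only this trailing percentage of data (-1 = compress all).
          options->compaction_options_universal.compression_size_percent = 50;
        }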
      
      Test Plan:
      make check, run db_bench
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14439
    • Killing Transform Rep · eb12e47e
      Igor Canadi committed
      Summary:
      Let's get rid of TransformRep and its children. We have confirmed that HashSkipListRep works better with multifeed, so there is no benefit to keeping this around.

      This diff is mostly just deleting references to the obsoleted functions. I also have a diff for fbcode that we'll need to push when we switch to the new release.

      I had to expose HashSkipListRepFactory in the client header files because db_impl.cc needs access to the GetTransform() function for SanitizeOptions.
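
      A hedged sketch of opting into HashSkipListRep (the modern factory signature is shown; the exact API at the time of this commit may have differed):

        #include "rocksdb/memtablerep.h"
        #include "rocksdb/options.h"
        #include "rocksdb/slice_transform.h"

        void UseHashSkipList(rocksdb::Options* options) {
          // Hash the first 8 bytes of each key into buckets, each backed by a skip list.
          options->prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(8));
          options->memtable_factory.reset(rocksdb::NewHashSkipListRepFactory());
        }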
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14397
  11. 22 Nov 2013, 3 commits
  12. 17 Nov 2013, 1 commit
  13. 13 Nov 2013, 1 commit
  14. 07 Nov 2013, 1 commit
    • WAL log retention policy based on archive size · c2be2cba
      shamdor committed
      Summary:
      Archive cleaning will still happen every WAL_ttl seconds,
      but archived logs will be deleted only if the archive size
      is greater than the WAL_size_limit value.
      Empty archived logs will still be deleted every WAL_ttl.
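
      A minimal sketch of the two knobs described above (field names as in rocksdb/options.h):

        #include "rocksdb/options.h"

        void ConfigureWalArchive(rocksdb::Options* options) {
          options->WAL_ttl_seconds = 3600;    // run archive cleaning every hour
          options->WAL_size_limit_MB = 1024;  // delete archived logs only once the archive exceeds ~1 GB
        }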
      
      Test Plan:
      1. Unit tests pass.
      2. Benchmark.
      
      Reviewers: emayanke, dhruba, haobo, sdong, kailiu, igor
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13869
  15. 02 Nov 2013, 1 commit
    • Implement a compressed block cache · b4ad5e89
      Dhruba Borthakur committed
      Summary:
      RocksDB can now support an uncompressed block cache, a compressed
      block cache, or both. Lookups first look for a block in the
      uncompressed cache; only if it is not found there is it looked up
      in the compressed cache. If it is found in the compressed cache,
      it is uncompressed and inserted into the uncompressed cache.

      It is possible for the same block to reside in the compressed cache
      and the uncompressed cache at the same time. Each cache has its own
      individual LRU policy.
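
      A hedged sketch of enabling both caches (at the time of this commit both fields lived directly on Options; in later releases they moved to BlockBasedTableOptions):

        #include "rocksdb/cache.h"
        #include "rocksdb/options.h"

        void ConfigureBlockCaches(rocksdb::Options* options) {
          options->block_cache =
              rocksdb::NewLRUCache(static_cast<size_t>(512) << 20);  // uncompressed blocks
          options->block_cache_compressed =
              rocksdb::NewLRUCache(static_cast<size_t>(1) << 30);    // compressed blocks
        }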
      
      Test Plan: Unit test case attached.
      
      Reviewers: kailiu, sdong, haobo, leveldb
      
      Reviewed By: haobo
      
      CC: xjin, haobo
      
      Differential Revision: https://reviews.facebook.net/D12675
  16. 24 Oct 2013, 1 commit
  17. 17 Oct 2013, 1 commit
  18. 12 Oct 2013, 1 commit
    • LRUCache: try to evict unreferenced entries first · f8509653
      sdong committed
      Summary:
      With this patch, when LRUCache.Insert() is called and the cache is full, it will first try to free up entries whose reference counter is 1 (i.e., they would become unreferenced after being removed from the cache). We do it in two passes: in the first pass, we only try to release those unreferenced entries. If we cannot free enough space after traversing the first remove_scan_cnt_ entries, we start from the beginning again and also remove entries that are still being used.
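
      An illustrative toy version of the two-pass eviction (stand-in types, not the RocksDB implementation):

        #include <cstddef>
        #include <list>

        struct Entry {
          int refs = 1;      // 1 means referenced only by the cache itself
          size_t charge = 1; // capacity consumed by this entry
        };

        struct MiniLRU {
          std::list<Entry> lru_;  // front = least recently used
          size_t usage_ = 0;
          size_t capacity_ = 0;
          size_t remove_scan_cnt_ = 16;

          void MakeRoomFor(size_t needed) {
            // Pass 1: scan up to remove_scan_cnt_ entries, evicting only those
            // that nobody outside the cache still references.
            size_t scanned = 0;
            auto it = lru_.begin();
            while (it != lru_.end() && usage_ + needed > capacity_ &&
                   scanned < remove_scan_cnt_) {
              ++scanned;
              if (it->refs == 1) {
                usage_ -= it->charge;
                it = lru_.erase(it);
              } else {
                ++it;
              }
            }
            // Pass 2: if that wasn't enough, start over and evict in-use entries too.
            it = lru_.begin();
            while (it != lru_.end() && usage_ + needed > capacity_) {
              usage_ -= it->charge;
              it = lru_.erase(it);
            }
          }
        };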
      
      Test Plan: add two unit tests to cover the code
      
      Reviewers: dhruba, haobo, emayanke
      
      Reviewed By: emayanke
      
      CC: leveldb, emayanke, xjin
      
      Differential Revision: https://reviews.facebook.net/D13377
  19. 06 Oct 2013, 1 commit
  20. 05 Oct 2013, 1 commit
  21. 20 Sep 2013, 1 commit
    • Better locking in vectorrep that increases throughput to match speed of storage · 5e9f3a9a
      Dhruba Borthakur committed
      Summary:
      There is a use-case where we want to insert data into rocksdb as
      fast as possible. Vector rep is used for this purpose.

      The background flush thread needs to flush the vectorrep to
      storage. It acquires the db lock, then sorts the vector, releases
      the db lock, and then writes the sorted vector to storage. This is
      suboptimal because the lock is held during the sort, which
      prevents new writes from occurring.

      This patch moves the sorting of the vector rep outside the
      db mutex. Performance is now as fast as the underlying storage
      system. If you are doing buffered writes to rocksdb files, then
      you can observe write throughput upwards of 200 MB/sec.

      This is an early draft and not yet ready to be reviewed.
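
      The core trick as a self-contained toy (hypothetical names, not the RocksDB flush code):

        #include <algorithm>
        #include <mutex>
        #include <string>
        #include <vector>

        std::mutex db_mutex;                  // stand-in for the DB mutex
        std::vector<std::string> vector_rep;  // stand-in for the vector memtable rep

        std::vector<std::string> TakeSortedSnapshot() {
          std::vector<std::string> keys;
          {
            std::lock_guard<std::mutex> guard(db_mutex);
            keys.swap(vector_rep);  // O(1): hold the mutex only to take ownership
          }
          std::sort(keys.begin(), keys.end());  // sort outside the lock; writers proceed
          return keys;  // ready to be written to storage
        }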
      
      Test Plan:
      make check
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb, haobo
      
      Differential Revision: https://reviews.facebook.net/D12987
  22. 16 Sep 2013, 2 commits
  23. 14 Sep 2013, 1 commit
    • Added a parameter to limit the maximum space amplification for universal compaction · 4012ca1c
      Dhruba Borthakur committed
      Summary:
      Added a new field called max_size_amplification_ratio to the
      CompactionOptionsUniversal structure. This determines the maximum
      percentage overhead of space amplification.

      The size amplification is defined as the ratio of the sum of the
      sizes of all files other than the oldest to the size of the oldest
      file, expressed as a percentage. If the size amplification exceeds
      the specified value, then min_merge_width and max_merge_width are
      ignored and a full compaction of all files is done. A value of 10
      means that a database that stores 100 bytes of user data could
      occupy up to 110 bytes of physical storage.
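
      A brief sketch of setting the new field (the ratio is expressed as a percent):

        #include "rocksdb/options.h"

        void CapSpaceAmplification(rocksdb::Options* options) {
          options->compaction_style = rocksdb::kCompactionStyleUniversal;
          // 10 => files other than the oldest may total at most 10% of the oldest
          // file's size (100 bytes of user data => at most ~110 bytes on disk);
          // beyond that, a full compaction of all files is triggered.
          options->compaction_options_universal.max_size_amplification_ratio = 10;
        }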
      
      Test Plan: Unit test DBTest.UniversalCompactionSpaceAmplification added.
      
      Reviewers: haobo, emayanke, xjin
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12825
  24. 13 Sep 2013, 1 commit
    • [RocksDB] Remove log file immediately after memtable flush · 0e422308
      Haobo Xu committed
      Summary: As title. The DB log file's life cycle is tied to the memtable it backs. Once the memtable is flushed to an sst file and committed, we should be able to delete the log file without holding the mutex. This is part of the bigger change to avoid FindObsoleteFiles at runtime. It deals with log files; sst files will be dealt with later.
      
      Test Plan: make check; db_bench
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11709
  25. 24 Aug 2013, 2 commits
  26. 23 Aug 2013, 4 commits
    • Add three new MemTableReps · 74781a0c
      Jim Paton committed
      Summary:
      This patch adds three new MemTableReps: UnsortedRep, PrefixHashRep, and VectorRep.
      
      UnsortedRep stores keys in an std::unordered_map of std::sets. When an iterator is requested, it dumps the keys into an std::set and iterates over that.
      
      VectorRep stores keys in an std::vector. When an iterator is requested, it creates a copy of the vector and sorts it using std::sort. The iterator accesses that new vector.
      
      PrefixHashRep stores keys in an unordered_map mapping prefixes to ordered sets.
      
      I also made one API change: I added a function MemTableRep::MarkImmutable. This function is called when the rep is added to the immutable list. It doesn't do anything yet, but it seems like it could be useful. In particular, for the vectorrep, it means we could elide the extra copy and just sort in place. The only reason I haven't done that yet is that the use of the ArenaAllocator complicates things (I can elaborate on this if needed).
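
      A toy illustration of the VectorRep behavior described above (not the actual MemTableRep interface):

        #include <algorithm>
        #include <string>
        #include <vector>

        class ToyVectorRep {
         public:
          void Insert(std::string key) { keys_.push_back(std::move(key)); }  // O(1) append

          // Requesting an "iterator" copies the vector and sorts the copy,
          // mirroring the VectorRep description above.
          std::vector<std::string> SortedSnapshot() const {
            std::vector<std::string> copy = keys_;
            std::sort(copy.begin(), copy.end());
            return copy;
          }

         private:
          std::vector<std::string> keys_;
        };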
      
      Test Plan:
      make -j32 check
      ./db_stress --memtablerep=vector
      ./db_stress --memtablerep=unsorted
      ./db_stress --memtablerep=prefixhash --prefix_size=10
      
      Reviewers: dhruba, haobo, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12117
    • Pull from https://reviews.facebook.net/D10917 · 17dc1280
      Xing Jin committed
      Summary: Pull in Mark's patch and slightly revise it. I revised another place in db_impl.cc with a similar new formula.
      
      Test Plan:
      make all check. Also run "time ./db_bench --num=2500000000 --numdistinct=2200000000". It has been running for 20+ hours and hasn't finished. Looks good so far:
      
      Installed stack trace handler for SIGILL SIGSEGV SIGBUS SIGABRT
      LevelDB:    version 2.0
      Date:       Tue Aug 20 23:11:55 2013
      CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
      CPUCache:   20480 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    2500000000
      RawSize:    276565.6 MB (estimated)
      FileSize:   157356.3 MB (estimated)
      Write rate limit: 0
      Compression: snappy
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillseq      :    7202.000 micros/op 138 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillsync     :    7148.000 micros/op 139 ops/sec; (2500000 ops)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillrandom   :    7105.000 micros/op 140 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      overwrite    :    6930.000 micros/op 144 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.020 micros/op 980507 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.021 micros/op 979620 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readseq      :     113.000 micros/op 8849 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readreverse  :     102.000 micros/op 9803 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      Created bg thread 0x7f0ac17f7700
      compact      :  111701.000 micros/op 8 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.020 micros/op 980376 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readseq      :     120.000 micros/op 8333 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readreverse  :      29.000 micros/op 34482 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      ... finished 618100000 ops
      
      Reviewers: MarkCallaghan, haobo, dhruba, chip
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D12441
    • Revert "Prefix scan: db_bench and bug fixes" · 94cf2187
      Tyler Harter committed
      This reverts commit c2bd8f48.
    • Prefix scan: db_bench and bug fixes · c2bd8f48
      Tyler Harter committed
      Summary: If use_prefix_filters is set and read_range > 1, then the random seeks will set the prefix filter to the prefix of the key that was randomly selected as the target. Statistics still need to be added (perhaps in a separate diff).
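
      A hedged sketch of the prefix-bloom setup these flags exercise (modern-style API shown; the db_bench wiring at the time may have differed):

        #include "rocksdb/filter_policy.h"
        #include "rocksdb/options.h"
        #include "rocksdb/slice_transform.h"
        #include "rocksdb/table.h"

        void ConfigurePrefixBlooms(rocksdb::Options* options) {
          // --use_prefix_blooms with --bloom_bits=10: build bloom filters over
          // a fixed 10-byte key prefix instead of whole keys.
          options->prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(10));
          rocksdb::BlockBasedTableOptions table_options;
          table_options.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10));
          table_options.whole_key_filtering = false;  // filter on prefixes only
          options->table_factory.reset(
              rocksdb::NewBlockBasedTableFactory(table_options));
        }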
      
      Test Plan: ./db_bench --benchmarks=fillseq,prefixscanrandom --num=10000000 --statistics=1 --use_prefix_blooms=1 --use_prefix_api=1 --bloom_bits=10
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb, haobo
      
      Differential Revision: https://reviews.facebook.net/D12273
  27. 21 Aug 2013, 1 commit
  28. 16 Aug 2013, 2 commits
    • Tiny fix to db_bench for make release · d1d3d15e
      Deon Nicholas committed
      Summary:
      In release mode, the compiler reported "variable assigned but not used anywhere". Changed it to
      work with an assert. Someone accept this :).
      
      Test Plan: make release -j 32
      
      Reviewers: haobo, dhruba, emayanke
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12309
    • Benchmarking for Merge Operator · ad48c3c2
      Deon Nicholas committed
      Summary:
      Updated db_bench and utilities/merge_operators.h to allow dynamic benchmarking
      of merge operators in db_bench. Added a new test (--benchmarks=mergerandom), which performs
      a bunch of random Merge() operations over random keys. Also added a "--merge_operator=" flag
      so that the tester can easily benchmark different merge operators. Currently supports
      the PutOperator and the UInt64Add operator. Support for stringappend or list append may come later.
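
      A hedged sketch of what a single mergerandom operation boils down to (the encoding helper is illustrative; the uint64add operator expects a fixed 64-bit operand):

        #include <cstdint>
        #include <string>
        #include "rocksdb/db.h"

        // Illustrative encoding helper for the uint64add operand.
        static std::string EncodeUint64(uint64_t v) {
          return std::string(reinterpret_cast<const char*>(&v), sizeof(v));
        }

        void MergeOnce(rocksdb::DB* db) {
          // What --benchmarks=mergerandom does per op: a Merge() on a random key.
          db->Merge(rocksdb::WriteOptions(), "counter:42", EncodeUint64(1));
        }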
      
      Test Plan:
      1. make db_bench
      2. Test the PutOperator (simulating Put) as follows:
         ./db_bench --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=put --threads=2
      3. Test the UInt64AddOperator (simulating numeric addition) similarly:
         ./db_bench --value_size=8 --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=uint64add --threads=2
      
      Reviewers: haobo, dhruba, zshao, MarkCallaghan
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11535