1. 24 Aug 2013 (2 commits)
  2. 23 Aug 2013 (4 commits)
    • J
      Add three new MemTableRep's · 74781a0c
      Authored by Jim Paton
      Summary:
      This patch adds three new MemTableRep's: UnsortedRep, PrefixHashRep, and VectorRep.
      
      UnsortedRep stores keys in an std::unordered_map of std::sets. When an iterator is requested, it dumps the keys into an std::set and iterates over that.
      
      VectorRep stores keys in an std::vector. When an iterator is requested, it creates a copy of the vector and sorts it using std::sort. The iterator accesses that new vector.
      
      PrefixHashRep stores keys in an unordered_map mapping prefixes to ordered sets.
      
      I also made one API change: a new function, MemTableRep::MarkImmutable, which is called when the rep is added to the immutable list. It doesn't do anything yet, but it seems like it could be useful. In particular, for VectorRep it means we could elide the extra copy and just sort in place. The only reason I haven't done that yet is that the use of the ArenaAllocator complicates things (I can elaborate on this if needed).
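
      To make the VectorRep behavior concrete, here is a minimal standalone sketch of the same idea under simplified assumptions (the class name and methods below are illustrative, not the actual MemTableRep API, which works on arena-allocated encoded keys):

      #include <algorithm>
      #include <string>
      #include <vector>

      // Illustrative sketch of the VectorRep approach: O(1) unordered
      // inserts; iteration pays for a copy plus std::sort.
      class VectorRepSketch {
       public:
        void Insert(const std::string& key) {
          keys_.push_back(key);  // no ordering maintained on insert
        }
        // Copies and sorts, mirroring "creates a copy of the vector and
        // sorts it" from the summary above.
        std::vector<std::string> SortedSnapshot() const {
          std::vector<std::string> copy(keys_);
          std::sort(copy.begin(), copy.end());
          return copy;
        }
       private:
        std::vector<std::string> keys_;
      };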
      
      Test Plan:
      make -j32 check
      ./db_stress --memtablerep=vector
      ./db_stress --memtablerep=unsorted
      ./db_stress --memtablerep=prefixhash --prefix_size=10
      
      Reviewers: dhruba, haobo, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12117
      74781a0c
    • X
      Pull from https://reviews.facebook.net/D10917 · 17dc1280
      Authored by Xing Jin
      Summary: Pull Mark's patch and slightly revise it. I revised another place in db_impl.cc with a similar new formula.
      
      Test Plan:
      make all check. Also run "time ./db_bench --num=2500000000 --numdistinct=2200000000". It has run for 20+ hours and hasn't finished. Looks good so far:
      
      Installed stack trace handler for SIGILL SIGSEGV SIGBUS SIGABRT
      LevelDB:    version 2.0
      Date:       Tue Aug 20 23:11:55 2013
      CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
      CPUCache:   20480 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    2500000000
      RawSize:    276565.6 MB (estimated)
      FileSize:   157356.3 MB (estimated)
      Write rate limit: 0
      Compression: snappy
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillseq      :    7202.000 micros/op 138 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillsync     :    7148.000 micros/op 139 ops/sec; (2500000 ops)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillrandom   :    7105.000 micros/op 140 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      overwrite    :    6930.000 micros/op 144 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.020 micros/op 980507 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.021 micros/op 979620 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readseq      :     113.000 micros/op 8849 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readreverse  :     102.000 micros/op 9803 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      Created bg thread 0x7f0ac17f7700
      compact      :  111701.000 micros/op 8 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.020 micros/op 980376 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readseq      :     120.000 micros/op 8333 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readreverse  :      29.000 micros/op 34482 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      ... finished 618100000 ops
      
      Reviewers: MarkCallaghan, haobo, dhruba, chip
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D12441
      17dc1280
    • T
      Revert "Prefix scan: db_bench and bug fixes" · 94cf2187
      Authored by Tyler Harter
      This reverts commit c2bd8f48.
      94cf2187
    • T
      Prefix scan: db_bench and bug fixes · c2bd8f48
      Authored by Tyler Harter
      Summary: If use_prefix_filters is set and read_range > 1, then the random seeks will set the prefix filter to be the prefix of the key which was randomly selected as the target.  Still need to add statistics (perhaps in a separate diff).
      
      Test Plan: ./db_bench --benchmarks=fillseq,prefixscanrandom --num=10000000 --statistics=1 --use_prefix_blooms=1 --use_prefix_api=1 --bloom_bits=10
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb, haobo
      
      Differential Revision: https://reviews.facebook.net/D12273
      c2bd8f48
  3. 21 Aug 2013 (1 commit)
  4. 16 Aug 2013 (2 commits)
    • D
      Tiny fix to db_bench for make release. · d1d3d15e
      Authored by Deon Nicholas
      Summary:
      In the release build: "found variable assigned but not used anywhere". Changed it to work with
      the assert. Someone accept this :).
      
      Test Plan: make release -j 32
      
      Reviewers: haobo, dhruba, emayanke
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12309
      d1d3d15e
    • D
      Benchmarking for Merge Operator · ad48c3c2
      Authored by Deon Nicholas
      Summary:
      Updated db_bench and utilities/merge_operators.h to allow for dynamic benchmarking
      of merge operators in db_bench. Added a new test (--benchmarks=mergerandom), which performs
      a bunch of random Merge() operations over random keys. Also added a "--merge_operator=" flag
      so that the tester can easily benchmark different merge operators. Currently supports
      the PutOperator and UInt64Add operator. Support for stringappend or list append may come later.
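
      As a concrete picture of the uint64add semantics being benchmarked, here is a hedged sketch of the merge arithmetic as a standalone helper (not the actual MergeOperator class; a fixed-width 8-byte value encoding is assumed):

      #include <cstdint>
      #include <cstring>
      #include <string>

      // Sketch: merge = existing + operand over 8-byte unsigned values,
      // matching --value_size=8 in the test plan below.
      static uint64_t DecodeU64(const std::string& s) {
        uint64_t v = 0;
        if (s.size() == sizeof(v)) std::memcpy(&v, s.data(), sizeof(v));
        return v;
      }

      std::string MergeUint64Add(const std::string* existing,
                                 const std::string& operand) {
        uint64_t sum = (existing ? DecodeU64(*existing) : 0) + DecodeU64(operand);
        return std::string(reinterpret_cast<const char*>(&sum), sizeof(sum));
      }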
      
      Test Plan:
      1. make db_bench
      2. Test the PutOperator (simulating Put) as follows:
      ./db_bench --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=put --threads=2
      3. Test the UInt64AddOperator (simulating numeric addition) similarly:
      ./db_bench --value_size=8 --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=uint64add --threads=2
      
      Reviewers: haobo, dhruba, zshao, MarkCallaghan
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11535
      ad48c3c2
  5. 15 Aug 2013 (1 commit)
    • X
      Minor fix to current codes · 0a5afd1a
      Authored by Xing Jin
      Summary:
      Minor fixes to the current code, including coding style, output format,
      and comments. No major logic change. There are only 2 real changes; please see my inline comments.
      
      Test Plan: make all check
      
      Reviewers: haobo, dhruba, emayanke
      
      Differential Revision: https://reviews.facebook.net/D12297
      0a5afd1a
  6. 06 Aug 2013 (1 commit)
    • J
      Add soft and hard rate limit support · 1036537c
      Authored by Jim Paton
      Summary:
      This diff adds support for both soft and hard rate limiting. The following changes are included:
      
      1) Options.rate_limit is renamed to Options.hard_rate_limit.
      2) Options.rate_limit_delay_milliseconds is renamed to Options.rate_limit_delay_max_milliseconds.
      3) Options.soft_rate_limit is added.
      4) If the maximum compaction score is > hard_rate_limit and rate_limit_delay_max_milliseconds == 0, then writes are delayed by 1 ms at a time until the max compaction score falls below hard_rate_limit.
      5) If the max compaction score is > soft_rate_limit but <= hard_rate_limit, then writes are delayed by 0-1 ms depending on how close we are to hard_rate_limit.
      6) Users can disable (4) by setting hard_rate_limit = 0. They can cap the maximum time waited by setting rate_limit_delay_max_milliseconds > 0. Thus, the old behavior can be preserved by setting soft_rate_limit = 0, which is the default; a sketch of the resulting delay policy follows this list.
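
      The following is a hedged sketch of the per-write delay decision implied by points (4) and (5); the function name and the exact linear interpolation are assumptions, not the committed code:

      #include <algorithm>

      // Returns the delay in milliseconds for one write attempt, per the
      // soft/hard rate limit rules above.
      double RateLimitDelayMs(double max_compaction_score,
                              double soft_rate_limit,
                              double hard_rate_limit) {
        if (hard_rate_limit > 0 && max_compaction_score > hard_rate_limit) {
          return 1.0;  // (4): delay 1 ms at a time until the score drops
        }
        if (soft_rate_limit > 0 && max_compaction_score > soft_rate_limit) {
          // (5): 0-1 ms depending on proximity to hard_rate_limit
          double range = hard_rate_limit - soft_rate_limit;
          double excess = max_compaction_score - soft_rate_limit;
          return range > 0 ? std::min(1.0, excess / range) : 1.0;
        }
        return 0.0;  // below the soft limit: no delay
      }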
      
      Test Plan:
      make -j32 check
      ./db_stress
      
      Reviewers: dhruba, haobo, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12003
      1036537c
  7. 24 Jul 2013 (1 commit)
    • M
      Use KeyMayExist for WriteBatch-Deletes · bf66c10b
      Authored by Mayank Agarwal
      Summary:
      Introduced KeyMayExist checking during writebatch-delete and removed it from the outer Delete API, because that API uses writebatch-delete.
      Added code to skip fetching a Table from disk if it is not already present in table_cache.
      Some renaming of variables.
      Introduced KeyMayExistImpl, which allows checking from a specified sequence number in GetImpl; this is useful for checking a partially written writebatch.
      Changed KeyMayExist to not be pure virtual and provided a default implementation (see the sketch below).
      Expanded unit tests in db_test to check appropriately.
      Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.
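
      The conservative default can be pictured roughly as follows (a sketch only; the real method in the public header also takes ReadOptions and a Slice key):

      #include <string>

      // Sketch of a safe default for KeyMayExist: with no filter
      // information available, "may exist" must be answered true, so
      // callers fall back to a full Get.
      struct KeyMayExistDefaultSketch {
        virtual ~KeyMayExistDefaultSketch() = default;
        virtual bool KeyMayExist(const std::string& /*key*/,
                                 std::string* /*value*/) {
          return true;  // never claims a key is absent
        }
      };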
      
      Test Plan: db_stress;make check
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D11745
      bf66c10b
  8. 12 Jul 2013 (1 commit)
    • M
      Make rocksdb-deletes faster using bloom filter · 2a986919
      Authored by Mayank Agarwal
      Summary:
      Wrote a new function in db_impl.cc, CheckKeyMayExist, that calls Get with a new parameter turned on which makes Get return false only if the bloom filters can guarantee that the key is not in the database. Delete calls this function, and if the option deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete is dropped, saving (see the sketch after this list):
      1. a Put of delete type
      2. space in the db, and
      3. compaction time
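
      A hedged sketch of the guarded-delete flow; CheckKeyMayExist here is a stand-in free function, and the header path reflects the leveldb namespace this codebase used at the time:

      #include "leveldb/db.h"  // namespace/headers of this era; later rocksdb

      using leveldb::DB;
      using leveldb::Slice;
      using leveldb::Status;
      using leveldb::WriteOptions;

      // Stand-in: returns false only when filters guarantee absence.
      bool CheckKeyMayExist(DB* db, const Slice& key);

      // Sketch: drop the tombstone when the filters prove the key absent.
      Status GuardedDelete(DB* db, const WriteOptions& opts, const Slice& key,
                           bool deletes_use_filter) {
        if (deletes_use_filter && !CheckKeyMayExist(db, key)) {
          return Status::OK();  // skip the Put of delete type entirely
        }
        return db->Delete(opts, key);  // normal path: write the tombstone
      }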
      
      Test Plan:
      make all check;
      will run db_stress and db_bench and enhance unit-test once the basic design gets approved
      
      Reviewers: dhruba, haobo, vamsi
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11607
      2a986919
  9. 10 Jul 2013 (1 commit)
  10. 04 Jul 2013 (1 commit)
  11. 01 Jul 2013 (2 commits)
    • D
      Reduce write amplification by merging files in L0 back into L0 · 47c4191f
      Authored by Dhruba Borthakur
      Summary:
      There is a new option called hybrid_mode which, when switched on,
      causes HBase-style compactions.  Files from L0 are
      compacted back into L0. The meat of this compaction algorithm
      is in PickCompactionHybrid().
      
      All files reside in L0, which means all files have overlapping
      keys. Each file has a time bound, i.e. each file contains a
      range of keys that were inserted around the same time. The
      start-seqno and the end-seqno refer to the timeframe when
      these keys were inserted.  Files that have contiguous seqnos
      are compacted together into a larger file. All files are
      ordered from most recent to oldest.
      
      The current compaction algorithm starts looking for
      candidate files from the most recent file. It continues to
      add more files to the same compaction run as long as the
      total size of the files chosen so far is smaller than the
      next candidate file's size. This logic still needs to be
      debated and validated; a sketch of the selection loop follows below.
      
      The above logic should reduce write amplification to a
      large extent... will publish numbers shortly.
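
      A hedged sketch of the selection rule just described (the file-metadata struct is an assumption; files are taken to be ordered most recent first):

      #include <cstdint>
      #include <vector>

      struct FileInfoSketch {
        uint64_t size;  // file size in bytes
      };

      // Sketch of PickCompactionHybrid's candidate selection: keep adding
      // files while the accumulated size stays smaller than the size of
      // the next candidate.
      std::vector<size_t> PickHybridCandidates(
          const std::vector<FileInfoSketch>& files) {
        std::vector<size_t> picked;
        uint64_t total = 0;
        for (size_t i = 0; i < files.size(); i++) {
          if (!picked.empty() && total >= files[i].size) {
            break;  // chosen files are no longer smaller than the next one
          }
          picked.push_back(i);
          total += files[i].size;
        }
        return picked;
      }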
      
      Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).
      
      Differential Revision: https://reviews.facebook.net/D11289
      47c4191f
    • D
      Reduce write amplification by merging files in L0 back into L0 · 554c06dd
      Authored by Dhruba Borthakur
      Summary:
      There is a new option called hybrid_mode which, when switched on,
      causes HBase-style compactions.  Files from L0 are
      compacted back into L0. The meat of this compaction algorithm
      is in PickCompactionHybrid().
      
      All files reside in L0, which means all files have overlapping
      keys. Each file has a time bound, i.e. each file contains a
      range of keys that were inserted around the same time. The
      start-seqno and the end-seqno refer to the timeframe when
      these keys were inserted.  Files that have contiguous seqnos
      are compacted together into a larger file. All files are
      ordered from most recent to oldest.
      
      The current compaction algorithm starts looking for
      candidate files from the most recent file. It continues to
      add more files to the same compaction run as long as the
      total size of the files chosen so far is smaller than the
      next candidate file's size. This logic still needs to be
      debated and validated.
      
      The above logic should reduce write amplification to a
      large extent... will publish numbers shortly.
      
      Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).
      
      Differential Revision: https://reviews.facebook.net/D11289
      554c06dd
  12. 19 Jun 2013 (4 commits)
  13. 15 Jun 2013 (2 commits)
  14. 13 Jun 2013 (2 commits)
    • D
      [Rocksdb] [Multiget] Introduced multiget into db_bench · 4985a9f7
      Authored by Deon Nicholas
      Summary:
      Preliminary! Introduced the --use_multiget=1 and --keys_per_multiget=n
      flags for db_bench. Also updated and tested the ReadRandom() method
      to include an option to use multiget. By default,
      keys_per_multiget=100.
      
      Preliminary tests imply that multiget is at least 1.25x faster per
      key than regular get.
      
      Will continue adding Multiget for ReadMissing, ReadHot,
      RandomWithVerify, and ReadRandomWriteRandom soon. Will also think
      about ways to better verify benchmarks.
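
      For reference, the batched read being benchmarked looks roughly like this (a sketch; MultiGet's exact signature has varied across versions, and the header path reflects the leveldb namespace of this era):

      #include <string>
      #include <vector>
      #include "leveldb/db.h"  // namespace/headers of this era; later rocksdb

      // Sketch: fetch keys_per_multiget keys with one call instead of N
      // separate Gets, and count hits like db_bench's "N of M keys found".
      size_t CountFound(leveldb::DB* db,
                        const std::vector<leveldb::Slice>& keys) {
        std::vector<std::string> values;
        std::vector<leveldb::Status> statuses =
            db->MultiGet(leveldb::ReadOptions(), keys, &values);
        size_t found = 0;
        for (const auto& s : statuses) {
          if (s.ok()) found++;
        }
        return found;
      }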
      
      Test Plan:
      1. make db_bench
      2. ./db_bench --benchmarks=fillrandom
      3. ./db_bench --benchmarks=readrandom --use_existing_db=1 --use_multiget=1 --threads=4 --keys_per_multiget=100
      4. ./db_bench --benchmarks=readrandom --use_existing_db=1 --threads=4
      5. Verify ops/sec (and 1000000 of 1000000 keys found)
      
      Reviewers: haobo, MarkCallaghan, dhruba
      
      Reviewed By: MarkCallaghan
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11127
      4985a9f7
    • H
      [RocksDB] cleanup EnvOptions · bdf10859
      Authored by Haobo Xu
      Summary:
      This diff simplifies EnvOptions by treating it as POD, similar to Options.
      - virtual functions are removed and member fields are accessed directly.
      - StorageOptions is removed.
      - Options.allow_readahead and Options.allow_readahead_compactions are deprecated.
      - Unused global variables are removed: useOsBuffer, useFsReadAhead, useMmapRead, useMmapWrite
      
      Test Plan: make check; db_stress
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11175
      bdf10859
  15. 08 6月, 2013 1 次提交
  16. 06 6月, 2013 1 次提交
  17. 02 6月, 2013 1 次提交
    • H
      [RocksDB] Introduce Fast Mutex option · d897d33b
      Authored by Haobo Xu
      Summary:
      This diff adds an option to specify whether PTHREAD_MUTEX_ADAPTIVE_NP will be enabled for the rocksdb single big kernel lock. db_bench also has this option now.
      Quickly tested 8-thread, cpu-bound, 100-byte random reads.
      No fast mutex: ~750k/s ops
      With fast mutex: ~880k/s ops
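
      The knob boils down to a pthreads mutex attribute; a minimal sketch of enabling the adaptive type, which is a non-portable glibc extension (hence the guard):

      #include <pthread.h>

      // Sketch: initialize a mutex with the adaptive spin-then-block type
      // when available, otherwise keep the default type.
      void InitFastMutex(pthread_mutex_t* mu) {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
      #ifdef PTHREAD_MUTEX_ADAPTIVE_NP
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
      #endif
        pthread_mutex_init(mu, &attr);
        pthread_mutexattr_destroy(&attr);
      }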
      
      Test Plan: make check; db_bench; db_stress
      
      Reviewers: dhruba
      
      CC: MarkCallaghan, leveldb
      
      Differential Revision: https://reviews.facebook.net/D11031
      d897d33b
  18. 31 5月, 2013 1 次提交
    • H
      [RocksDB] [Performance] Allow different posix advice to be applied to the same table file · ab8d2f6a
      Authored by Haobo Xu
      Summary:
      The current posix advice implementation ties the access pattern hint to the creation of a file.
      It is not possible to apply different advice for different access patterns (random get vs compaction read)
      without keeping two open files for the same table. This patch extends the RandomAccessFile interface
      to accept a new access hint at any time. In particular, we are able to set different access hints on the same
      table file based on when/how the file is used.
      Two options are added to set the access hint: one applied after the file is first opened, and one applied
      when the file is picked for compaction.
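
      Under the hood this maps to posix_fadvise; a hedged sketch of re-applying a hint to an already-open descriptor (the function name is illustrative):

      #include <fcntl.h>

      // Sketch: apply a new access-pattern hint to the same open fd,
      // e.g. POSIX_FADV_RANDOM for point gets and POSIX_FADV_SEQUENTIAL
      // once the file is picked for compaction. offset=0, len=0 covers
      // the whole file.
      int ApplyAccessHint(int fd, bool sequential) {
        return posix_fadvise(fd, 0, 0,
                             sequential ? POSIX_FADV_SEQUENTIAL
                                        : POSIX_FADV_RANDOM);
      }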
      
      Test Plan: make check; db_stress; db_bench
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: MarkCallaghan, leveldb
      
      Differential Revision: https://reviews.facebook.net/D10905
      ab8d2f6a
  19. 24 May 2013 (1 commit)
  20. 22 May 2013 (1 commit)
    • D
      Ability to set different size fanout multipliers for every level. · d1aaaf71
      Authored by Dhruba Borthakur
      Summary:
      There is an existing field Options.max_bytes_for_level_multiplier that
      sets the multiplier for the size of each level in the database.
      
      This patch introduces the ability to set different multipliers
      for every level in the database. The size of a level is determined
      by using both max_bytes_for_level_multiplier as well as the
      per-level fanout.
      
      size of level[i] = size of level[i-1] * max_bytes_for_level_multiplier * fanout[i-1]
      
      The default value of fanout is 1, so that it is backward compatible.
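
      A worked sketch of the formula above (the base-size parameter and the treatment of L0 are assumptions for illustration; fanout is taken to have one entry per level):

      #include <cstdint>
      #include <vector>

      // Sketch: compute per-level target sizes from the formula above.
      // With multiplier = 10 and fanout = {1, 1, 2, 1, ...}, L3 would be
      // 20x the size of L2 instead of the usual 10x.
      std::vector<uint64_t> LevelTargets(uint64_t l1_size, double multiplier,
                                         const std::vector<double>& fanout) {
        std::vector<uint64_t> size(fanout.size() + 1, 0);
        size[1] = l1_size;  // L0 is typically governed by file count instead
        for (size_t i = 2; i < size.size(); i++) {
          size[i] = static_cast<uint64_t>(size[i - 1] * multiplier * fanout[i - 1]);
        }
        return size;
      }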
      
      Test Plan: make check
      
      Reviewers: haobo, emayanke
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D10863
      d1aaaf71
  21. 26 Apr 2013 (1 commit)
  22. 23 Apr 2013 (1 commit)
  23. 21 Apr 2013 (2 commits)
    • M
      Add --writes_per_second rate limit, print p99.99 in histogram · b1ff9ac9
      Authored by Mark Callaghan
      Summary:
      Adds the --writes_per_second rate limit for the readwhilewriting test.
      The purpose is to optionally avoid saturating storage with writes & compaction
      and test read response time when some writes are being done.
      
      Changes the histogram code to also print the p99.99 value
      
      Test Plan: make check, ran db_bench with it
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D10305
      b1ff9ac9
    • H
      [RocksDB] Add stacktrace signal handler · 1255dcd4
      Authored by Haobo Xu
      Summary:
      This diff provides the ability to print out a stacktrace when the process receives certain signals.
      Currently, we enable this for the following signals (program error related):
      SIGILL SIGSEGV SIGBUS SIGABRT
      Applications simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization if a signal handler is needed. It's not done automatically when opening the db, because it is the application's (process's) responsibility to install signal handlers, and some applications might already have their own (like fbcode).
      
      Sample output:
      Received signal 11 (Segmentation fault)
      #0  0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4
      #1  0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24
      #2  0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0
      #3  0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113
      Segmentation fault (core dumped)
      
      For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by shelling out directly to addr2line. "??:0" means addr2line failed to do the translation. Hacky, but I think it's good for now; a minimal sketch of the handler-installation idea follows.
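
      This is a hedged sketch of the core mechanism only (backtrace plus re-raise); the committed version also symbolizes via backtrace_symbols and addr2line as described above:

      #include <execinfo.h>
      #include <initializer_list>
      #include <signal.h>
      #include <unistd.h>

      // Sketch: dump raw frames on fatal signals, then re-raise so the
      // default action (e.g. core dump) still happens.
      static void StackTraceHandlerSketch(int sig) {
        void* frames[32];
        int n = backtrace(frames, 32);
        backtrace_symbols_fd(frames, n, STDERR_FILENO);
        signal(sig, SIG_DFL);  // restore the default handler
        raise(sig);            // and re-raise the signal
      }

      void InstallStackTraceHandlerSketch() {
        for (int sig : {SIGILL, SIGSEGV, SIGBUS, SIGABRT}) {
          signal(sig, StackTraceHandlerSketch);
        }
      }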
      
      Test Plan: signal_test.cc
      
      Reviewers: dhruba, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D10173
      1255dcd4
  24. 28 Mar 2013 (1 commit)
    • A
      memory manage statistics · 63f216ee
      Authored by Abhishek Kona
      Summary:
      Earlier, the Statistics object was a raw pointer. This meant the user had to clean up
      the Statistics object after creating the database. In most use cases the database is created in a function and the statistics pointer goes out of scope; hence the statistics object would never be deleted.
      Now using a shared_ptr to manage this.
      
      Want this in before the next release.
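
      Usage after this change looks roughly like the following sketch (CreateDBStatistics is the assumed factory from the public statistics header; Options and Statistics come from the leveldb-namespaced headers of this era):

      #include "leveldb/options.h"     // namespace/headers of this era
      #include "leveldb/statistics.h"  // later rocksdb/...

      // Sketch: ownership now follows shared_ptr semantics, so the stats
      // object stays alive as long as the DB (or anyone else) holds a
      // reference, and is freed automatically afterwards.
      leveldb::Options MakeOptionsWithStats() {
        leveldb::Options options;
        options.statistics = leveldb::CreateDBStatistics();  // shared ownership
        return options;  // no manual delete needed anywhere
      }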
      
      Test Plan: make all check.
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9735
      63f216ee
  25. 21 Mar 2013 (1 commit)
    • D
      Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database. · ad96563b
      Authored by Dhruba Borthakur
      Summary:
      This patch allows an application to specify, per database, whether to use buffered io,
      reads via mmaps, and writes via mmaps. Earlier, a global static variable
      was used to configure this functionality.
      
      The default setting remains the same (and is backward compatible):
       1. use bufferedio
       2. do not use mmaps for reads
       3. use mmap for writes
       4. use readaheads for reads needed for compaction
      
      I also added a parameter to db_bench to be able to explicitly specify
      whether to do readaheads for compactions or not.
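
      In Options terms, the defaults above read roughly as follows (a sketch; the mmap and os-buffer field names are assumptions in the spirit of the patch, while allow_readahead_compactions matches an option name mentioned later in this log):

      #include "leveldb/options.h"  // namespace/headers of this era

      leveldb::Options DefaultIoOptions() {
        leveldb::Options options;
        options.allow_os_buffer = true;              // (1) use buffered io
        options.allow_mmap_reads = false;            // (2) no mmaps for reads
        options.allow_mmap_writes = true;            // (3) mmap for writes
        options.allow_readahead_compactions = true;  // (4) readahead for compaction
        return options;
      }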
      
      Test Plan: make check
      
      Reviewers: sheki, heyongqiang, MarkCallaghan
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9429
      ad96563b
  26. 20 Mar 2013 (2 commits)
  27. 15 Mar 2013 (1 commit)
    • M
      Enhance db_bench · 5a8c8845
      Authored by Mark Callaghan
      Summary:
      Add --benchmarks=updaterandom for read-modify-write workloads. This is different
      from --benchmarks=readrandomwriterandom in a few ways. First, an "operation" is the
      combined time to do the read & write rather than treating them as two ops. Second,
      the same key is used for the read & write.
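
      The read-modify-write unit can be pictured as follows (a sketch; db_bench's real loop also times the pair as one operation and records it in the latency histogram):

      #include <string>
      #include "leveldb/db.h"  // namespace/headers of this era; later rocksdb

      // Sketch: one updaterandom "operation" = Get + Put on the same key,
      // measured together rather than as two separate ops.
      void UpdateRandomOnce(leveldb::DB* db, const leveldb::Slice& key,
                            const std::string& new_value) {
        std::string value;
        db->Get(leveldb::ReadOptions(), key, &value);  // NotFound is fine here
        db->Put(leveldb::WriteOptions(), key, new_value);
      }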
      
      Change RandomGenerator to support rows larger than 1M. That was using "assert"
      to fail, and assert is compiled away when -DNDEBUG is used.
      
      Add more options to db_bench
      --duration - sets the number of seconds for tests to run. When not set the
      operation count continues to be the limit. This is used by random operation
      tests.
      
      --use_snapshot - when set GetSnapshot() is called prior to each random read.
      This is to measure the overhead from using snapshots.
      
      --get_approx - when set GetApproximateSizes() is called prior to each random
      read. This is to measure the overhead for a query optimizer.
      
      Test Plan: run db_bench
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D9267
      5a8c8845