1. 04 Feb 2014 (1 commit)
  2. 25 Jan 2014 (3 commits)
    • S
      Moving Some includes from options.h to forward declaration · 8477255d
      Committed by Siying Dong
      Summary: By removing some includes from options.h and relying on forward declarations instead, we can more easily reason about the dependencies.
      
      Test Plan: make all check
      
      Reviewers: kailiu, haobo, igor, dhruba
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15411
    • I
      Revert "Moving to glibc-fb" · e832e72b
      Committed by Igor Canadi
      This reverts commit d24961b6.
      
      For some reason, glibc2.17-fb breaks gflags. Reverting for now
    • I
      Moving to glibc-fb · d24961b6
      Committed by Igor Canadi
      Summary:
      It looks like we might have some trouble when building the new release with 4.8, since fbcode is using glibc2.17-fb by default and we are using glibc2.17. It was reported by Benjamin Renard in our internal group.
      
      This diff moves our fbcode build to use glibc2.17-fb by default. I got some linker errors when compiling, complaining that `google::SetUsageMessage()` was undefined. After deleting all offending lines, the compile was successful and everything works.
      
      Test Plan:
      Compiled
      Ran ./db_bench ./db_stress ./db_repl_stress
      
      Reviewers: kailiu
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15405
  3. 18 Jan 2014 (1 commit)
  4. 11 Jan 2014 (1 commit)
    • S
      Improve RocksDB "get" performance by computing merge result in memtable · a09ee106
      Committed by Schalk-Willem Kruger
      Summary:
      Added an option (max_successive_merges) that can be used to specify the
      maximum number of successive merge operations on a key in the memtable.
      This can be used to improve performance of the "get" operation. If many
      successive merge operations are performed on a key, the performance of "get"
      operations on the key deteriorates, as the value has to be computed for each
      "get" operation by applying all the successive merge operations.
      
      FB Task ID: #3428853
      
      Test Plan:
      make all check
      db_bench --benchmarks=readrandommergerandom
      counter_stress_test
      
      Reviewers: haobo, vamsi, dhruba, sdong
      
      Reviewed By: haobo
      
      CC: zshao
      
      Differential Revision: https://reviews.facebook.net/D14991
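The idea described above can be sketched in a few lines of illustrative C++. This is not RocksDB's memtable code: `MergeChain`, `AddMerge`, and `kMaxSuccessiveMerges` are invented names, and a uint64-add merge stands in for a general merge operator. Once the number of successive operands on a key reaches the limit, they are folded into one value, so a later Get no longer replays the whole chain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Pending merge operands for a single key (uint64-add for illustration).
struct MergeChain {
  std::deque<uint64_t> operands;  // successive merge operands, oldest first
  uint64_t base = 0;              // last materialized value
};

constexpr std::size_t kMaxSuccessiveMerges = 4;  // stand-in for the option

void AddMerge(MergeChain& chain, uint64_t operand) {
  chain.operands.push_back(operand);
  if (chain.operands.size() >= kMaxSuccessiveMerges) {
    // Compute the merge result now (as a Get would) and store it as a
    // plain value, resetting the chain of operands.
    for (uint64_t v : chain.operands) chain.base += v;
    chain.operands.clear();
  }
}

uint64_t Get(const MergeChain& chain) {
  uint64_t result = chain.base;
  for (uint64_t v : chain.operands) result += v;  // replay what is left
  return result;
}
```

With the cap at 4, a Get after five merges only has to replay one operand instead of five.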
  5. 19 Dec 2013 (1 commit)
    • M
      Add 'readtocache' test · ca92068b
      Committed by Mark Callaghan
      Summary:
      For some tests I want to cache the database prior to running other tests on the same invocation
      of db_bench. The readtocache test ignores --threads and --reads so those can be used by other tests
      and it will still do a full read of --num rows with one thread. It might be invoked like:
        db_bench --benchmarks=readtocache,readrandom --reads 100 --num 10000 --threads 8
      
      Test Plan:
      run db_bench
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14739
  6. 13 Dec 2013 (1 commit)
    • M
      Add monitoring for universal compaction and add counters for compaction IO · e9e6b00d
      Committed by Mark Callaghan
      Summary:
      Adds these counters
      { WAL_FILE_SYNCED, "rocksdb.wal.synced" }
        number of writes that request a WAL sync
      { WAL_FILE_BYTES, "rocksdb.wal.bytes" },
        number of bytes written to the WAL
      { WRITE_DONE_BY_SELF, "rocksdb.write.self" },
        number of writes processed by the calling thread
      { WRITE_DONE_BY_OTHER, "rocksdb.write.other" },
        number of writes not processed by the calling thread. Instead these were
        processed by the current holder of the write lock
      { WRITE_WITH_WAL, "rocksdb.write.wal" },
        number of writes that request WAL logging
      { COMPACT_READ_BYTES, "rocksdb.compact.read.bytes" },
        number of bytes read during compaction
      { COMPACT_WRITE_BYTES, "rocksdb.compact.write.bytes" },
        number of bytes written during compaction
      
      Per-interval stats output was updated with WAL stats and correct stats for universal compaction
      including a correct value for write-amplification. It now looks like:
                                     Compactions
      Level  Files Size(MB) Score Time(sec)  Read(MB) Write(MB)    Rn(MB)  Rnp1(MB)  Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s)      Rn     Rnp1     Wnp1     NewW    Count  Ln-stall Stall-cnt
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        0        7      464  46.4       281      3411      3875      3411         0      3875        2.1      12.1        13.8      621        0      240      240      628       0.0         0
      Uptime(secs): 310.8 total, 2.0 interval
      Writes cumulative: 9999999 total, 9999999 batches, 1.0 per batch, 1.22 ingest GB
      WAL cumulative: 9999999 WAL writes, 9999999 WAL syncs, 1.00 writes per sync, 1.22 GB written
      Compaction IO cumulative (GB): 1.22 new, 3.33 read, 3.78 write, 7.12 read+write
      Compaction IO cumulative (MB/sec): 4.0 new, 11.0 read, 12.5 write, 23.4 read+write
      Amplification cumulative: 4.1 write, 6.8 compaction
      Writes interval: 100000 total, 100000 batches, 1.0 per batch, 12.5 ingest MB
      WAL interval: 100000 WAL writes, 100000 WAL syncs, 1.00 writes per sync, 0.01 MB written
      Compaction IO interval (MB): 12.49 new, 14.98 read, 21.50 write, 36.48 read+write
      Compaction IO interval (MB/sec): 6.4 new, 7.6 read, 11.0 write, 18.6 read+write
      Amplification interval: 101.7 write, 102.9 compaction
      Stalls(secs): 142.924 level0_slowdown, 0.000 level0_numfiles, 0.805 memtable_compaction, 0.000 leveln_slowdown
      Stalls(count): 132461 level0_slowdown, 0 level0_numfiles, 3 memtable_compaction, 0 leveln_slowdown
      
      Task ID: #3329644, #3301695
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14583
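As a sanity check, the two cumulative amplification figures printed above are consistent with one plausible reading of the formulas (an assumption about the formulas, not taken from the patch): write amplification as (new + compaction write) / ingest, and compaction amplification as (new + compaction read + compaction write) / ingest, all in GB.

```cpp
#include <cassert>
#include <cmath>

// Assumed formulas, checked against the "Amplification cumulative" line:
// 4.1 write, 6.8 compaction, with 1.22 GB ingest, 1.22 new, 3.33 read,
// 3.78 write.
double write_amp(double new_gb, double compact_write_gb, double ingest_gb) {
  return (new_gb + compact_write_gb) / ingest_gb;
}

double compaction_amp(double new_gb, double compact_read_gb,
                      double compact_write_gb, double ingest_gb) {
  return (new_gb + compact_read_gb + compact_write_gb) / ingest_gb;
}
```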
  7. 04 Dec 2013 (2 commits)
    • M
      Add compression options to db_bench · 97aa401e
      Committed by Mark Callaghan
      Summary:
      This adds 2 options for compression to db_bench:
      * universal_compression_size_percent
      * compression_level - to set zlib compression level
      It also logs compression_size_percent at startup in LOG
      
      Test Plan:
      make check, run db_bench
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14439
    • I
      Killing Transform Rep · eb12e47e
      Committed by Igor Canadi
      Summary:
      Let's get rid of TransformRep and its children. We have confirmed that HashSkipListRep works better with multifeed, so there is no benefit to keeping this around.
      
      This diff is mostly just deleting references to obsoleted functions. I also have a diff for fbcode that we'll need to push when we switch to the new release.
      
      I had to expose HashSkipListRepFactory in the client header files because db_impl.cc needs access to the GetTransform() function for SanitizeOptions.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14397
  8. 22 Nov 2013 (2 commits)
  9. 17 Nov 2013 (1 commit)
  10. 13 Nov 2013 (1 commit)
  11. 07 Nov 2013 (1 commit)
    • S
      WAL log retention policy based on archive size. · c2be2cba
      Committed by shamdor
      Summary:
      Archive cleaning will still happen every WAL_ttl seconds,
      but archived logs will be deleted only if the archive size
      is greater than the WAL_size_limit value.
      Empty archived logs will be deleted every WAL_ttl.
      
      Test Plan:
      1. Unit tests pass.
      2. Benchmark.
      
      Reviewers: emayanke, dhruba, haobo, sdong, kailiu, igor
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13869
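A minimal sketch of that retention rule, assuming archived logs are scanned oldest first (`ArchivedLog` and `LogsToDelete` are illustrative names, not the actual implementation): empty logs are always deleted, and non-empty logs are deleted oldest-first only while the total archive size exceeds the limit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ArchivedLog {
  uint64_t size;  // bytes; logs are ordered oldest first
};

// Returns how many of the archived logs should be deleted this cycle.
std::size_t LogsToDelete(const std::vector<ArchivedLog>& logs,
                         uint64_t size_limit) {
  uint64_t total = 0;
  for (const auto& l : logs) total += l.size;
  std::size_t n = 0;
  for (const auto& l : logs) {
    if (l.size == 0) {
      ++n;                       // empty archived logs always go
    } else if (total > size_limit) {
      total -= l.size;           // over the limit: drop oldest non-empty logs
      ++n;
    }
  }
  return n;
}
```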
  12. 02 Nov 2013 (1 commit)
    • D
      Implement a compressed block cache. · b4ad5e89
      Committed by Dhruba Borthakur
      Summary:
      RocksDB can now support an uncompressed block cache, a compressed
      block cache, or both. Lookups first look for a block in the
      uncompressed cache; only if it is not found there is it looked up
      in the compressed cache. If it is found in the compressed cache,
      it is uncompressed and inserted into the uncompressed cache.
      
      It is possible that the same block resides in the compressed cache
      as well as the uncompressed cache at the same time. Both caches
      have their own individual LRU policy.
      
      Test Plan: Unit test case attached.
      
      Reviewers: kailiu, sdong, haobo, leveldb
      
      Reviewed By: haobo
      
      CC: xjin, haobo
      
      Differential Revision: https://reviews.facebook.net/D12675
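The two-tier lookup flow can be sketched like this. Plain maps stand in for the two LRU caches, and `Decompress` is a stub; all names here are illustrative, not RocksDB's API.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Stand-ins for the two caches (no LRU eviction in this sketch).
std::unordered_map<std::string, std::string> uncompressed_cache;
std::unordered_map<std::string, std::string> compressed_cache;

// Stub: pretend decompression just tags the payload.
std::string Decompress(const std::string& compressed) {
  return "u:" + compressed;
}

std::optional<std::string> LookupBlock(const std::string& key) {
  auto u = uncompressed_cache.find(key);
  if (u != uncompressed_cache.end()) return u->second;  // fast-path hit
  auto c = compressed_cache.find(key);
  if (c != compressed_cache.end()) {
    std::string block = Decompress(c->second);
    uncompressed_cache.emplace(key, block);  // promote; both tiers now hold it
    return block;
  }
  return std::nullopt;  // miss in both tiers: caller reads from disk
}
```

Note that after a compressed-tier hit, the block intentionally lives in both caches at once, matching the description above.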
  13. 24 Oct 2013 (1 commit)
  14. 17 Oct 2013 (1 commit)
  15. 12 Oct 2013 (1 commit)
    • S
      LRUCache to try to clean entries not referenced first. · f8509653
      Committed by sdong
      Summary:
      With this patch, when LRUCache::Insert() is called and the cache is full, it will first try to free up entries whose reference counter is 1 (and would become 0 after removal from the cache). We do it in two passes: in the first pass, we only try to release those unreferenced entries. If we cannot free enough space after traversing the first remove_scan_cnt_ entries, we start from the beginning again and remove entries that are still being used.
      
      Test Plan: add two unit tests to cover the codes
      
      Reviewers: dhruba, haobo, emayanke
      
      Reviewed By: emayanke
      
      CC: leveldb, emayanke, xjin
      
      Differential Revision: https://reviews.facebook.net/D13377
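A simplified sketch of the two-pass eviction: here `refs == 1` means only the cache itself holds the entry, so it is safe to free. The real code also bounds the first pass by remove_scan_cnt_, which this sketch omits, and the names are illustrative rather than the actual LRUCache internals.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

struct Entry {
  int refs;            // 1 == only the cache references this entry
  std::size_t charge;  // bytes this entry accounts for
};

// Frees entries (front == LRU end) until `needed` bytes are reclaimed.
// Returns the number of bytes actually freed.
std::size_t Evict(std::list<Entry>& lru, std::size_t needed) {
  std::size_t freed = 0;
  // Pass 1: only unreferenced entries.
  for (auto it = lru.begin(); it != lru.end() && freed < needed;) {
    if (it->refs == 1) {
      freed += it->charge;
      it = lru.erase(it);
    } else {
      ++it;
    }
  }
  // Pass 2: still short on space, so remove entries even if in use.
  for (auto it = lru.begin(); it != lru.end() && freed < needed;) {
    freed += it->charge;
    it = lru.erase(it);
  }
  return freed;
}
```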
  16. 06 Oct 2013 (1 commit)
  17. 05 Oct 2013 (1 commit)
  18. 20 Sep 2013 (1 commit)
    • D
      Better locking in vectorrep that increases throughput to match speed of storage. · 5e9f3a9a
      Committed by Dhruba Borthakur
      Summary:
      There is a use-case where we want to insert data into rocksdb as
      fast as possible. Vector rep is used for this purpose.
      
      The background flush thread needs to flush the vectorrep to
      storage. It acquires the dblock, then sorts the vector, releases
      the dblock, and then writes the sorted vector to storage. This is
      suboptimal because the lock is held during the sort, which
      prevents new writes from occurring.
      
      This patch moves the sorting of the vector rep outside the
      db mutex. Performance is now as fast as the underlying storage
      system. If you are doing buffered writes to rocksdb files, then
      you can observe write throughput upwards of 200 MB/sec.
      
      This is an early draft and not yet ready to be reviewed.
      
      Test Plan:
      make check
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb, haobo
      
      Differential Revision: https://reviews.facebook.net/D12987
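The locking change can be illustrated with a swap-then-sort sketch (single-threaded here for brevity; names are illustrative, not the actual vectorrep code): the vector is swapped out in O(1) while holding the mutex, and the expensive sort runs with the lock released, so writers are never blocked during the sort.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <vector>

std::mutex db_mutex;        // stand-in for the db mutex
std::vector<int> memtable;  // stand-in for the vector rep's keys

std::vector<int> FlushSorted() {
  std::vector<int> snapshot;
  {
    std::lock_guard<std::mutex> lock(db_mutex);
    snapshot.swap(memtable);  // O(1) under the lock; writers see an empty rep
  }
  // Expensive O(n log n) work happens without holding the mutex.
  std::sort(snapshot.begin(), snapshot.end());
  return snapshot;  // ready to be written to storage
}
```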
  19. 16 Sep 2013 (2 commits)
  20. 14 Sep 2013 (1 commit)
    • D
      Added a parameter to limit the maximum space amplification for universal compaction. · 4012ca1c
      Committed by Dhruba Borthakur
      Summary:
      Added a new field called max_size_amplification_ratio in the
      CompactionOptionsUniversal structure. This determines the maximum
      percentage overhead of space amplification.
      
      The size amplification is defined as the ratio between the sum of
      the sizes of all files other than the oldest and the size of the
      oldest file. If the size amplification exceeds the specified value,
      then min_merge_width and max_merge_width are ignored and a full
      compaction of all files is done. A value of 10 means that a database
      that stores 100 bytes of user data could occupy up to 110 bytes of
      physical storage.
      
      Test Plan: Unit test DBTest.UniversalCompactionSpaceAmplification added.
      
      Reviewers: haobo, emayanke, xjin
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12825
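With size amplification read as the total size of all files other than the oldest, relative to the oldest file (the reading consistent with the 100-byte/110-byte example), the trigger can be sketched as follows. This is an illustration, not the actual compaction picker.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// file_sizes holds per-file sizes with the oldest file last.
// max_ratio_percent corresponds to max_size_amplification_ratio.
bool NeedsFullCompaction(const std::vector<uint64_t>& file_sizes,
                         uint64_t max_ratio_percent) {
  if (file_sizes.empty()) return false;
  uint64_t oldest = file_sizes.back();
  uint64_t others = std::accumulate(file_sizes.begin(), file_sizes.end() - 1,
                                    uint64_t{0});
  // others / oldest * 100 > max_ratio_percent, without integer division.
  return others * 100 > max_ratio_percent * oldest;
}
```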
  21. 13 Sep 2013 (1 commit)
    • H
      [RocksDB] Remove Log file immediately after memtable flush · 0e422308
      Committed by Haobo Xu
      Summary: As title. The DB log file life cycle is tied up with the memtable it backs. Once the memtable is flushed to sst and committed, we should be able to delete the log file, without holding the mutex. This is part of the bigger change to avoid FindObsoleteFiles at runtime. It deals with log files. sst files will be dealt with later.
      
      Test Plan: make check; db_bench
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11709
  22. 24 Aug 2013 (2 commits)
  23. 23 Aug 2013 (4 commits)
    • J
      Add three new MemTableRep's · 74781a0c
      Committed by Jim Paton
      Summary:
      This patch adds three new MemTableRep's: UnsortedRep, PrefixHashRep, and VectorRep.
      
      UnsortedRep stores keys in an std::unordered_map of std::sets. When an iterator is requested, it dumps the keys into an std::set and iterates over that.
      
      VectorRep stores keys in an std::vector. When an iterator is requested, it creates a copy of the vector and sorts it using std::sort. The iterator accesses that new vector.
      
      PrefixHashRep stores keys in an unordered_map mapping prefixes to ordered sets.
      
      I also added one API change. I added a function MemTableRep::MarkImmutable. This function is called when the rep is added to the immutable list. It doesn't do anything yet, but it seems like that could be useful. In particular, for the vectorrep, it means we could elide the extra copy and just sort in place. The only reason I haven't done that yet is because the use of the ArenaAllocator complicates things (I can elaborate on this if needed).
      
      Test Plan:
      make -j32 check
      ./db_stress --memtablerep=vector
      ./db_stress --memtablerep=unsorted
      ./db_stress --memtablerep=prefixhash --prefix_size=10
      
      Reviewers: dhruba, haobo, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12117
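The VectorRep behavior described above — cheap unsorted appends, with a copy-and-sort only when an iterator is requested — can be sketched as follows (an illustration, not the actual MemTableRep interface):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct VectorRepSketch {
  std::vector<std::string> keys;  // insertion order, unsorted

  // Inserts are O(1) amortized appends.
  void Insert(std::string key) { keys.push_back(std::move(key)); }

  // Requesting an iterator copies the vector and sorts the copy, so the
  // live rep keeps accepting unsorted inserts while readers iterate.
  std::vector<std::string> NewIterator() const {
    std::vector<std::string> sorted = keys;
    std::sort(sorted.begin(), sorted.end());
    return sorted;
  }
};
```

The MarkImmutable() hook mentioned above would let an immutable rep skip the copy and sort in place, which is exactly the optimization the summary hints at.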
    • X
      Pull from https://reviews.facebook.net/D10917 · 17dc1280
      Committed by Xing Jin
      Summary: Pull Mark's patch and slightly revise it. I revised another place in db_impl.cc with a similar new formula.
      
      Test Plan:
      make all check. Also run "time ./db_bench --num=2500000000 --numdistinct=2200000000". It has run for 20+ hours and hasn't finished. Looks good so far:
      
      Installed stack trace handler for SIGILL SIGSEGV SIGBUS SIGABRT
      LevelDB:    version 2.0
      Date:       Tue Aug 20 23:11:55 2013
      CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
      CPUCache:   20480 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    2500000000
      RawSize:    276565.6 MB (estimated)
      FileSize:   157356.3 MB (estimated)
      Write rate limit: 0
      Compression: snappy
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillseq      :    7202.000 micros/op 138 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillsync     :    7148.000 micros/op 139 ops/sec; (2500000 ops)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillrandom   :    7105.000 micros/op 140 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      overwrite    :    6930.000 micros/op 144 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.020 micros/op 980507 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.021 micros/op 979620 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readseq      :     113.000 micros/op 8849 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readreverse  :     102.000 micros/op 9803 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      Created bg thread 0x7f0ac17f7700
      compact      :  111701.000 micros/op 8 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.020 micros/op 980376 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readseq      :     120.000 micros/op 8333 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readreverse  :      29.000 micros/op 34482 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      ... finished 618100000 ops
      
      Reviewers: MarkCallaghan, haobo, dhruba, chip
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D12441
    • T
      Revert "Prefix scan: db_bench and bug fixes" · 94cf2187
      Committed by Tyler Harter
      This reverts commit c2bd8f48.
    • T
      Prefix scan: db_bench and bug fixes · c2bd8f48
      Committed by Tyler Harter
      Summary: If use_prefix_filters is set and read_range > 1, then random seeks will set the prefix filter to the prefix of the key that was randomly selected as the target. Still need to add statistics (perhaps in a separate diff).
      
      Test Plan: ./db_bench --benchmarks=fillseq,prefixscanrandom --num=10000000 --statistics=1 --use_prefix_blooms=1 --use_prefix_api=1 --bloom_bits=10
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb, haobo
      
      Differential Revision: https://reviews.facebook.net/D12273
  24. 21 Aug 2013 (1 commit)
  25. 16 Aug 2013 (2 commits)
    • D
      Tiny fix to db_bench for make release. · d1d3d15e
      Committed by Deon Nicholas
      Summary:
      In the release build, the compiler found a "variable assigned but not used anywhere" warning. Changed the code to consume the variable in an assert. Someone accept this :).
      
      Test Plan: make release -j 32
      
      Reviewers: haobo, dhruba, emayanke
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12309
    • D
      Benchmarking for Merge Operator · ad48c3c2
      Committed by Deon Nicholas
      Summary:
      Updated db_bench and utilities/merge_operators.h to allow for dynamic benchmarking
      of merge operators in db_bench. Added a new test (--benchmarks=mergerandom), which performs
      a bunch of random Merge() operations over random keys. Also added a "--merge_operator=" flag
      so that the tester can easily benchmark different merge operators. Currently supports
      the PutOperator and UInt64Add operator. Support for stringappend or list append may come later.
      
      Test Plan:
      1. make db_bench
      2. Test the PutOperator (simulating Put) as follows:
      ./db_bench --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=put --threads=2
      3. Test the UInt64AddOperator (simulating numeric addition) similarly:
      ./db_bench --value_size=8 --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=uint64add --threads=2
      
      Reviewers: haobo, dhruba, zshao, MarkCallaghan
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11535
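What each Merge() in the uint64add benchmark conceptually does can be sketched as: decode the existing 8-byte value, add the operand, and re-encode. The fixed-width memcpy (native-endian) encoding here is an assumption for illustration, not RocksDB's encoding helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Encode a uint64 as an 8-byte string (native byte order, for illustration).
std::string Encode(uint64_t v) {
  std::string s(sizeof(v), '\0');
  std::memcpy(&s[0], &v, sizeof(v));
  return s;
}

uint64_t Decode(const std::string& s) {
  uint64_t v = 0;
  std::memcpy(&v, s.data(), sizeof(v));
  return v;
}

// One uint64add merge step: existing value + operand, re-encoded.
std::string UInt64AddMerge(const std::string& existing,
                           const std::string& operand) {
  return Encode(Decode(existing) + Decode(operand));
}
```

The mergerandom benchmark effectively hammers this operation over random keys, which is why the max_successive_merges work above matters for read performance.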
  26. 15 Aug 2013 (1 commit)
    • X
      Minor fix to current codes · 0a5afd1a
      Committed by Xing Jin
      Summary:
      Minor fixes to the current code, including coding style, output format,
      and comments. No major logic change. There are only 2 real changes; please see my inline comments.
      
      Test Plan: make all check
      
      Reviewers: haobo, dhruba, emayanke
      
      Differential Revision: https://reviews.facebook.net/D12297
  27. 06 Aug 2013 (1 commit)
    • J
      Add soft and hard rate limit support · 1036537c
      Committed by Jim Paton
      Summary:
      This diff adds support for both soft and hard rate limiting. The following changes are included:
      
      1) Options.rate_limit is renamed to Options.hard_rate_limit.
      2) Options.rate_limit_delay_milliseconds is renamed to Options.rate_limit_delay_max_milliseconds.
      3) Options.soft_rate_limit is added.
      4) If the maximum compaction score is > hard_rate_limit and rate_limit_delay_max_milliseconds == 0, then writes are delayed by 1 ms at a time until the max compaction score falls below hard_rate_limit.
      5) If the max compaction score is > soft_rate_limit but <= hard_rate_limit, then writes are delayed by 0-1 ms depending on how close we are to hard_rate_limit.
      6) Users can disable 4 by setting hard_rate_limit = 0. They can add a limit to the maximum amount of time waited by setting rate_limit_delay_max_milliseconds > 0. Thus, the old behavior can be preserved by setting soft_rate_limit = 0, which is the default.
      
      Test Plan:
      make -j32 check
      ./db_stress
      
      Reviewers: dhruba, haobo, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12003
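Points 4 and 5 above can be summarized in a small delay function. The linear 0-1 ms interpolation between the soft and hard limits is an assumed shape for illustration, not necessarily the exact formula in the diff; in the real code the 1 ms delay above the hard limit is applied repeatedly until the score drops.

```cpp
#include <cassert>

// Returns the per-write delay in milliseconds for a given max compaction
// score. Limits of 0 disable the corresponding behavior, matching point 6.
double DelayMs(double score, double soft_limit, double hard_limit) {
  if (hard_limit > 0 && score > hard_limit) {
    return 1.0;  // delayed 1 ms at a time until the score falls below hard
  }
  if (soft_limit > 0 && score > soft_limit) {
    // Between soft and hard: scale from 0 up to 1 ms as we approach hard.
    return (score - soft_limit) / (hard_limit - soft_limit);
  }
  return 0.0;  // below the soft limit: no delay
}
```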
  28. 24 Jul 2013 (1 commit)
    • M
      Use KeyMayExist for WriteBatch-Deletes · bf66c10b
      Committed by Mayank Agarwal
      Summary:
      Introduced KeyMayExist checking during writebatch-delete and removed it from the outer Delete API, because Delete uses writebatch-delete.
      Added code to skip fetching the Table from disk if it is not already present in the table_cache.
      Some renaming of variables.
      Introduced KeyMayExistImpl, which allows checking since a specified sequence number in GetImpl; useful to check a partially written writebatch.
      Changed KeyMayExist to not be pure virtual and provided a default implementation.
      Expanded unit-tests in db_test to check appropriately.
      Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.
      
      Test Plan: db_stress;make check
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D11745
  29. 12 Jul 2013 (1 commit)
    • M
      Make rocksdb-deletes faster using bloom filter · 2a986919
      Committed by Mayank Agarwal
      Summary:
      Wrote a new function in db_impl.cc, CheckKeyMayExist, that calls Get with a new parameter turned on which makes Get return false only if bloom filters can guarantee that the key is not in the database. Delete calls this function, and if the option deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete will be dropped, saving:
      1. the Put of delete type,
      2. space in the db, and
      3. compaction time
      
      Test Plan:
      make all check;
      will run db_stress and db_bench and enhance unit-test once the basic design gets approved
      
      Reviewers: dhruba, haobo, vamsi
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11607
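The safety argument above — a delete may be dropped only when the filter can guarantee the key was never written — can be shown with a toy Bloom-style filter. This is a hand-rolled illustration, not RocksDB's filter or KeyMayExist API; the key property is that the filter has no false negatives, so a "definitely absent" answer is always trustworthy.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

std::bitset<1024> filter;  // toy Bloom filter: 1024 bits, 2 hash functions

std::size_t H(const std::string& key, char seed) {
  return std::hash<std::string>{}(key + seed) % filter.size();
}

void Put(const std::string& key) {
  filter.set(H(key, '1'));
  filter.set(H(key, '2'));
}

// "Maybe present" (true) or "definitely absent" (false). Never a false
// negative: every Put key always answers true.
bool KeyMayExist(const std::string& key) {
  return filter.test(H(key, '1')) && filter.test(H(key, '2'));
}

// The delete must actually be written only if the key may exist; if the
// filter guarantees absence, the tombstone can be dropped entirely.
bool ShouldWriteDelete(const std::string& key) { return KeyMayExist(key); }
```

False positives only cost an unnecessary tombstone, which is the old behavior anyway; correctness is never at risk.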
  30. 10 Jul 2013 (1 commit)