1. 29 Sep 2013, 2 commits
  2. 27 Sep 2013, 2 commits
  3. 26 Sep 2013, 2 commits
    • [RocksDB] Add an option to enable set based memtable for perf_context_test · e0aa19a9
      Committed by Haobo Xu
      Summary:
      as title.
      Some results:
      
      -- Sequential insertion of 1M key/value with stock skip list (all in one memtable)
      time ./perf_context_test  --total_keys=1000000  --use_set_based_memetable=0
      Inserting 1000000 key/value pairs
      ...
      Put user key comparison:
      Count: 1000000  Average: 8.0179  StdDev: 176.34
      Min: 0.0000  Median: 2.5555  Max: 88933.0000
      Percentiles: P50: 2.56 P75: 2.83 P99: 58.21 P99.9: 133.62 P99.99: 987.50
      Get user key comparison:
      Count: 1000000  Average: 43.4465  StdDev: 379.03
      Min: 2.0000  Median: 36.0195  Max: 88939.0000
      Percentiles: P50: 36.02 P75: 43.66 P99: 112.98 P99.9: 824.84 P99.99: 7615.38
      real	0m21.345s
      user	0m14.723s
      sys	0m5.677s
      
      -- Sequential insertion of 1M key/value with set based memtable (all in one memtable)
      time ./perf_context_test  --total_keys=1000000  --use_set_based_memetable=1
      Inserting 1000000 key/value pairs
      ...
      Put user key comparison:
      Count: 1000000  Average: 61.5022  StdDev: 6.49
      Min: 0.0000  Median: 62.4295  Max: 71.0000
      Percentiles: P50: 62.43 P75: 66.61 P99: 71.00 P99.9: 71.00 P99.99: 71.00
      Get user key comparison:
      Count: 1000000  Average: 29.3810  StdDev: 3.20
      Min: 1.0000  Median: 29.1801  Max: 34.0000
      Percentiles: P50: 29.18 P75: 32.06 P99: 34.00 P99.9: 34.00 P99.99: 34.00
      real	0m28.875s
      user	0m21.699s
      sys	0m5.749s
      
      Worst case comparison count for a Put is 88933 (skiplist) vs 71 (set based memtable)
      
      Of course, there are other inefficiencies in the set-based memtable implementation, which lead to the worse overall time. However, the P99 behavior advantage is very obvious. A sketch of reading these comparison counters follows this entry.
      
      Test Plan: ./perf_context_test and viewstate shadow testing
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13095
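      A minimal sketch of reading the Put/Get user key comparison counters above through the public PerfContext interface. It assumes the present-day API (SetPerfLevel, get_perf_context() and user_key_comparison_count from rocksdb/perf_context.h), which postdates this 2013 commit; the test itself uses its own histograms.

      #include <cstdio>
      #include <string>

      #include "rocksdb/db.h"
      #include "rocksdb/perf_context.h"
      #include "rocksdb/perf_level.h"

      int main() {
        rocksdb::DB* db = nullptr;
        rocksdb::Options options;
        options.create_if_missing = true;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/perf_context_demo", &db);
        if (!s.ok()) return 1;

        rocksdb::SetPerfLevel(rocksdb::PerfLevel::kEnableCount);  // counters only, no timers

        rocksdb::get_perf_context()->Reset();
        db->Put(rocksdb::WriteOptions(), "key1", "value1");
        std::printf("Put user key comparisons: %llu\n",
                    static_cast<unsigned long long>(
                        rocksdb::get_perf_context()->user_key_comparison_count));

        rocksdb::get_perf_context()->Reset();
        std::string value;
        db->Get(rocksdb::ReadOptions(), "key1", &value);
        std::printf("Get user key comparisons: %llu\n",
                    static_cast<unsigned long long>(
                        rocksdb::get_perf_context()->user_key_comparison_count));

        delete db;
        return 0;
      }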
    • The vector rep implementation was segfaulting because of incorrect initialization of the vector. · f1a60e5c
      Committed by Dhruba Borthakur
      Summary:
      The constructor for the Vector memtable has a parameter called 'count'
      that specifies the capacity of the vector to be reserved at allocation
      time. It was incorrectly used to initialize the size of the vector. A
      minimal illustration of the distinction follows this entry.
      
      Test Plan: Enhanced db_test.
      
      Reviewers: haobo, xjin, emayanke
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13083
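      A standalone illustration of the size-versus-capacity distinction behind this fix (generic C++, not the actual VectorRep code): constructing a vector with 'count' creates count default elements, while reserve(count) only pre-allocates.

      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      int main() {
        const std::size_t count = 1024;  // intended pre-allocation hint

        // Bug pattern: 'count' used as the initial SIZE. The vector starts with
        // 1024 default-constructed entries, and later push_back()s land after them.
        std::vector<uint64_t> wrong(count);
        wrong.push_back(42);
        assert(wrong.size() == count + 1);   // real entries mixed with padding

        // Intended pattern: 'count' used as reserved CAPACITY. The vector starts
        // empty, but the first 1024 push_back()s avoid reallocation.
        std::vector<uint64_t> right;
        right.reserve(count);
        right.push_back(42);
        assert(right.size() == 1);
        assert(right.capacity() >= count);
        return 0;
      }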
  4. 24 Sep 2013, 1 commit
  5. 21 Sep 2013, 1 commit
  6. 20 Sep 2013, 2 commits
    • Better locking in vectorrep that increases throughput to match the speed of storage. · 5e9f3a9a
      Committed by Dhruba Borthakur
      Summary:
      There is a use-case where we want to insert data into rocksdb as
      fast as possible. Vector rep is used for this purpose.
      
      The background flush thread needs to flush the vectorrep to
      storage. It acquires the dblock, then sorts the vector, releases
      the dblock, and then writes the sorted vector to storage. This is
      suboptimal because the lock is held during the sort, which
      prevents new writes from occurring.
      
      This patch moves the sorting of the vector rep outside the
      db mutex. Performance is now as fast as the underlying storage
      system. If you are doing buffered writes to rocksdb files, then
      you can observe write throughput upwards of 200 MB/sec. A sketch of
      the locking pattern follows this entry.
      
      This is an early draft and not yet ready to be reviewed.
      
      Test Plan:
      make check
      
      Task ID: #
      
      Blame Rev:
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb, haobo
      
      Differential Revision: https://reviews.facebook.net/D12987
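      A generic sketch of the locking pattern this patch adopts, not the actual VectorRep code: the shared buffer is detached while the mutex is held, and the O(n log n) sort plus the slow write happen after the mutex is released, so new writers are not blocked.

      #include <algorithm>
      #include <mutex>
      #include <string>
      #include <utility>
      #include <vector>

      class BufferedWriter {
       public:
        void Add(std::string key) {
          std::lock_guard<std::mutex> guard(mu_);
          buffer_.push_back(std::move(key));
        }

        void Flush() {
          std::vector<std::string> snapshot;
          {
            std::lock_guard<std::mutex> guard(mu_);
            snapshot.swap(buffer_);  // detach under the lock: O(1)
          }
          std::sort(snapshot.begin(), snapshot.end());  // sort outside the lock
          WriteSortedToStorage(snapshot);               // slow I/O, also unlocked
        }

       private:
        void WriteSortedToStorage(const std::vector<std::string>& sorted) {
          (void)sorted;  // placeholder for the actual file write
        }

        std::mutex mu_;
        std::vector<std::string> buffer_;
      };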
    • Phase 1 of an iterator stress test · 43354182
      Committed by Natalie Hildebrandt
      Summary:
      Added MultiIterate(), which does a seek and some Next/Prev
      calls. Only the iterator status is checked; there is no data integrity
      check. A sketch of this pattern follows this entry.
      
      Test Plan:
      make db_stress
      ./db_stress --iterpercent=<nonzero value> --readpercent=, etc.
      
      Reviewers: emayanke, dhruba, xjin
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12915
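      A hedged sketch of the kind of check MultiIterate() performs (hypothetical helper name and shape, not the actual db_stress code): seek, walk a few steps each way, and verify only the iterator status.

      #include <memory>
      #include <string>
      #include "rocksdb/db.h"

      rocksdb::Status MultiIterateSketch(rocksdb::DB* db,
                                         const std::string& seek_key, int steps) {
        std::unique_ptr<rocksdb::Iterator> iter(
            db->NewIterator(rocksdb::ReadOptions()));
        iter->Seek(seek_key);
        for (int i = 0; i < steps && iter->Valid(); ++i) {
          iter->Next();
        }
        for (int i = 0; i < steps && iter->Valid(); ++i) {
          iter->Prev();
        }
        return iter->status();  // status-only check, no data integrity verification
      }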
  7. 19 Sep 2013, 1 commit
    • [RocksDB] Unit test to show Seek key comparison number · 4734dbb7
      Committed by Haobo Xu
      Summary: Added SeekKeyComparison to show the user key comparisons incurred by Seek.
      
      Test Plan:
      make perf_context_test
      export LEVELDB_TESTS=DBTest.SeekKeyComparison
      ./perf_context_test --write_buffer_size=500000 --total_keys=10000
      ./perf_context_test --write_buffer_size=250000 --total_keys=10000
      
      Reviewers: dhruba, xjin
      
      Reviewed By: xjin
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12843
  8. 18 Sep 2013, 2 commits
  9. 16 Sep 2013, 3 commits
  10. 14 Sep 2013, 4 commits
    • [RocksDB] fix build env_test · 88664480
      Committed by Haobo Xu
      Summary: Move the TwoPools test to the end of the thread-related tests. Otherwise, the SetBackgroundThreads call would increase the LOW pool size and affect the results of the other tests.
      
      Test Plan: make env_test; ./env_test
      
      Reviewers: dhruba, emayanke, xjin
      
      Reviewed By: xjin
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12939
    • Added a parameter to limit the maximum space amplification for universal compaction. · 4012ca1c
      Committed by Dhruba Borthakur
      Summary:
      Added a new field called max_size_amplification_ratio in the
      CompactionOptionsUniversal structure. This determines the maximum
      percentage overhead of space amplification.
      
      The size amplification is defined as the ratio of the sum of the sizes of
      all other files to the size of the oldest file. If the size amplification
      exceeds the specified value, then min_merge_width and max_merge_width are
      ignored and a full compaction of all files is done.
      A value of 10 means that a database that stores 100 bytes of user data
      could occupy 110 bytes of physical storage. A configuration sketch follows
      this entry.
      
      Test Plan: Unit test DBTest.UniversalCompactionSpaceAmplification added.
      
      Reviewers: haobo, emayanke, xjin
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12825
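      A hedged configuration sketch of this limit. The commit introduces the field as max_size_amplification_ratio; current public headers expose the equivalent knob as CompactionOptionsUniversal::max_size_amplification_percent, which is what this sketch uses, so check the name in your release.

      #include "rocksdb/options.h"
      #include "rocksdb/universal_compaction.h"

      rocksdb::Options MakeUniversalOptions() {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.compaction_style = rocksdb::kCompactionStyleUniversal;
        // Allow at most 10% space amplification: roughly 110 bytes of physical
        // storage per 100 bytes of user data before a full compaction is forced.
        // (Introduced in this commit under the name max_size_amplification_ratio.)
        options.compaction_options_universal.max_size_amplification_percent = 10;
        return options;
      }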
    • Fix delete in db_ttl.cc · e2a093a6
      Committed by Mayank Agarwal
      Summary: should delete the proper variable
      
      Test Plan: make all check
      
      Reviewers: haobo, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12921
    • Update README file for public interface · eeb90c7e
      Committed by Mayank Agarwal
      Summary: public interface is in include/*
      
      Test Plan: visual
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12927
  11. 13 Sep 2013, 3 commits
    • Update README file and check arc diff with proxy · 5e73c4d4
      Committed by Mayank Agarwal
      Summary:
      Setting export http_proxy='http://172.31.255.99:8080' and
      export https_proxy="$http_proxy" in bashrc makes arc work. The README file also needed to be updated.
      
      Test Plan: visual
      
      Reviewers: dhruba, haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12903
    • [RocksDB] Enhance Env to support two thread pools LOW and HIGH · 1565dab8
      Committed by Haobo Xu
      Summary:
      This is the groundwork for separating memtable flush jobs into their own thread pool.
      Both SetBackgroundThreads and Schedule take a third parameter, Priority, to indicate which thread pool they operate on. The names LOW and HIGH are just identifiers for two different thread pools and do not indicate a real difference in 'priority'. The number of threads in each pool can be set independently. A sketch of this API follows this entry.
      The thread pool implementation is refactored.
      
      Test Plan: make check
      
      Reviewers: dhruba, emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12885
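      A minimal sketch of the two-pool interface described above, assuming the Env::Priority parameter on SetBackgroundThreads as it appears in current headers; the pool sizes are set independently and the names are just labels.

      #include "rocksdb/env.h"
      #include "rocksdb/options.h"

      void ConfigureThreadPools(rocksdb::Options* options) {
        rocksdb::Env* env = rocksdb::Env::Default();
        // LOW and HIGH are labels for two separate pools, not real scheduling priorities.
        env->SetBackgroundThreads(4, rocksdb::Env::Priority::LOW);   // e.g. compactions
        env->SetBackgroundThreads(2, rocksdb::Env::Priority::HIGH);  // e.g. memtable flushes
        options->env = env;
      }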
    • [RocksDB] Remove Log file immediately after memtable flush · 0e422308
      Committed by Haobo Xu
      Summary: As title. The DB log file's life cycle is tied to the memtable it backs. Once the memtable is flushed to an sst and committed, we should be able to delete the log file without holding the mutex. This is part of the bigger change to avoid FindObsoleteFiles at runtime. This diff deals with log files; sst files will be dealt with later.
      
      Test Plan: make check; db_bench
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11709
  12. 12 Sep 2013, 1 commit
  13. 08 Sep 2013, 1 commit
    • [RocksDB] Added nano second stopwatch and new perf counters to track block read cost · f2f4c807
      Committed by Haobo Xu
      Summary: The purpose of this diff is to expose per user-call level precise timing of block reads, so that we can answer questions like: a Get() costs me 100ms; is that somehow related to loading blocks from the file system, or something else? We will answer that with EXACTLY how many blocks have been read, how much time was spent on transferring the bytes from the OS, how much time was spent on checksum verification, and how much time was spent on block decompression, just for that one Get. A nanosecond stopwatch was introduced to track time with higher precision. The cost/precision of the stopwatch is also measured in a unit test. On my dev box, retrieving one time instance costs about 30ns on average. The deviation of the timing results is good enough to track 100ns-1us level events. And the overhead can be safely ignored for 100us level events (10000 instances/s), for example a viewstate thrift call. A sketch of reading these counters follows this entry.
      
      Test Plan: perf_context_test, also testing with viewstate shadow traffic.
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D12351
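      A hedged sketch of reading the per-call block counters described above, using today's public PerfContext fields (block_read_count, block_read_time, block_checksum_time, block_decompress_time); the 2013-era interface in this diff may differ slightly.

      #include <cstdio>
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/perf_context.h"
      #include "rocksdb/perf_level.h"

      void PrintBlockReadCost(rocksdb::DB* db, const std::string& key) {
        rocksdb::SetPerfLevel(rocksdb::PerfLevel::kEnableTime);  // enable nanosecond timers
        rocksdb::get_perf_context()->Reset();

        std::string value;
        db->Get(rocksdb::ReadOptions(), key, &value);

        const rocksdb::PerfContext* ctx = rocksdb::get_perf_context();
        std::printf("blocks read: %llu, read time: %llu ns, "
                    "checksum: %llu ns, decompress: %llu ns\n",
                    static_cast<unsigned long long>(ctx->block_read_count),
                    static_cast<unsigned long long>(ctx->block_read_time),
                    static_cast<unsigned long long>(ctx->block_checksum_time),
                    static_cast<unsigned long long>(ctx->block_decompress_time));
      }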
  14. 07 Sep 2013, 2 commits
    • Flush was hanging because the configured options specified that more than 1 memtable needs to be merged. · 32c965d4
      Committed by Dhruba Borthakur
      
      Summary:
      There is a config option called Options.min_write_buffer_number_to_merge
      that specifies the minimum number of write buffers to merge in memory
      before flushing to a file in L0. But in the case when the db is
      being closed, we should not be using this config; instead we should
      flush whatever write buffers are available at that time. A configuration
      sketch follows this entry.
      
      Test Plan: Unit test attached.
      
      Reviewers: haobo, emayanke
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12717
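      A short sketch of the kind of configuration that exposed the hang, using the real option names (write_buffer_size, max_write_buffer_number, min_write_buffer_number_to_merge); the values are illustrative only.

      #include "rocksdb/options.h"

      rocksdb::Options MakeMultiMemtableOptions() {
        rocksdb::Options options;
        options.write_buffer_size = 4 << 20;           // 4 MB per memtable
        options.max_write_buffer_number = 4;           // up to 4 memtables in memory
        options.min_write_buffer_number_to_merge = 2;  // normally merge at least 2 before flushing to L0
        // The fix makes DB close ignore this threshold and flush whatever buffers exist.
        return options;
      }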
    • An iterator may automatically invoke reseeks. · 197034e4
      Committed by Dhruba Borthakur
      Summary:
      An iterator invokes reseek if the number of sequential skips over the
      same userkey exceeds a configured number. This makes iter->Next()
      faster (because of fewer key compares) if a large number of
      adjacent internal keys in a table (sst or memtable) have the
      same userkey. A configuration sketch follows this entry.
      
      Test Plan: Unit test DBTest.IterReseek.
      
      Reviewers: emayanke, haobo, xjin
      
      Reviewed By: xjin
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D11865
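      A configuration sketch for the behavior above. The commit only says "a configured number"; the threshold is believed to be Options::max_sequential_skip_in_iterations in the public headers, so check your release.

      #include "rocksdb/options.h"

      rocksdb::Options MakeIteratorReseekOptions() {
        rocksdb::Options options;
        // Once Next() has skipped this many consecutive internal keys with the
        // same user key (e.g. many stale updates of one key), the iterator
        // issues a Seek to jump past them instead of comparing key by key.
        options.max_sequential_skip_in_iterations = 8;
        return options;
      }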
  15. 06 Sep 2013, 2 commits
  16. 05 Sep 2013, 2 commits
    • Return pathname relative to db dir in LogFile and cleanup AppendSortedWalsOfType · aa5c897d
      Committed by Mayank Agarwal
      Summary: So that replication can just download from wherever LogFile.Pathname points them to. A sketch of the consumer side follows this entry.
      
      Test Plan: make all check;./db_repl_stress
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12609
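      A hedged sketch of the consumer side: a replication process can enumerate the live WAL files and read the path each LogFile reports, relative to the db dir. It assumes DB::GetSortedWalFiles and LogFile::PathName from rocksdb/transaction_log.h.

      #include <cstdio>
      #include "rocksdb/db.h"
      #include "rocksdb/transaction_log.h"

      void ListWalFiles(rocksdb::DB* db) {
        rocksdb::VectorLogPtr wal_files;
        rocksdb::Status s = db->GetSortedWalFiles(wal_files);
        if (!s.ok()) return;
        for (const auto& wal : wal_files) {
          // PathName() is relative to the db dir, so a replica can fetch it directly.
          std::printf("wal #%llu at %s\n",
                      static_cast<unsigned long long>(wal->LogNumber()),
                      wal->PathName().c_str());
        }
      }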
    • New ldb command to convert compaction style · 42c109cc
      Committed by Xing Jin
      Summary:
      Add new command "change_compaction_style" to ldb tool. For
      universal->level, it shows "nothing to do". For level->universal, it
      compacts all files into a single one and moves the file to level 0.
      
      Also add check for number of files at level 1+ when opening db with
      universal compaction style.
      
      Test Plan:
      'make all check'. New unit test for the internal conversion function. Also manually tested various
      cmds like:
      
      ./ldb change_compaction_style --old_compaction_style=0
      --new_compaction_style=1 --db=/tmp/leveldbtest-3088/db_test
      
      Reviewers: haobo, dhruba
      
      Reviewed By: haobo
      
      CC: vamsi, emayanke
      
      Differential Revision: https://reviews.facebook.net/D12603
  17. 03 Sep 2013, 1 commit
    • Fix memory leak in table.cc · 352f0636
      Committed by Mayank Agarwal
      Summary:
      In InternalGet, BlockReader returns an Iterator which is legitimately freed at the end of the 'else' scope. BUT there is a break statement in between, and the iterator must be freed on that path too!
      The best solution is to move to a unique_ptr and let it handle the cleanup, so it was changed to a unique_ptr. A generic illustration follows this entry.
      
      Test Plan: valgrind ./db_test;make all check
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12681
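      A generic illustration of the leak pattern and the unique_ptr fix (not the actual table.cc code): an early break skips a manual delete, while a unique_ptr releases the iterator on every exit path.

      #include <memory>
      #include "rocksdb/iterator.h"
      #include "rocksdb/slice.h"

      void ScanUntil(rocksdb::Iterator* raw_iter, const rocksdb::Slice& stop_key) {
        std::unique_ptr<rocksdb::Iterator> iter(raw_iter);  // owns the iterator from here on
        for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {
          if (iter->key().compare(stop_key) >= 0) {
            break;  // with a raw pointer and a manual delete, this path would leak
          }
        }
      }  // iter is deleted here no matter how the loop exited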
  18. 02 Sep 2013, 3 commits
    • Fix build failing because of ttl-keymayexist · b1d09f1a
      Committed by Mayank Agarwal
      Summary: PutValues calls Flush in ttl_test, which clears the memtables. KeyMayExist called after that will not be able to read those key-values.
      
      Test Plan: make all check OPT=-g
      
      Reviewers: leveldb
    • Fix bug in Counters and record Sequencenumber using only TickerCount · c34271a5
      Committed by Mayank Agarwal
      Summary:
      The way counters/statistics are implemented in rocksdb demands that enum Tickers and TickerNameMap follow the same order; otherwise statistics exposed from fbcode/rocks get out of sync. Two prefix counters had violated this order, and when I built counters for fbcode/mcrocksdb, the statistics for sequence number appeared out of sync. A generic illustration of this ordering invariant follows this entry.
      The other change is to record the sequence number using setTickerCount only, and not recordTick. This is because of the difference in statistics as understood by rocks/utils, which uses the ServiceData::statistics function, versus rocksdb statistics. In rocksdb there is just one counter for a counter name. But in ServiceData there are 4 independent buckets for every counter name: Count, Sum, Average and Rate. SetTickerCount and RecordTick update the same variable in rocksdb but different buckets in ServiceData. Therefore, I had to choose one consistent function, RecordTick or SetTickerCount, for the sequence number in rocksdb. I chose SetTickerCount because the statistics object in the options passed during rocksdb-open is user-dependent, and SetTickerCount makes sense there.
      There will be a corresponding diff to mcrocksdb in fbcode shortly.
      
      Test Plan: make all check; check ticker value using fprintfs
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12669
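      A generic illustration of the ordering invariant described above, with hypothetical ticker names rather than the real rocksdb/statistics.h contents: the enum and the name table index each other, so they must stay in the same order.

      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <utility>
      #include <vector>

      enum Tickers : uint32_t {
        BLOCK_CACHE_MISS = 0,
        BLOCK_CACHE_HIT,
        SEQUENCE_NUMBER,
        TICKER_ENUM_MAX
      };

      const std::vector<std::pair<Tickers, std::string>> TickerNameMap = {
          {BLOCK_CACHE_MISS, "rocksdb.block.cache.miss"},
          {BLOCK_CACHE_HIT, "rocksdb.block.cache.hit"},
          {SEQUENCE_NUMBER, "rocksdb.sequence.number"},
      };

      // Sanity check: entry i of the map must describe ticker i. Adding a ticker
      // to the enum without adding it at the same position in the map mislabels
      // every later counter when statistics are exported.
      void CheckTickerMapOrder() {
        assert(TickerNameMap.size() == TICKER_ENUM_MAX);
        for (uint32_t i = 0; i < TickerNameMap.size(); ++i) {
          assert(TickerNameMap[i].first == static_cast<Tickers>(i));
        }
      }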
    • Fix build caused by DeleteFile not tolerating / at the beginning · ab5c5c28
      Committed by Mayank Agarwal
      Summary: db->DeleteFile calls ParseFileName to check the name that was returned for the sst file. Now, the sst filename is returned using TableFileName, which uses MakeFileName. This puts a / at the front of the name, and ParseFileName doesn't like that. Changed ParseFileName to tolerate /s at the beginning. The test deletefile_test used to pass earlier because this behaviour of MakeFileName had been changed a while back to not return a /, which is when deletefile_test was checked in. But MakeFileName had to be reverted to add / at the front, because GetLiveFiles, used in many places outside rocksdb, relied on the previous behaviour of MakeFileName. A generic illustration of the tolerant parse follows this entry.
      
      Test Plan: make; ./deletefile_test; make all check
      
      Reviewers: dhruba, haobo, vamsi
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12663
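      A generic illustration of the tolerance added here (a hypothetical helper, not the actual ParseFileName code): strip a leading '/' before parsing, so names produced with or without the leading slash both parse.

      #include <cstdint>
      #include <string>

      bool ParseSstNumber(const std::string& fname, uint64_t* number) {
        // Tolerate a leading '/', as produced by MakeFileName-style helpers.
        std::string rest =
            (!fname.empty() && fname[0] == '/') ? fname.substr(1) : fname;
        const std::string suffix = ".sst";
        if (rest.size() <= suffix.size() ||
            rest.compare(rest.size() - suffix.size(), suffix.size(), suffix) != 0) {
          return false;
        }
        std::string digits = rest.substr(0, rest.size() - suffix.size());
        if (digits.empty()) return false;
        uint64_t num = 0;
        for (char c : digits) {
          if (c < '0' || c > '9') return false;
          num = num * 10 + static_cast<uint64_t>(c - '0');
        }
        *number = num;
        return true;
      }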
  19. 01 Sep 2013, 1 commit
  20. 31 Aug 2013, 2 commits
  21. 29 Aug 2013, 2 commits