1. Oct 15, 2013: 2 commits
    • Add statistics to sst file · 86ef6c3f
      Committed by Kai Liu
      Summary:
      So far we only have the key/value pairs and the bloom filter stored in the
      sst file. It would be great if we were able to store more metadata about
      the table itself, for example, the entry size, bloom filter name, etc.
      
      This diff is the first step of this effort. It allows the table to keep the
      basic statistics mentioned in http://fburl.com/14995441, as well as
      allowing user-collected stats to be written to the stats block.
      
      After this diff, we will figure out the interface for letting users collect the statistics they are interested in (a conceptual sketch of the stored stats follows this entry).
      
      Test Plan:
      1. Added several unit tests.
      2. Ran `make check` to ensure it doesn't break other tests.
      
      Reviewers: dhruba, haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13419
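
      A conceptual sketch of the kind of per-table stats this diff stores; the field names here are hypothetical, not the actual names in the stats block:

        #include <cstdint>
        #include <string>

        // hypothetical illustration of the "basic statistics" kept per table
        struct TableStatsSketch {
          uint64_t num_entries;       // number of key/value pairs in the table
          uint64_t raw_key_size;      // total size of all keys
          uint64_t raw_value_size;    // total size of all values
          uint64_t data_block_count;  // how many data blocks the table has
          std::string filter_policy;  // e.g. the bloom filter name
        };
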
    • Change Function names from Compaction->Flush When they really mean Flush · 88f2f890
      Committed by Siying Dong
      Summary: While debugging the unit test failures that occur when the background flush thread is enabled, I felt the function names could be made clearer for people to understand. Also, once the names are fixed, the bugs in several tests become obvious (and some of those tests are failing). This patch cleans them up for future maintenance.
      
      Test Plan: Run test suites.
      
      Reviewers: haobo, dhruba, xjin
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13431
  2. Oct 11, 2013: 1 commit
  3. Oct 09, 2013: 1 commit
    • Add option for storing transaction logs in a separate dir · cbf4a064
      Committed by Naman Gupta
      Summary: In some cases, you might not want to store the data log (write ahead log) files in the same dir as the sst files. An example use case is leaf, which stores sst files in tmpfs and would like to save the log files in a separate dir (on disk) to save memory (a usage sketch follows this entry).
      
      Test Plan: make all. Ran the db_test test. A few tests are failing. P2785018. If you guys don't see an obvious problem with the code, maybe somebody from the rocksdb team could help me debug the issue here. Running this on leaf worked well. I could see logs stored on disk, and deleted appropriately after compactions. Obviously this is only one set of options; the unit tests cover different options. It seems like I'm missing some edge cases.
      
      Reviewers: dhruba, haobo, leveldb
      
      CC: xinyaohu, sumeet
      
      Differential Revision: https://reviews.facebook.net/D13239
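
      A minimal usage sketch, assuming the option this diff adds is the Options::wal_dir string (sst files in tmpfs, write-ahead logs on disk):

        #include "rocksdb/db.h"

        int main() {
          rocksdb::Options options;
          options.create_if_missing = true;
          options.wal_dir = "/mnt/disk0/rocksdb_wal";  // assumed option name: wal_dir
          rocksdb::DB* db = nullptr;
          rocksdb::Status s = rocksdb::DB::Open(options, "/dev/shm/dbdir", &db);
          if (s.ok()) delete db;  // WAL files now live under /mnt/disk0/rocksdb_wal
          return s.ok() ? 0 : 1;
        }
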
  4. Oct 06, 2013: 2 commits
  5. Oct 05, 2013: 3 commits
  6. Oct 04, 2013: 2 commits
  7. Oct 03, 2013: 1 commit
    • Fix SIGSEGV issue in universal compaction · 658a3ce2
      Committed by Xing Jin
      Summary:
      We saw a SIGSEGV when options.num_levels=1 was set in the universal compaction
      style. Dug into this issue for a while, and finally found the root cause (thanks to Haobo for the discussion).
      
      Test Plan: Add new unit test. It throws SIGSEGV without this change. Also run "make all check".
      
      Reviewers: haobo, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13251
  8. Sep 29, 2013: 1 commit
  9. Sep 27, 2013: 1 commit
  10. Sep 13, 2013: 1 commit
    • [RocksDB] Remove Log file immediately after memtable flush · 0e422308
      Committed by Haobo Xu
      Summary: As title. The DB log file's life cycle is tied to the memtable it backs. Once the memtable is flushed to an sst file and committed, we should be able to delete the log file without holding the mutex. This is part of the bigger change to avoid FindObsoleteFiles at runtime; it deals with log files, and sst files will be dealt with later.
      
      Test Plan: make check; db_bench
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11709
  11. Sep 07, 2013: 1 commit
    • Flush was hanging because the configured options specified that more than 1... · 32c965d4
      Committed by Dhruba Borthakur
      Flush was hanging because the configured options specified that more than 1 memtable needed to be merged.
      
      Summary:
      There is a config option called Options.min_write_buffer_number_to_merge
      that specifies the minimum number of write buffers to merge in memory
      before flushing to a file in L0. But in the case when the db is
      being closed, we should not be using this config; instead, we should
      flush whatever write buffers are available at that time (a configuration
      sketch follows this entry).
      
      Test Plan: Unit test attached.
      
      Reviewers: haobo, emayanke
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12717
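
      A configuration sketch of the option in question; the close-time behavior is the fix described above:

        rocksdb::Options options;
        options.write_buffer_size = 64 << 20;          // 64 MB per memtable
        options.max_write_buffer_number = 4;           // up to 4 memtables in memory
        options.min_write_buffer_number_to_merge = 2;  // merge 2 in memory before an L0 flush
        // Per this fix: when the db is closed, whatever write buffers exist are
        // flushed, even if fewer than min_write_buffer_number_to_merge accumulated.
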
  12. Sep 05, 2013: 2 commits
    • Return pathname relative to db dir in LogFile and cleanup AppendSortedWalsOfType · aa5c897d
      Committed by Mayank Agarwal
      Summary: So that replication can just download from wherever LogFile.Pathname points them to (a sketch follows this entry).
      
      Test Plan: make all check;./db_repl_stress
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12609
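
      A sketch of how a replication consumer might use the relative pathname, assuming the public GetSortedWalFiles API backed by AppendSortedWalsOfType:

        #include <cstdio>
        #include "rocksdb/db.h"
        #include "rocksdb/transaction_log.h"

        void ListWalFiles(rocksdb::DB* db) {
          rocksdb::VectorLogPtr wal_files;
          rocksdb::Status s = db->GetSortedWalFiles(wal_files);
          if (!s.ok()) return;
          for (const auto& f : wal_files) {
            // Per this diff, PathName() is relative to the db dir, so
            // replication can append it to its own download root.
            std::printf("%s starting at seq %llu\n", f->PathName().c_str(),
                        (unsigned long long)f->StartSequence());
          }
        }
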
    • New ldb command to convert compaction style · 42c109cc
      Committed by Xing Jin
      Summary:
      Add new command "change_compaction_style" to ldb tool. For
      universal->level, it shows "nothing to do". For level->universal, it
      compacts all files into a single one and moves the file to level 0.
      
      Also add check for number of files at level 1+ when opening db with
      universal compaction style.
      
      Test Plan:
      'make all check'. New unit test for the internal conversion function. Also manually tested various
      cmds like:
      
      ./ldb change_compaction_style --old_compaction_style=0
      --new_compaction_style=1 --db=/tmp/leveldbtest-3088/db_test
      
      Reviewers: haobo, dhruba
      
      Reviewed By: haobo
      
      CC: vamsi, emayanke
      
      Differential Revision: https://reviews.facebook.net/D12603
  13. Sep 02, 2013: 2 commits
    • Fix bug in Counters and record Sequencenumber using only TickerCount · c34271a5
      Committed by Mayank Agarwal
      Summary:
      The way counters/statistics are implemented in rocksdb demands that the enum Tickers and TickerNameMap follow the same order; otherwise the statistics exposed from fbcode/rocks get out of sync. Two counters for prefix had violated this order, and when I built counters for fbcode/mcrocksdb, the statistics for sequence number were appearing out of sync (an illustrative sketch follows this entry).
      The other change is to record the sequence number using setTickerCount only, and not recordTick. This is because of a difference between the statistics as understood by rocks/utils, which uses the ServiceData::statistics function, and rocksdb statistics. In rocksdb there is just one counter per counter name, but in ServiceData there are four independent buckets for every counter name: Count, Sum, Average and Rate. SetTickerCount and RecordTick update the same variable in rocksdb but different buckets in ServiceData. Therefore, I had to choose one consistent function, either RecordTick or SetTickerCount, for the sequence number in rocksdb. I chose SetTickerCount because the statistics object in the options passed during rocksdb-open is user-dependent and SetTickerCount makes sense there.
      There will be a corresponding diff for mcrocksdb in fbcode shortly.
      
      Test Plan: make all check; check ticker value using fprintfs
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12669
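
      An illustrative sketch (with hypothetical entries) of the ordering invariant: the enum and the name map must list tickers in exactly the same order, or exported statistics shift by one:

        #include <cstdint>
        #include <utility>

        enum Tickers : uint32_t {
          BLOCK_CACHE_MISS = 0,
          BLOCK_CACHE_HIT,
          SEQUENCE_NUMBER,   // must sit at the same index in TickerNameMap
          TICKER_ENUM_MAX
        };

        const std::pair<Tickers, const char*> TickerNameMap[] = {
            {BLOCK_CACHE_MISS, "rocksdb.block.cache.miss"},
            {BLOCK_CACHE_HIT, "rocksdb.block.cache.hit"},
            {SEQUENCE_NUMBER, "rocksdb.sequence.number"},
        };
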
    • Fix build caused by DeleteFile not tolerating / at the beginning · ab5c5c28
      Committed by Mayank Agarwal
      Summary: db->DeleteFile calls ParseFileName to check the name that was returned for the sst file. Now, the sst filename is returned using TableFileName, which uses MakeFileName. This puts a / at the front of the name, and ParseFileName doesn't like that. Changed ParseFileName to tolerate /s at the beginning. The test delete_file_test used to pass earlier because this behaviour of MakeFileName had been changed a while back to not return a /, during which time delete_file_test was checked in. But MakeFileName had to be reverted to add the / at the front, because GetLiveFiles, used in many places outside rocksdb, relied on the previous behaviour of MakeFileName.
      
      Test Plan: make; ./delete_file_test; make all check
      
      Reviewers: dhruba, haobo, vamsi
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12663
  14. Aug 29, 2013: 3 commits
    • Cleanup DeleteFile API · 59de2dba
      Committed by Dhruba Borthakur
      Summary:
      The DeleteFile API was removing files while holding the db lock. This
      is now changed to remove files outside the db lock.
      GetLiveFilesMetadata() also returns the smallest and largest
      sequence number of each file (a usage sketch follows this entry).
      
      Test Plan: deletefile_test
      
      Reviewers: emayanke, haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Maniphest Tasks: T63
      
      Differential Revision: https://reviews.facebook.net/D12567
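
      A sketch of reading the new per-file sequence numbers, assuming the GetLiveFilesMetaData accessor and the LiveFileMetaData field names below:

        #include <cstdio>
        #include <vector>
        #include "rocksdb/db.h"

        void DumpLiveFiles(rocksdb::DB* db) {
          std::vector<rocksdb::LiveFileMetaData> files;
          db->GetLiveFilesMetaData(&files);
          for (const auto& f : files) {
            // smallest_seqno/largest_seqno are the fields this change adds
            std::printf("%s level=%d size=%llu seqnos=[%llu, %llu]\n",
                        f.name.c_str(), f.level, (unsigned long long)f.size,
                        (unsigned long long)f.smallest_seqno,
                        (unsigned long long)f.largest_seqno);
          }
        }
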
    • [RocksDB] Fix TransformRepFactory related valgrind problem · 48e5ea0c
      Committed by Haobo Xu
      Summary: Let TransformRepFactory own the passed in transform. Also make it better encapsulated.
      
      Test Plan: make valgrind_check;
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12591
    • Introduced a new flag non_blocking_io in ReadOptions. · fc0c399d
      Committed by Dhruba Borthakur
      Summary:
      If ReadOptions.non_blocking_io is set to true, then KeyMayExist
      and Iterators will return data that is cached in RAM.
      If the Iterator needs to do IO from storage to serve the data,
      then Iterator.status() will return Status::IsRetry() (a usage
      sketch follows this entry).
      
      Test Plan:
      Enhanced the unit test DBTest.KeyMayExist to detect whether any IOs were
      issued from storage. Added DBTest.NonBlockingIteration to verify
      nonblocking iterations.
      
      Reviewers: emayanke, haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Maniphest Tasks: T63
      
      Differential Revision: https://reviews.facebook.net/D12531
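
      A usage sketch, taking the flag and status code named in this commit at face value:

        #include <memory>
        #include "rocksdb/db.h"

        void CachedOnlyScan(rocksdb::DB* db) {
          rocksdb::ReadOptions ro;
          ro.non_blocking_io = true;  // serve only from data already in RAM
          std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
          for (it->SeekToFirst(); it->Valid(); it->Next()) {
            // consume cached entries without touching storage
          }
          if (it->status().IsRetry()) {
            // serving this range needed storage IO; fall back to a blocking read
          }
        }
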
  15. Aug 28, 2013: 1 commit
  16. Aug 24, 2013: 2 commits
  17. Aug 23, 2013: 3 commits
    • Add three new MemTableRep's · 74781a0c
      Committed by Jim Paton
      Summary:
      This patch adds three new MemTableRep's: UnsortedRep, PrefixHashRep, and VectorRep.
      
      UnsortedRep stores keys in an std::unordered_map of std::sets. When an iterator is requested, it dumps the keys into an std::set and iterates over that.
      
      VectorRep stores keys in an std::vector. When an iterator is requested, it creates a copy of the vector and sorts it using std::sort. The iterator accesses that new vector.
      
      PrefixHashRep stores keys in an unordered_map mapping prefixes to ordered sets.
      
      I also added one API change: a function MemTableRep::MarkImmutable. This function is called when the rep is added to the immutable list. It doesn't do anything yet, but it seems like it could be useful. In particular, for VectorRep, it means we could elide the extra copy and just sort in place (see the conceptual sketch after this entry). The only reason I haven't done that yet is that the use of the ArenaAllocator complicates things (I can elaborate on this if needed).
      
      Test Plan:
      make -j32 check
      ./db_stress --memtablerep=vector
      ./db_stress --memtablerep=unsorted
      ./db_stress --memtablerep=prefixhash --prefix_size=10
      
      Reviewers: dhruba, haobo, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12117
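
      A conceptual sketch of the VectorRep idea (not the actual MemTableRep interface): inserts append to a vector, and an iterator request sorts a snapshot copy, which MarkImmutable could later turn into an in-place sort:

        #include <algorithm>
        #include <string>
        #include <vector>

        class VectorRepSketch {
         public:
          void Insert(std::string key) { keys_.push_back(std::move(key)); }
          std::vector<std::string> SortedSnapshot() const {
            std::vector<std::string> copy = keys_;  // extra copy today...
            std::sort(copy.begin(), copy.end());    // ...sorted for iteration
            return copy;  // MarkImmutable could allow eliding the copy
          }
         private:
          std::vector<std::string> keys_;
        };
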
    • Pull from https://reviews.facebook.net/D10917 · 17dc1280
      Committed by Xing Jin
      Summary: Pull Mark's patch and slightly revise it. I revised another place in db_impl.cc with a similar new formula.
      
      Test Plan:
      make all check. Also run "time ./db_bench --num=2500000000 --numdistinct=2200000000". It has run for 20+ hours and hasn't finished. Looks good so far:
      
      Installed stack trace handler for SIGILL SIGSEGV SIGBUS SIGABRT
      LevelDB:    version 2.0
      Date:       Tue Aug 20 23:11:55 2013
      CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
      CPUCache:   20480 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    2500000000
      RawSize:    276565.6 MB (estimated)
      FileSize:   157356.3 MB (estimated)
      Write rate limit: 0
      Compression: snappy
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillseq      :    7202.000 micros/op 138 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillsync     :    7148.000 micros/op 139 ops/sec; (2500000 ops)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      fillrandom   :    7105.000 micros/op 140 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      overwrite    :    6930.000 micros/op 144 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.020 micros/op 980507 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.021 micros/op 979620 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readseq      :     113.000 micros/op 8849 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readreverse  :     102.000 micros/op 9803 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      Created bg thread 0x7f0ac17f7700
      compact      :  111701.000 micros/op 8 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readrandom   :       1.020 micros/op 980376 ops/sec; (0 of 2500000000 found)
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readseq      :     120.000 micros/op 8333 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      readreverse  :      29.000 micros/op 34482 ops/sec;
      DB path: [/tmp/leveldbtest-3088/dbbench]
      ... finished 618100000 ops
      
      Reviewers: MarkCallaghan, haobo, dhruba, chip
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D12441
    • Add APIs to query SST file metadata and to delete specific SST files · 60bf2b7d
      Committed by Simha Venkataramaiah
      Summary: An API to query the level, key ranges, size, etc. for each SST file, and an API to delete a specific file from the db and all associated state in the bookkeeping data structures (a combined usage sketch follows this entry).
      
      Notes: Editing the manifest version does not release the obsolete files right away. However, deleting the file directly will mess up the iterator. We may need a more aggressive/timely file deletion API.
      
      I have used std::unique_ptr - will switch to boost:: since this is external. Thoughts?
      
      The unit test is fragile right now as it expects the compaction at certain levels.
      
      Test Plan: unittest
      
      Reviewers: dhruba, vamsi, emayanke
      
      CC: zshao, leveldb, haobo
      
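
      A sketch of the two APIs together, assuming the GetLiveFilesMetaData/DeleteFile names used elsewhere in this log:

        #include <vector>
        #include "rocksdb/db.h"

        rocksdb::Status DeleteOneL0File(rocksdb::DB* db) {
          std::vector<rocksdb::LiveFileMetaData> files;
          db->GetLiveFilesMetaData(&files);  // level, key range, size per file
          for (const auto& f : files) {
            if (f.level == 0) {
              // DeleteFile takes the file name as reported in the metadata
              return db->DeleteFile(f.name);
            }
          }
          return rocksdb::Status::NotFound("no level-0 file");
        }
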
  18. Aug 21, 2013: 1 commit
  19. Aug 20, 2013: 1 commit
  20. Aug 16, 2013: 1 commit
    • Expose statistic for sequence number and implement setTickerCount · 387ac0f1
      Committed by Mayank Agarwal
      Summary: A statistic for the sequence number is needed by wormhole, and setTickerCount is demanded for this statistic. I can't simply recordTick(max_sequence) when the db recovers, because the statistics object is owned by the client and may or may not be reset during reopen. E.g., statistics are reset in mcrocksdb whereas they are not in db_stress. Therefore it is best to go with setTickerCount (a read-side sketch follows this entry).
      
      Test Plan: ./db_stress ... --statistics=1 and observed expected sequence number
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12327
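
      A sketch of how a client might read the exposed sequence number, assuming a SEQUENCE_NUMBER ticker and the CreateDBStatistics factory:

        #include <cstdint>
        #include "rocksdb/options.h"
        #include "rocksdb/statistics.h"

        rocksdb::Options MakeOptionsWithStats() {
          rocksdb::Options options;
          options.statistics = rocksdb::CreateDBStatistics();  // client-owned object
          return options;
        }

        uint64_t ReadSequence(const rocksdb::Options& options) {
          // setTickerCount(SEQUENCE_NUMBER, seq) during recovery makes this read
          // correct whether or not the client reset the statistics object.
          return options.statistics->getTickerCount(rocksdb::SEQUENCE_NUMBER);
        }
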
  21. Aug 15, 2013: 1 commit
    • Minor fix to current codes · 0a5afd1a
      Committed by Xing Jin
      Summary:
      Minor fixes to the current code, including coding style, output format, and
      comments. No major logic change. There are only 2 real changes; please see my inline comments.
      
      Test Plan: make all check
      
      Reviewers: haobo, dhruba, emayanke
      
      Differential Revision: https://reviews.facebook.net/D12297
  22. Aug 14, 2013: 3 commits
    • Counter for merge failure · f1bf1694
      Committed by Mayank Agarwal
      Summary:
      With Merge returning bool, it can keep failing silently (e.g., while failing to fetch the timestamp in TTL). We need to detect this through a rocksdb counter that can get bumped whenever Merge returns false. This will also be super-useful for the mcrocksdb-counter service where Merge may fail.
      Added a counter NUMBER_MERGE_FAILURES and appropriately updated db/merge_helper.cc (a sketch of the counting logic follows this entry).
      
      I felt that it would be better to directly add the counter-bumping in Merge as a default function of the MergeOperator class, but the user should not be aware of this, so this approach seems better to me.
      
      Test Plan: make all check
      
      Reviewers: dnicholas, haobo, dhruba, vamsi
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12129
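
      A simplified sketch of the counting logic (the real change lives in db/merge_helper.cc), using the associative Merge signature that returns bool:

        #include <string>
        #include "rocksdb/merge_operator.h"
        #include "rocksdb/statistics.h"

        bool MergeAndCount(const rocksdb::AssociativeMergeOperator* op,
                           const rocksdb::Slice& key,
                           const rocksdb::Slice* existing_value,
                           const rocksdb::Slice& operand,
                           std::string* new_value,
                           rocksdb::Logger* logger,
                           rocksdb::Statistics* stats) {
          bool ok = op->Merge(key, existing_value, operand, new_value, logger);
          if (!ok && stats != nullptr) {
            // bump the counter added by this diff
            stats->recordTick(rocksdb::NUMBER_MERGE_FAILURES, 1);
          }
          return ok;
        }
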
    • Prefix filters for scans (v4) · f5f18422
      Committed by Tyler Harter
      Summary: Similar to v2 (db and table code understands prefixes), but use ReadOptions as in v3.  Also, make the CreateFilter code faster and cleaner.
      
      Test Plan: make db_test; export LEVELDB_TESTS=PrefixScan; ./db_test
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: haobo, emayanke
      
      Differential Revision: https://reviews.facebook.net/D12027
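
      A sketch of the v4 interface as described: the prefix extractor configures the filter, and ReadOptions carries the scan prefix (field types as of this commit; later versions changed this API):

        #include "rocksdb/db.h"
        #include "rocksdb/slice_transform.h"

        void PrefixScanSetup() {
          rocksdb::Options options;
          // treat the first 8 bytes of each key as its prefix
          options.prefix_extractor = rocksdb::NewFixedPrefixTransform(8);

          rocksdb::Slice prefix("user0001");
          rocksdb::ReadOptions ro;
          ro.prefix = &prefix;  // per this commit, scans consult the prefix filter
        }
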
    • Separate compaction filter for each compaction · 3b81df34
      Committed by sumeet
      Summary:
      If we have the same compaction filter for each compaction,
      the application cannot know about the different compaction processes.
      Later on, we can put more details in the compaction filter for the
      application to consume and use according to its needs. For example, in
      universal compaction, we have a compaction process involving all the
      files while others don't involve all the files. Applications may want to
      collect some stats only during full compactions (see the sketch after this entry).
      
      Test Plan: run existing unit tests
      
      Reviewers: haobo, dhruba
      
      Reviewed By: dhruba
      
      CC: xinyaohu, leveldb
      
      Differential Revision: https://reviews.facebook.net/D12057
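
      A sketch assuming a factory interface of the shape this change implies: one filter instance per compaction, so per-compaction state (e.g. stats gathered only during full compactions) becomes possible:

        #include <memory>
        #include <string>
        #include "rocksdb/compaction_filter.h"

        class CountingFilter : public rocksdb::CompactionFilter {
         public:
          bool Filter(int level, const rocksdb::Slice& key,
                      const rocksdb::Slice& value, std::string* new_value,
                      bool* value_changed) const override {
            return false;  // keep every entry; a real filter may drop or rewrite
          }
          const char* Name() const override { return "CountingFilter"; }
        };

        class CountingFilterFactory : public rocksdb::CompactionFilterFactory {
         public:
          std::unique_ptr<rocksdb::CompactionFilter> CreateCompactionFilter(
              const rocksdb::CompactionFilter::Context& context) override {
            // a fresh filter instance per compaction process
            return std::unique_ptr<rocksdb::CompactionFilter>(new CountingFilter());
          }
          const char* Name() const override { return "CountingFilterFactory"; }
        };
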
  23. Aug 10, 2013: 2 commits
    • Universal Compaction should keep DeleteMarkers unless it is the earliest file. · 93d77a27
      Committed by Dhruba Borthakur
      Summary:
      The pre-existing code was purging a DeleteMarker if that key did not
      exist in deeper levels. But in the Universal Compaction Style, all
      files are in Level0. For compaction runs that did not include the
      earliest file, we were erroneously purging the DeleteMarkers.
      
      The fix is to purge DeleteMarkers only if the compaction includes
      the earliest file.
      
      Test Plan: DBTest.Randomized triggers this code path.
      
      Differential Revision: https://reviews.facebook.net/D12081
    • Fix unit tests for universal compaction (step 2) · 8ae905ed
      Committed by Xing Jin
      Summary:
      Continue fixing existing unit tests for universal compaction. I have
      tried to apply universal compaction to all unit tests that haven't
      called ChangeOptions(). I left out a few which are either apparently not
      applicable to universal compaction (because they check files/keys/values
      at level 1 or above), or apparently not related to compaction
      (e.g., open a file, open a db).
      
      I also added a new unit test for universal compaction.
      
      The good news is that I didn't see any bugs during this round.
      
      Test Plan: Ran "make all check" yesterday. Have rebased and am rerunning.
      
      Reviewers: haobo, dhruba
      
      Differential Revision: https://reviews.facebook.net/D12135
  24. Aug 08, 2013: 1 commit
    • Fix unit tests/bugs for universal compaction (first step) · 17b8f786
      Committed by Xing Jin
      Summary:
      This is the first step to fix unit tests and bugs for universal
      compactiion. I added universal compaction option to ChangeOptions(), and
      fixed all unit tests calling ChangeOptions(). Some of these tests
      obviously assume more than 1 level and check file number/values in level
      1 or above levels. I set kSkipUniversalCompaction for these tests.
      
      The major bug I found is manual compaction with universal compaction never stops. I have put a fix for
      it.
      
      I have also set universal compaction as the default compaction and found
      at least 20+ unit tests failing. I haven't looked into the details. The
      next step is to check all unit tests without calling ChangeOptions().
      
      Test Plan: make all check
      
      Reviewers: dhruba, haobo
      
      Differential Revision: https://reviews.facebook.net/D12051
  25. Aug 06, 2013: 1 commit
    • [RocksDB] [MergeOperator] The new Merge Interface! Uses merge sequences. · c2d7826c
      Committed by Deon Nicholas
      Summary:
      Here are the major changes to the Merge Interface. It has been expanded
      to handle cases where the MergeOperator is not associative. It does so by stacking
      up merge operations while scanning through the key history (i.e.: during Get() or
      Compaction), until a valid Put/Delete/end-of-history is encountered; it then
      applies all of the merge operations in the correct sequence starting with the
      base/sentinel value.
      
      I have also introduced an "AssociativeMerge" function which allows the user to
      take advantage of associative merge operations (such as in the case of counters).
      The implementation will always attempt to merge the operations/operands themselves
      together when they are encountered, and will resort to the "stacking" method if
      and only if the "associative-merge" fails.
      
      This implementation is conjectured to allow MergeOperator to handle the general
      case, while still providing the user with the ability to take advantage of certain
      efficiencies in their own merge-operator / data-structure.
      
      NOTE: This is a preliminary diff. This must still go through a lot of review,
      revision, and testing. Feedback welcome!
      
      Test Plan:
        -This is a preliminary diff. I have only just begun testing/debugging it.
        -I will be testing this with the existing MergeOperator use-cases and unit-tests
      (counters, string-append, and redis-lists)
        -I will be "desk-checking" and walking through the code with the help gdb.
        -I will find a way of stress-testing the new interface / implementation using
      db_bench, db_test, merge_test, and/or db_stress.
        -I will ensure that my tests cover all cases: Get-Memtable,
      Get-Immutable-Memtable, Get-from-Disk, Iterator-Range-Scan, Flush-Memtable-to-L0,
      Compaction-L0-L1, Compaction-Ln-L(n+1), Put/Delete found, Put/Delete not-found,
      end-of-history, end-of-file, etc.
        -A lot of feedback from the reviewers.
      
      Reviewers: haobo, dhruba, zshao, emayanke
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11499
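
      A self-contained conceptual sketch of the "stacking" strategy described above (not the RocksDB implementation): scan the key's history newest-to-oldest, stack merge operands until a Put/Delete/end-of-history, then apply them in order on the base value:

        #include <deque>
        #include <string>

        enum class EntryType { kPut, kDelete, kMerge };
        struct Entry { EntryType type; std::string value; };

        // history is ordered newest-to-oldest, as seen during Get()/compaction
        std::string ResolveMerge(const std::deque<Entry>& history) {
          std::deque<std::string> operands;  // newest operand first
          const std::string* base = nullptr;
          for (const Entry& e : history) {
            if (e.type == EntryType::kMerge) {
              operands.push_back(e.value);  // keep stacking
            } else {
              if (e.type == EntryType::kPut) base = &e.value;
              break;  // Put or Delete ends the scan
            }
          }
          // apply operands oldest-first, starting from the base/sentinel value
          std::string result = (base != nullptr) ? *base : std::string();
          for (auto it = operands.rbegin(); it != operands.rend(); ++it) {
            result += *it;  // stand-in for the user's merge; here: string append
          }
          return result;
        }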