1. 01 May, 2014 3 commits
  2. 30 Apr, 2014 5 commits
    • I
      More s/us fixes · d6d67c0e
      Igor Canadi committed
      d6d67c0e
    • Y
      Add a new mem-table representation based on cuckoo hash. · 9d9d2965
      Yueh-Hsuan Chiang committed
      Summary:
      = Major Changes =
      * Add a new mem-table representation, HashCuckooRep, which is based on cuckoo hashing.
        Cuckoo hashing uses multiple hash functions, so each key has multiple
        possible locations in the mem-table.
      
        - Put: When inserting a key, it looks for a vacant slot among the key's
          possible locations and stores the key there.  If none of those
          locations is vacant, it kicks out a victim key and stores the new key
          at the victim's location.  The kicked-out victim is then stored at a
          vacant slot among its own possible locations, or kicks out
          another victim in turn.  In this diff, the kick-out path (known as
          the cuckoo path) is found using BFS, which guarantees that it is the shortest one.
      
        - Get: Simply tries all possible locations of a key, which guarantees
          worst-case constant time complexity.
      
        - Time complexity: O(1) for Get, and average O(1) for Put if the
          fullness of the mem-table is below 80%.
      
        - Two hash functions are used by default.  The number of hash functions used
          by the cuckoo hash may increase dynamically if it fails to find a
          short-enough kick-out path.
      
        - Currently, HashCuckooRep does not support iteration or snapshots,
          as the main purpose of this representation is to optimize point access.
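
The Put and Get behaviour described above can be illustrated with a toy cuckoo hash set. This is a minimal sketch under stated assumptions, not the HashCuckooRep code: the class name ToyCuckooSet, the salted std::hash used as the two hash functions, and the max_visited bound are all hypothetical. It keeps a fixed two hash functions and does not grow their number dynamically.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy cuckoo hash set (hypothetical; for illustration only).
class ToyCuckooSet {
 public:
  explicit ToyCuckooSet(size_t num_slots) : slots_(num_slots) {
    assert(num_slots > 0);
  }

  // Get: probe every candidate location of the key (two probes here).
  bool Contains(const std::string& key) const {
    for (int h = 0; h < kNumHashes; ++h) {
      const Slot& s = slots_[Loc(key, h)];
      if (s.used && s.key == key) return true;
    }
    return false;
  }

  // Put: breadth-first search from the key's candidate slots for a vacant
  // slot, then shift the keys along the path. Returns false, leaving the
  // table unchanged, if no path is found within max_visited nodes.
  bool Insert(const std::string& key, size_t max_visited = 64) {
    if (Contains(key)) return true;
    struct Node { size_t slot; int parent; };  // parent == -1 marks a root
    std::vector<Node> nodes;
    std::vector<bool> seen(slots_.size(), false);
    auto visit = [&](size_t slot, int parent) {
      if (!seen[slot]) {
        seen[slot] = true;
        nodes.push_back(Node{slot, parent});
      }
    };
    for (int h = 0; h < kNumHashes; ++h) visit(Loc(key, h), -1);

    for (size_t i = 0; i < nodes.size() && i < max_visited; ++i) {
      const size_t s = nodes[i].slot;
      if (!slots_[s].used) {
        // Vacant slot found: move each key on the path one step toward it.
        int cur = static_cast<int>(i);
        while (nodes[cur].parent != -1) {
          const int p = nodes[cur].parent;
          slots_[nodes[cur].slot].key = std::move(slots_[nodes[p].slot].key);
          slots_[nodes[cur].slot].used = true;
          cur = p;
        }
        slots_[nodes[cur].slot].key = key;
        slots_[nodes[cur].slot].used = true;
        return true;
      }
      // Occupied: the occupant could be kicked out to its other locations.
      for (int h = 0; h < kNumHashes; ++h) {
        visit(Loc(slots_[s].key, h), static_cast<int>(i));
      }
    }
    return false;
  }

 private:
  enum { kNumHashes = 2 };
  struct Slot {
    bool used = false;
    std::string key;
  };

  // h-th hash function, derived by salting the key (an assumption).
  size_t Loc(const std::string& key, int h) const {
    return std::hash<std::string>()(key + static_cast<char>('#' + h)) %
           slots_.size();
  }

  std::vector<Slot> slots_;
};
```

Because the search runs before any key is moved, a failed insert leaves every stored key in place, which a naive kick-and-retry loop does not guarantee.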
      
      = Minor Changes =
      * Add IsSnapshotSupported() to DB to indicate whether the current DB
        supports snapshots.  If it returns false, then DB::GetSnapshot() will
        always return nullptr.
      
      Test Plan:
      Run existing tests.  Will develop a test specifically for cuckoo hash in
      the next diff.
      
      Reviewers: sdong, haobo
      
      Reviewed By: sdong
      
      CC: leveldb, dhruba, igor
      
      Differential Revision: https://reviews.facebook.net/D16155
      9d9d2965
    • I
      More unsigned/signed compare fixes · f1c9aa6e
      Igor Canadi committed
      f1c9aa6e
    • I
      Fix more signed/unsigned comparisons · 38693d99
      Igor Canadi committed
      38693d99
    • I
      Cache result of ReadFirstRecord() · dd9eb7a7
      Igor Canadi committed
      Summary:
      ReadFirstRecord() reads the actual log file from disk on every call. This diff introduces a cache layer on top of ReadFirstRecord(), which should significantly speed up repeated calls to GetUpdatesSince().
      
      I also cleaned up some stuff, but the whole TransactionLogIterator could use some refactoring, especially if we see increased usage.
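
The caching idea for ReadFirstRecord() can be sketched as a memoizing wrapper. This is a minimal illustration, not the code from the diff: FirstRecordCache, its Loader callback, and keying by log file number are all hypothetical, and the sketch assumes the first record of a log file never changes once written.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical cache: log file number -> sequence number of its first record.
class FirstRecordCache {
 public:
  // The loader stands in for the expensive read of the log file from disk.
  using Loader = std::function<uint64_t(uint64_t log_number)>;

  explicit FirstRecordCache(Loader loader) : loader_(std::move(loader)) {}

  uint64_t Get(uint64_t log_number) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = cache_.find(log_number);
    if (it != cache_.end()) return it->second;  // served from memory
    // Cache miss: do the slow read once and remember the result.
    // For simplicity the lock is held during the load.
    const uint64_t first_seq = loader_(log_number);
    cache_.emplace(log_number, first_seq);
    return first_seq;
  }

 private:
  Loader loader_;
  std::mutex mu_;
  std::unordered_map<uint64_t, uint64_t> cache_;
};
```

Repeated lookups for the same log file then cost a hash-map probe instead of a disk read.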
      
      Test Plan: make check
      
      Reviewers: haobo, sdong, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18387
      dd9eb7a7
  3. 29 Apr, 2014 2 commits
  4. 28 Apr, 2014 1 commit
  5. 27 Apr, 2014 1 commit
  6. 26 Apr, 2014 4 commits
  7. 25 Apr, 2014 3 commits
    • I
      Column family logging · ad3cd39c
      Igor Canadi committed
      Summary:
      Now that we have column families involved, we need to add extra context to every log message. They now start with "[column family name] log message"
      
      Also added some logging that I think would be useful, like level summary after every flush (I often needed that when going through the logs).
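
The "[column family name] log message" format can be sketched with a small helper. FormatCfLog is a hypothetical name used only for illustration; it is not the logging function RocksDB uses, and the 512-byte buffer is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a printf-style message and prefixes it with
// the column family name in square brackets.
std::string FormatCfLog(const std::string& cf_name, const char* format, ...) {
  char buf[512];  // longer messages are truncated in this sketch
  va_list ap;
  va_start(ap, format);
  std::vsnprintf(buf, sizeof(buf), format, ap);
  va_end(ap);
  return "[" + cf_name + "] " + buf;
}
```

For example, `FormatCfLog("default", "Level summary: %d files", 3)` yields `[default] Level summary: 3 files`.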
      
      Test Plan: make check + ran db_bench to confirm I'm happy with log output
      
      Reviewers: dhruba, haobo, ljin, yhchiang, sdong
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18303
      ad3cd39c
    • I
      Fix corruption test · 4cd9f58c
      Igor Canadi committed
      4cd9f58c
    • I
      Make CompactionInputErrorParanoid less flakey · 478990c8
      Igor Canadi committed
      Summary:
      I'm getting lots of e-mails with CompactionInputErrorParanoid failing. Most recent example early morning today was: http://ci-builds.fb.com/job/rocksdb_valgrind/562/consoleFull
      
      I'm putting a stop to these e-mails. I investigated why the test is flaky and it turns out it's because of non-determinism in compaction scheduling. If there is a compaction after the last flush, CorruptFile will corrupt the compacted file instead of the file at level 0 (as it assumes). That makes `Check(9, 9)` fail big time.
      
      I also saw some errors with the table file getting output to a level >= 1 instead of level 0. Also fixed that.
      
      Test Plan: Ran corruption_test 100 times without a failure. Previously it usually failed around the 10th run.
      
      Reviewers: dhruba, haobo, ljin
      
      Reviewed By: ljin
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18285
      478990c8
  8. 24 Apr, 2014 1 commit
    • S
      Fix a bug in IterKey · 4de5b84e
      sdong committed
      Summary: IterKey set buffer_size_ to a wrong initial value, causing it to always allocate from the heap instead of using its stack buffer, even when the key is small enough to fit. Fix it.
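
The stack-versus-heap behaviour at stake can be sketched with a small inline-buffer key holder. ToyIterKey and its 32-byte inline buffer are hypothetical and only illustrate the technique; the actual IterKey fields and the exact wrong value are not shown in this commit message. The point of the sketch is that the capacity field must start at the size of the inline buffer. If it started at 0, for instance, every non-empty key would be sent to the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical key holder with an inline buffer and a heap fallback.
class ToyIterKey {
 public:
  // buf_size_ starts at the inline capacity so that small keys stay inline.
  ToyIterKey() : key_(space_), buf_size_(sizeof(space_)), key_size_(0) {}
  ~ToyIterKey() {
    if (key_ != space_) delete[] key_;
  }
  ToyIterKey(const ToyIterKey&) = delete;
  ToyIterKey& operator=(const ToyIterKey&) = delete;

  void SetKey(const char* data, size_t size) {
    if (size > buf_size_) {
      // Only keys larger than the current buffer trigger a heap allocation.
      char* bigger = new char[size];
      if (key_ != space_) delete[] key_;
      key_ = bigger;
      buf_size_ = size;
    }
    std::memcpy(key_, data, size);
    key_size_ = size;
  }

  bool UsesInlineBuffer() const { return key_ == space_; }
  std::string ToString() const { return std::string(key_, key_size_); }

 private:
  char space_[32];   // inline storage, lives wherever the object lives
  char* key_;        // points at space_ or at a heap buffer
  size_t buf_size_;  // capacity of the buffer key_ points at
  size_t key_size_;  // length of the stored key
};
```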
      
      Test Plan: make all check
      
      Reviewers: haobo, ljin
      
      Reviewed By: haobo
      
      CC: igor, dhruba, yhchiang, leveldb
      
      Differential Revision: https://reviews.facebook.net/D18279
      4de5b84e
  9. 23 Apr, 2014 7 commits
    • I
      Print out stack trace in mac, too · f9f8965e
      Igor Canadi committed
      Summary: While debugging a Mac-only issue with ThreadLocalPtr, this was very useful. Let's print out the stack trace on Mac OS, too.
      
      Test Plan: Verified that somewhat useful stack trace was generated on mac. Will run PrintStack() on linux, too.
      
      Reviewers: ljin, haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18189
      f9f8965e
    • S
      Expose number of entries in mem tables to users · a5707407
      sdong committed
      Summary: This patch defines two new DB properties, rocksdb.num-immutable-mem-table and rocksdb.num-entries-imm-mem-tables, which expose the number of entries in mem tables to users.
      
      Test Plan:
      Cover the code in db_test
      make all check
      
      Reviewers: haobo, ljin, igor
      
      Reviewed By: igor
      
      CC: nkg-, igor, yhchiang, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D18207
      a5707407
    • L
      get rid of shared_ptr in memtable.cc · 5f1daf7a
      Lei Jin committed
      Summary: Get rid of the devil. Probably won't impact anything on the perf side.
      
      Test Plan: make all check
      
      Reviewers: igor, haobo, sdong, yhchiang
      
      Reviewed By: haobo
      
      CC: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D18153
      5f1daf7a
    • S
      PlainTableReader to expose index size to users · 86a0133d
      sdong committed
      Summary:
      This is a temporary solution to expose index sizes to users from PlainTableReader before we persist them to files.
      In this patch, the memory consumption of indexes used by PlainTableReader is reported as two user-defined properties, so that users can monitor them.
      
      Test Plan:
      Add a unit test.
      make all check
      
      Reviewers: haobo, ljin
      
      Reviewed By: haobo
      
      CC: nkg-, yhchiang, igor, ljin, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D18195
      86a0133d
    • I
      Revert "Better port::Mutex::AssertHeld() and AssertNotHeld()" · 1068d2fa
      Igor Canadi committed
      This reverts commit ddafceb6.
      1068d2fa
    • I
      Better port::Mutex::AssertHeld() and AssertNotHeld() · ddafceb6
      Igor Canadi committed
      Summary:
      Using ThreadLocalPtr as a flag to determine if a mutex is locked or not enables us to implement AssertNotHeld(). It also makes AssertHeld() actually correct.
      
      I had to remove port::Mutex as a dependency for util/thread_local.h, but that's fine since we can just use std::mutex :)
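
The idea of tracking lock ownership per thread can be sketched as follows. This is a hedged illustration, not the port::Mutex implementation: TrackedMutex is a hypothetical name, and the sketch swaps in the C++11 `thread_local` keyword and a per-thread set of held mutexes in place of ThreadLocalPtr.

```cpp
#include <cassert>
#include <mutex>
#include <unordered_set>

// Hypothetical mutex wrapper that records, per thread, which mutexes the
// calling thread currently holds.
class TrackedMutex {
 public:
  void Lock() {
    mu_.lock();
    Held().insert(this);
  }
  void Unlock() {
    Held().erase(this);
    mu_.unlock();
  }

  // True only if the calling thread itself holds this mutex.
  bool HeldByThisThread() const { return Held().count(this) != 0; }

  void AssertHeld() const { assert(HeldByThisThread()); }
  void AssertNotHeld() const { assert(!HeldByThisThread()); }

 private:
  // One set per thread, so no synchronization is needed to access it.
  static std::unordered_set<const TrackedMutex*>& Held() {
    thread_local std::unordered_set<const TrackedMutex*> held;
    return held;
  }

  std::mutex mu_;
};
```

Since the record is per thread, AssertNotHeld() can pass on one thread while another thread holds the lock, which is the property a single shared "locked" flag cannot provide.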
      
      Test Plan: make check
      
      Reviewers: ljin, dhruba, haobo, sdong, yhchiang
      
      Reviewed By: ljin
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18171
      ddafceb6
    • I
      Support for column families in TTL DB · 3992aec8
      Igor Canadi committed
      Summary:
      This will enable people using TTL DB to do so with multiple column families. They can also specify different TTLs for each one.
      
      TODO: Implement CreateColumnFamily() in TTL world.
      
      Test Plan: Added a very simple sanity test.
      
      Reviewers: dhruba, haobo, ljin, sdong, yhchiang
      
      Reviewed By: haobo
      
      CC: leveldb, alberts
      
      Differential Revision: https://reviews.facebook.net/D17859
      3992aec8
  10. 22 Apr, 2014 4 commits
    • I
      Rename "benchmark" back to "bench". · 8dc34364
      Igor Canadi committed
      Also, make `benchharness.cc` not compiled into rocksdb library.
      8dc34364
    • P
      Added benchmark functionality on the lines of folly/Benchmark.h · ff1b5df4
      Pratyush Seth committed
      Summary: Added benchmark functionality on the lines of folly/Benchmark.h
      
      Test Plan: Added unit tests
      
      Reviewers: igor, haobo, sdong, ljin, yhchiang, dhruba
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D17973
      ff1b5df4
    • I
      Remove TransactionLogIteratorRace when -DNDEBUG · f813279d
      Igor Canadi committed
      f813279d
    • L
      hints for narrowing down FindFile range and avoiding checking irrelevant L0 files · 0f2d7681
      Lei Jin committed
      Summary:
      The file tree structure in Version is prebuilt and the range of each file is known.
      On the Get() code path, we do a binary search in FindFile() by comparing the
      target key with each file's largest key, and we also check the range of each L0 file.
      With some pre-calculated knowledge, each key comparison that has already been done can serve
      as a hint to narrow down further searches:
      (1) If a key falls within an L0 file's range, we can safely skip the next
      file if its range does not overlap with the current one.
      (2) If a key falls within a file's range in levels L0 to Ln-1, we only
      need to binary search the next level for files that overlap with the current one.
      
      (1) can skip some files, depending on the key distribution.
      (2) can greatly reduce the range of the binary search, especially for bottom
      levels, given that one file most likely overlaps with only N files from
      the level below (where N is max_bytes_for_level_multiplier). So on level
      L, we will only look at ~N files instead of N^L files.
      
      Some initial results: measured with a 500M-key DB, when the write load is light (10k/s = 1.2M/s), this
      improves QPS by ~7% on top of blocked bloom. When the write load is heavier (80k/s =
      9.6M/s), it gives us a ~13% improvement.
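
Hint (2) can be sketched as a binary search restricted to a precomputed window of the next level. This is a simplified illustration, not the code from the diff: FileRange, Window, OverlapWindow, and the use of integer keys are assumptions made for brevity, and this FindFile only mirrors the idea of searching by each file's largest key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive key range of one file; integer keys keep the sketch short.
struct FileRange {
  int smallest;
  int largest;
};

// Binary search in level[lo, hi) for the first file whose largest key is
// >= target. Returns hi if there is no such file in the window.
size_t FindFile(const std::vector<FileRange>& level, int target, size_t lo,
                size_t hi) {
  auto it = std::lower_bound(
      level.begin() + lo, level.begin() + hi, target,
      [](const FileRange& f, int key) { return f.largest < key; });
  return static_cast<size_t>(it - level.begin());
}

// Index window [lo, hi) of the files in `next` that overlap file `f`.
// This would be computed once, when the file tree is built.
struct Window {
  size_t lo;
  size_t hi;
};

Window OverlapWindow(const FileRange& f, const std::vector<FileRange>& next) {
  const size_t lo = FindFile(next, f.smallest, 0, next.size());
  size_t hi = lo;
  while (hi < next.size() && next[hi].smallest <= f.largest) ++hi;
  return Window{lo, hi};
}
```

If the target key falls within `f`, any file in the next level that contains it must overlap `f`, so calling `FindFile(next, key, w.lo, w.hi)` with `w = OverlapWindow(f, next)` searches only the overlapping files instead of the whole level.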
      
      Test Plan: make all check
      
      Reviewers: haobo, igor, dhruba, sdong, yhchiang
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D17205
      0f2d7681
  11. 18 Apr, 2014 2 commits
    • S
      Fix bugs introduced by D17961 · 65179225
      sdong committed
      Summary:
      D17961 has two bugs:
      (1) the two-level iterator fails to populate FileMetaData.table_reader, causing a performance regression.
      (2) the table cache handles the !status.ok() case in the wrong place, causing a seg fault that shouldn't happen.
      
      Test Plan: make all check
      
      Reviewers: ljin, igor, haobo
      
      Reviewed By: ljin
      
      CC: yhchiang, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D17991
      65179225
    • S
      Minimize accessing multiple objects in Version::Get() · fa430bfd
      sdong committed
      Summary:
      One of our profiling runs shows that Version::Get() is sometimes slow when fetching pointers to user comparators or other global objects. In this patch:
      (1) we keep pointers to immutable objects in Version to avoid accessing them through option objects or cfd objects
      (2) table_reader is cached directly in FileMetaData so that the table cache doesn't have to go through the handle first to fetch it
      (3) If level 0 has fewer than 3 files, skip the filtering logic based on SST tables' key ranges. The smallest and largest keys are stored in separate memory locations, which can cause cache misses
      
      Test Plan: make all check
      
      Reviewers: haobo, ljin
      
      Reviewed By: haobo
      
      CC: igor, yhchiang, nkg-, leveldb
      
      Differential Revision: https://reviews.facebook.net/D17739
      fa430bfd
  12. 17 Apr, 2014 2 commits
  13. 16 Apr, 2014 5 commits