1. 20 Jun, 2014 2 commits
  2. 17 Jun, 2014 1 commit
    • S
      Refactor: group metadata needed to open an SST file to a separate copyable struct · cadc1adf
      Committed by sdong
      Summary:
      We added multiple fields to FileMetaData recently and are planning to add more.
      This refactoring separates out the minimum information needed to access the file into its own object. This object is copyable (FileMetaData is not copyable because of its ref counter). I hope this refactoring can enable further improvements:
      
      (1) use it to design a more efficient data structure to speed up read queries.
      (2) in the future, when we add storage-level information, we can easily encode it instead of enlarging this structure, which might expand the memory working set for file metadata.
      
      The definition is the same as the current EncodedFileMetaData used in the two-level iterator, so the logic in the two-level iterator is now easier to understand.
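A minimal sketch of the split described above, not the code from the commit. The type names `FileDescriptorSketch` and `FileMetaDataSketch`, their fields, and the use of `std::atomic` for the counter are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <type_traits>

// Hypothetical: the minimum needed to locate and open an SST file.
// Plain data, so it can be copied freely into read-path structures.
struct FileDescriptorSketch {
  uint64_t number = 0;     // file number
  uint64_t file_size = 0;  // size in bytes
};

// Hypothetical: the full metadata. The atomic reference counter has no
// copy constructor, so the enclosing struct is not copyable either.
struct FileMetaDataSketch {
  FileDescriptorSketch fd;   // the copyable part, grouped together
  std::atomic<int> refs{0};  // reference counter (assumed to be atomic here)
};

static_assert(std::is_trivially_copyable<FileDescriptorSketch>::value,
              "the descriptor should be cheap to copy");
static_assert(!std::is_copy_constructible<FileMetaDataSketch>::value,
              "the full metadata is not copyable");

// Copies only the descriptor out of the metadata.
inline FileDescriptorSketch DescriptorOf(const FileMetaDataSketch& meta) {
  return meta.fd;
}
```

A reader-side structure can then hold `FileDescriptorSketch` values by copy without touching the reference count.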
      
      Test Plan: make all check
      
      Reviewers: haobo, igor, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb, dhruba, yhchiang
      
      Differential Revision: https://reviews.facebook.net/D18933
  3. 07 Jun, 2014 2 commits
    • I
      Create Missing Column Families · a0191c9d
      Committed by Igor Canadi
      Summary: Provide a convenience option to create column families if they are missing from the DB. Task #4460490
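The decision this option implies can be sketched as follows. This is only an illustration: the function `ResolveColumnFamiliesSketch` and its parameters are hypothetical and do not come from the commit.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: fills *to_create with the requested column families
// that are absent from the DB. Returns true if the open may proceed, which
// is the case when nothing is missing or when create_missing is set.
inline bool ResolveColumnFamiliesSketch(const std::set<std::string>& existing,
                                        const std::vector<std::string>& requested,
                                        bool create_missing,
                                        std::vector<std::string>* to_create) {
  to_create->clear();
  for (const std::string& name : requested) {
    if (existing.count(name) == 0) to_create->push_back(name);
  }
  return create_missing || to_create->empty();
}
```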
      
      Test Plan: Added a unit test. Also ran the stress test for some time.
      
      Reviewers: sdong, haobo, dhruba, ljin, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: yhchiang, leveldb
      
      Differential Revision: https://reviews.facebook.net/D18951
    • I
      Write Fast-path for single column family · 99d3eed2
      Committed by Igor Canadi
      Summary: We have a perf regression in Write() even with one column family. Add a fast path for a single column family to avoid the perf regression. See task #4455480
      
      Test Plan: make check
      
      Reviewers: sdong, ljin
      
      Reviewed By: sdong, ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18963
  4. 06 Jun, 2014 1 commit
  5. 04 Jun, 2014 1 commit
  6. 03 Jun, 2014 3 commits
    • S
      In DB::NewIterator(), try to allocate the whole iterator tree in an arena · df9069d2
      Committed by sdong
      Summary:
      In this patch, try to allocate the whole iterator tree, starting from DBIter, from an arena:
      1. ArenaWrappedDBIter is created to serve as the entry point of an iterator tree, with an arena in it.
      2. Add an option to create the following iterators from an arena: DBIter, MergingIterator, MemtableIterator, all mem table's iterators, all table reader's iterators and the two-level iterator.
      3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to the mem table list and the version set, which add iterators to it.
      
      Limitations:
      (1) Only DB::NewIterator() without tailing uses the arena. Other cases, including read-only DB and compactions, still allocate from malloc.
      (2) The two-level iterator itself is allocated in the arena, but the iterators inside it are not.
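The idea of a wrapper that owns an arena and builds its iterator tree inside it can be sketched as below. This is a simplified illustration, not the patch: `ArenaSketch`, `IterSketch`, `LeafIterSketch`, `MergingIterSketch` and `ArenaWrappedIterSketch` are hypothetical names, and the arena is a single fixed-size block.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical bump allocator: one block, released all at once.
class ArenaSketch {
 public:
  explicit ArenaSketch(size_t capacity)
      : buf_(new char[capacity]), capacity_(capacity), used_(0) {}

  // Returns aligned storage from the block, or nullptr when it is full.
  void* Allocate(size_t bytes) {
    const size_t align = alignof(std::max_align_t);
    const size_t start = (used_ + align - 1) / align * align;
    if (start + bytes > capacity_) return nullptr;
    used_ = start + bytes;
    return buf_.get() + start;
  }
  size_t used() const { return used_; }

 private:
  std::unique_ptr<char[]> buf_;
  size_t capacity_;
  size_t used_;
};

struct IterSketch {
  virtual ~IterSketch() {}
  virtual int NumLeaves() const = 0;
};

struct LeafIterSketch : IterSketch {
  int NumLeaves() const override { return 1; }
};

// Inner node. It destroys its children in place (no delete), because the
// arena owns their memory.
struct MergingIterSketch : IterSketch {
  MergingIterSketch(IterSketch* l, IterSketch* r) : left(l), right(r) {}
  ~MergingIterSketch() override {
    left->~IterSketch();
    right->~IterSketch();
  }
  int NumLeaves() const override {
    return left->NumLeaves() + right->NumLeaves();
  }
  IterSketch* left;
  IterSketch* right;
};

// Entry point of the tree: owns the arena and every node placed in it.
class ArenaWrappedIterSketch {
 public:
  ArenaWrappedIterSketch() : arena_(1024), root_(nullptr) {
    // 1024 bytes is ample for three small nodes, so Allocate() cannot
    // return nullptr in this sketch.
    IterSketch* l = new (arena_.Allocate(sizeof(LeafIterSketch))) LeafIterSketch();
    IterSketch* r = new (arena_.Allocate(sizeof(LeafIterSketch))) LeafIterSketch();
    root_ = new (arena_.Allocate(sizeof(MergingIterSketch))) MergingIterSketch(l, r);
  }
  // Run destructors first; the arena member then frees the block.
  ~ArenaWrappedIterSketch() { root_->~IterSketch(); }

  int NumLeaves() const { return root_->NumLeaves(); }
  size_t arena_bytes() const { return arena_.used(); }

 private:
  ArenaSketch arena_;
  IterSketch* root_;
};
```

All three nodes live in one contiguous block, so creating the tree costs a single heap allocation in this sketch.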
      
      Test Plan: make all check
      
      Reviewers: ljin, haobo
      
      Reviewed By: haobo
      
      Subscribers: leveldb, dhruba, yhchiang, igor
      
      Differential Revision: https://reviews.facebook.net/D18513
    • I
      Only signal cond variable if need to · 91ddd587
      Committed by Igor Canadi
      Summary:
      At the end of BackgroundCallCompaction(), we call SignalAll(), even though we don't need to. If compaction hasn't done anything and there's another compaction running, there is no need to signal on the condition variable. Doing so creates a tight feedback loop which results in log files like:
      
         wait for memtable flush
         compaction nothing to do
         wait for memtable flush
         compaction nothing to do
      
      This change eliminates that loop.
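The signalling rule can be sketched with standard C++ primitives. The struct `BackgroundStateSketch` and the function `FinishCompactionSketch` are hypothetical stand-ins, not the code in BackgroundCallCompaction().

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical state shared between background jobs and waiting writers.
struct BackgroundStateSketch {
  std::mutex mu;
  std::condition_variable cv;
  int running_compactions = 0;  // compactions currently in flight
  int signals_sent = 0;         // counts notify_all() calls, for illustration
};

// Called at the end of one background compaction. `made_progress` tells
// whether this compaction did any work. Returns true if waiters were woken.
inline bool FinishCompactionSketch(BackgroundStateSketch& s, bool made_progress) {
  std::lock_guard<std::mutex> lock(s.mu);
  --s.running_compactions;
  // Nothing changed and another compaction is still running: skip the
  // wake-up, since waiters would only re-check the same state.
  if (!made_progress && s.running_compactions > 0) return false;
  s.cv.notify_all();
  ++s.signals_sent;
  return true;
}
```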
      
      Test Plan:
      make check
      Also:
      
          icanadi@dev1440 ~ $ grep "nothing to do" /fast-rocksdb-tmp/rocksdb_test/column_family_test/LOG | wc -l
          7435
          icanadi@dev1440 ~ $ grep "nothing to do" /fast-rocksdb-tmp/rocksdb_test/column_family_test/LOG | wc -l
          372
      
      The first count is from before the change; the second is from after it.
      
      Reviewers: dhruba, ljin, haobo, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18855
    • I
      Flush stale column families less aggressively · 8cb7ad83
      Committed by Igor Canadi
      Summary:
      We've seen some production issues where a column family is detected as stale even though there is only one column family in the system. This is a quick fix that:
      1) doesn't flush stale column families if there's only one of them
      2) uses 4 as a coefficient instead of 2 for determining when a column family is stale. This makes flushing less aggressive while still keeping nice dynamic flushing of very stale CFs.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, ljin, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18861
  7. 31 May, 2014 1 commit
    • L
      forward iterator · 388d2054
      Committed by Lei Jin
      Summary:
      Forward iterator puts everything together in a flat structure instead of
      a hierarchy of nested iterators. This should simplify the code and
      provide better performance. It also enables more optimization since all
      information is accessible in one place.
      Initial evaluation shows about a 6% improvement
      
      Test Plan: db_test and db_bench
      
      Reviewers: dhruba, igor, tnovak, sdong, haobo
      
      Reviewed By: haobo
      
      Subscribers: sdong, leveldb
      
      Differential Revision: https://reviews.facebook.net/D18795
  8. 22 May, 2014 1 commit
    • I
      FIFO compaction style · 6de6a066
      Committed by Igor Canadi
      Summary:
      Introducing a new compaction style -- FIFO.
      
      FIFO compaction style has write amplification of 1 (+1 for WAL) and it deletes the oldest files when the total DB size exceeds pre-configured values.
      
      FIFO compaction style is suited for storing high-frequency event logs.
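The deletion rule can be sketched as follows, assuming files are kept oldest first. `SstFileSketch` and `PickFifoDeletionsSketch` are hypothetical names, and the size limit is passed in directly rather than read from any real option.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical file record: just a number and a size in bytes.
struct SstFileSketch {
  uint64_t number;
  uint64_t size;
};

// `files` is ordered oldest first. Drops the oldest files until the total
// size no longer exceeds max_total_size, and returns the dropped numbers.
inline std::vector<uint64_t> PickFifoDeletionsSketch(
    std::deque<SstFileSketch>& files, uint64_t max_total_size) {
  uint64_t total = 0;
  for (const SstFileSketch& f : files) total += f.size;

  std::vector<uint64_t> dropped;
  while (total > max_total_size && !files.empty()) {
    total -= files.front().size;
    dropped.push_back(files.front().number);
    files.pop_front();
  }
  return dropped;
}
```

No file is ever rewritten in this scheme, which is why the write amplification stays at 1.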
      
      Test Plan: Added a unit test
      
      Reviewers: dhruba, haobo, sdong
      
      Reviewed By: dhruba
      
      Subscribers: alberts, leveldb
      
      Differential Revision: https://reviews.facebook.net/D18765
  9. 13 May, 2014 1 commit
  10. 07 May, 2014 1 commit
    • S
      fsync directory after creating current file in NewDB() · 9efbd85a
      Committed by sdong
      Summary: One of our users reported corruption of the current file. The machine had been rebooted around that time. This is the only thing I can think of that could cause current file corruption, so add this paranoid check.
      
      Test Plan: make all check
      
      Reviewers: haobo, igor
      
      Reviewed By: haobo
      
      CC: yhchiang, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D18495
  11. 01 May, 2014 2 commits
    • I
      Fix signed/unsigned compare · 16f1aa7b
      Committed by Igor Canadi
    • I
      Flush stale column families · df700476
      Committed by Igor Canadi
      Summary:
      Added a new option `max_total_wal_size`. Once the total WAL size goes over that, we make an attempt to flush all column families that still have data in the earliest WAL file.
      
      By default, I calculate `max_total_wal_size` dynamically, which should be good enough for non-advanced customers.
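The selection rule can be sketched like this. `ColumnFamilySketch`, its `oldest_log_number` field and `PickFamiliesToFlushSketch` are hypothetical, and the dynamic default for `max_total_wal_size` is not modelled because the commit message does not give its formula.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-column-family state.
struct ColumnFamilySketch {
  std::string name;
  uint64_t oldest_log_number;  // oldest WAL still holding unflushed data
};

// Returns the column families to flush: none while the total WAL size is
// within the limit, otherwise every family that still has data in the
// earliest WAL file.
inline std::vector<std::string> PickFamiliesToFlushSketch(
    const std::vector<ColumnFamilySketch>& cfs, uint64_t earliest_wal,
    uint64_t total_wal_size, uint64_t max_total_wal_size) {
  std::vector<std::string> to_flush;
  if (total_wal_size <= max_total_wal_size) return to_flush;
  for (const ColumnFamilySketch& cf : cfs) {
    if (cf.oldest_log_number <= earliest_wal) to_flush.push_back(cf.name);
  }
  return to_flush;
}
```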
      
      Test Plan: Added a test
      
      Reviewers: dhruba, haobo, sdong, ljin, yhchiang
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18345
  12. 30 Apr, 2014 2 commits
    • Y
      Add a new mem-table representation based on cuckoo hash. · 9d9d2965
      Committed by Yueh-Hsuan Chiang
      Summary:
      = Major Changes =
      * Add a new mem-table representation, HashCuckooRep, which is based on cuckoo hashing.
        Cuckoo hashing uses multiple hash functions.  This allows each key to have multiple
        possible locations in the mem-table.
      
        - Put: When inserting a key, it first checks whether one of the key's possible
          locations is vacant and, if so, stores the key there.  If none of its possible
          locations is available, it kicks out a victim key and
          stores the new key at that location.  The kicked-out victim key is then
          stored at a vacant one of its own possible locations, or it kicks out
          another victim.  In this diff, the kick-out path (known as the
          cuckoo path) is found using BFS, which guarantees that it is the shortest.
      
        - Get: Simply tries all possible locations of a key; this guarantees
          worst-case constant time complexity.
      
        - Time complexity: O(1) for Get, and average O(1) for Put if the
          fullness of the mem-table is below 80%.
      
        - Two hash functions are used by default; the number of hash functions used
          by the cuckoo hash may increase dynamically if it fails to find a
          short-enough kick-out path.
      
        - Currently, HashCuckooRep does not support iteration or snapshots,
          as its main purpose for now is to optimize point access.
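The Put and Get behaviour described above can be sketched with a small fixed-size table. This is an illustration only, not HashCuckooRep: the class `CuckooTableSketch`, the salted `std::hash` used as the two hash functions, and the fixed count of two hash functions are assumptions. The kick-out path is found with BFS, as in the description.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical fixed-size cuckoo hash table. num_slots must be > 0.
class CuckooTableSketch {
 public:
  explicit CuckooTableSketch(size_t num_slots) : slots_(num_slots) {}

  // Get: probe each of the key's possible locations (a constant number).
  bool Get(const std::string& key, std::string* value) const {
    for (int i = 0; i < kNumHashes; ++i) {
      const Slot& s = slots_[Index(key, i)];
      if (s.used && s.key == key) {
        *value = s.value;
        return true;
      }
    }
    return false;
  }

  // Put: overwrite in place if present, otherwise search breadth-first for
  // the shortest kick-out path to a vacant slot. Returns false (leaving the
  // table unchanged) if no such path exists.
  bool Put(const std::string& key, const std::string& value) {
    for (int i = 0; i < kNumHashes; ++i) {
      Slot& s = slots_[Index(key, i)];
      if (s.used && s.key == key) {
        s.value = value;
        return true;
      }
    }
    struct Node { size_t slot; int parent; };
    std::vector<Node> queue;  // BFS queue; parent links form the path
    std::vector<bool> seen(slots_.size(), false);
    for (int i = 0; i < kNumHashes; ++i) {
      const size_t idx = Index(key, i);
      if (!seen[idx]) { seen[idx] = true; queue.push_back({idx, -1}); }
    }
    for (size_t head = 0; head < queue.size(); ++head) {
      const size_t slot = queue[head].slot;
      if (!slots_[slot].used) {
        // Vacant slot found: walk back to the root, moving each occupant
        // one step along the path into its alternate location.
        int n = static_cast<int>(head);
        while (queue[n].parent >= 0) {
          const int p = queue[n].parent;
          slots_[queue[n].slot] = slots_[queue[p].slot];
          n = p;
        }
        Slot& root = slots_[queue[n].slot];
        root.used = true;
        root.key = key;
        root.value = value;
        return true;
      }
      // Occupied: the occupant could be kicked to its other locations.
      for (int i = 0; i < kNumHashes; ++i) {
        const size_t alt = Index(slots_[slot].key, i);
        if (!seen[alt]) {
          seen[alt] = true;
          queue.push_back({alt, static_cast<int>(head)});
        }
      }
    }
    return false;
  }

 private:
  static const int kNumHashes = 2;
  struct Slot {
    bool used = false;
    std::string key;
    std::string value;
  };

  // Hypothetical hash family: salt the key with one extra byte.
  size_t Index(const std::string& key, int which) const {
    std::string salted = key;
    salted.push_back(static_cast<char>('A' + which));
    return std::hash<std::string>()(salted) % slots_.size();
  }

  std::vector<Slot> slots_;
};
```

Because the path is found before anything is moved, a failed Put never loses an existing key; a real implementation would then grow the table or add a hash function, which this sketch leaves out.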
      
      = Minor Changes =
      * Add IsSnapshotSupported() to DB to indicate whether the current DB
        supports snapshots.  If it returns false, then DB::GetSnapshot() will
        always return nullptr.
      
      Test Plan:
      Run existing tests.  Will develop a test specifically for cuckoo hash in
      the next diff.
      
      Reviewers: sdong, haobo
      
      Reviewed By: sdong
      
      CC: leveldb, dhruba, igor
      
      Differential Revision: https://reviews.facebook.net/D16155
    • I
      Cache result of ReadFirstRecord() · dd9eb7a7
      Committed by Igor Canadi
      Summary:
      ReadFirstRecord() reads the actual log file from disk on every call. This diff introduces a cache layer on top of ReadFirstRecord(), which should significantly speed up repeated calls to GetUpdatesSince().
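A cache layer of this kind can be sketched as a memoizing wrapper. `FirstRecordCacheSketch` and its `Reader` callback are hypothetical; the real ReadFirstRecord() signature is not shown in the commit message, so the sketch maps a log number to a first sequence number as an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical cache in front of an expensive "read first record" call.
class FirstRecordCacheSketch {
 public:
  // Stand-in for the disk read: log number in, first sequence number out.
  using Reader = std::function<uint64_t(uint64_t log_number)>;

  explicit FirstRecordCacheSketch(Reader read_from_disk)
      : read_(std::move(read_from_disk)) {}

  // Returns the cached value if present; otherwise reads once and stores it.
  uint64_t FirstSequence(uint64_t log_number) {
    auto it = cache_.find(log_number);
    if (it != cache_.end()) return it->second;
    const uint64_t seq = read_(log_number);
    cache_[log_number] = seq;
    return seq;
  }

 private:
  Reader read_;
  std::map<uint64_t, uint64_t> cache_;
};
```

The sketch never evicts entries; a real cache would need to drop entries for log files that have been deleted.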
      
      I also cleaned up some stuff, but the whole TransactionLogIterator could use some refactoring, especially if we see increased usage.
      
      Test Plan: make check
      
      Reviewers: haobo, sdong, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18387
  13. 26 Apr, 2014 2 commits
  14. 25 Apr, 2014 1 commit
    • I
      Column family logging · ad3cd39c
      Committed by Igor Canadi
      Summary:
      Now that we have column families involved, we need to add extra context to every log message. They now start with "[column family name] log message"
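The message format can be illustrated with a one-line helper. `PrefixWithColumnFamilySketch` is a hypothetical name, not a function from the patch.

```cpp
#include <string>

// Hypothetical helper: prepends "[column family name] " to a log message.
inline std::string PrefixWithColumnFamilySketch(const std::string& cf_name,
                                                const std::string& message) {
  return "[" + cf_name + "] " + message;
}
```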
      
      Also added some logging that I think would be useful, like level summary after every flush (I often needed that when going through the logs).
      
      Test Plan: make check + ran db_bench to confirm I'm happy with log output
      
      Reviewers: dhruba, haobo, ljin, yhchiang, sdong
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18303
  15. 18 Apr, 2014 1 commit
    • S
      Minimize accessing multiple objects in Version::Get() · fa430bfd
      Committed by sdong
      Summary:
      One of our profiles shows that Version::Get() is sometimes slow when fetching pointers to user comparators or other global objects. In this patch:
      (1) we keep pointers to immutable objects in Version to avoid accessing them through option objects or cfd objects
      (2) table_reader is cached directly in FileMetaData so that the table cache doesn't have to go through the handle first to fetch it
      (3) If level 0 has fewer than 3 files, skip the filtering logic based on SST tables' key ranges. The smallest and largest keys are stored in separate memory locations, which can cause cache misses
      
      Test Plan: make all check
      
      Reviewers: haobo, ljin
      
      Reviewed By: haobo
      
      CC: igor, yhchiang, nkg-, leveldb
      
      Differential Revision: https://reviews.facebook.net/D17739
  16. 16 Apr, 2014 6 commits
    • I
      Fix Mac OS compile · 1803ed2c
      Committed by Igor Canadi
    • S
      When creating a new DB, fail it when wal_dir contains existing log files · 0f40fe4b
      Committed by sdong
      Summary: The current behavior when creating a new DB is that, if there are existing log files, we go ahead and replay them on top of the empty DB. No user would expect this behavior. With this patch, creation fails if a user creates a DB while log files already exist.
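The new check can be sketched over a directory listing. Both function names are hypothetical, and the sketch assumes that WAL files are recognised by a `.log` suffix; it works on a list of names rather than reading wal_dir itself.

```cpp
#include <string>
#include <vector>

// Hypothetical: true if `name` looks like a WAL file (assumed ".log" suffix).
inline bool IsLogFileSketch(const std::string& name) {
  const std::string suffix = ".log";
  return name.size() > suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical: a brand-new DB may only be created when the wal_dir listing
// contains no existing log files.
inline bool CanCreateNewDbSketch(const std::vector<std::string>& wal_dir_entries) {
  for (const std::string& entry : wal_dir_entries) {
    if (IsLogFileSketch(entry)) return false;
  }
  return true;
}
```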
      
      Test Plan: make all check
      
      Reviewers: haobo, igor, ljin
      
      Reviewed By: haobo
      
      CC: nkg-, yhchiang, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D17817
    • I
      Fix compile issues introduced by RocksDBLite · c1666158
      Committed by Igor Canadi
    • I
      RocksDBLite · 588bca20
      Committed by Igor Canadi
      Summary:
      Introducing RocksDBLite! Removes all the non-essential features and reduces the binary size. This effort should help our adoption on mobile.
      
      Binary size when compiling for IOS (`TARGET_OS=IOS m static_lib`) is down to 9MB from 15MB (without stripping)
      
      Test Plan: compiles :)
      
      Reviewers: dhruba, haobo, ljin, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D17835
    • I
      dbe0f327
    • I
      Don't roll empty logs · e6acb874
      Committed by Igor Canadi
      Summary:
      With multiple column families, especially when a manual Flush is executed, we might roll the log file even though the current log file is empty (no data has been written to the log).
      
      After the diff, we won't create a new log file if the current one is empty.
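The rule can be sketched in a few lines. `WalStateSketch` and `MaybeRollLogSketch` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical WAL bookkeeping.
struct WalStateSketch {
  uint64_t log_number = 1;            // number of the current log file
  uint64_t bytes_in_current_log = 0;  // data written to it so far
};

// Rolls to a new log file only if the current one holds data.
// Returns true if a new log file was started.
inline bool MaybeRollLogSketch(WalStateSketch& wal) {
  if (wal.bytes_in_current_log == 0) return false;  // empty: keep using it
  ++wal.log_number;
  wal.bytes_in_current_log = 0;
  return true;
}
```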
      
      Next, I will write an algorithm that will flush column families that reference old log files (i.e., that weren't flushed in a while)
      
      Test Plan: Added a unit test. Confirmed that the unit test fails in master
      
      Reviewers: dhruba, haobo, ljin, sdong
      
      Reviewed By: ljin
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D17631
  17. 15 Apr, 2014 2 commits
    • L
      thread local for tailing iterator · 82b37a18
      Committed by Lei Jin
      Summary:
      replace the super version acquisition in the tailing iterator with a
      thread-local one
      
      Test Plan: will post results
      
      Reviewers: igor, haobo, sdong, yhchiang, dhruba
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D17757
    • L
      using thread local SuperVersion for NewIterator · 539dd207
      Committed by Lei Jin
      Summary:
      Similar to GetImpl(), use SuperVersion from thread local instead of acquiring the mutex.
      I don't expect this change to make a dent in NewIterator() performance
      because the bottleneck seems to be in the rest of the API
      
      Test Plan:
      make asan_check
      will post perf numbers
      
      Reviewers: haobo, igor, sdong, dhruba, yhchiang
      
      Reviewed By: sdong
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D17643
  18. 12 Apr, 2014 1 commit
  19. 11 Apr, 2014 1 commit
  20. 10 Apr, 2014 2 commits
    • S
      Polish IterKey and use it in DBImpl::ProcessKeyValueCompaction() · df2a8b6a
      Committed by sdong
      Summary:
      1. Polish IterKey a little bit.
      2. Use it for the local variable current_user_key in DBImpl::ProcessKeyValueCompaction(). Our profile shows that DBImpl::ProcessKeyValueCompaction() spends about 14% of its cost in std::string (the base includes reading and writing data but excludes compaction filtering), which is higher than it should be. There are two std::string objects used in DBImpl::ProcessKeyValueCompaction(), compaction_filter_value and current_user_key, and it's hard to distinguish the two.
      
      Test Plan: make all check
      
      Reviewers: haobo, ljin
      
      Reviewed By: haobo
      
      CC: igor, yhchiang, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D17613
    • I
      Column family support for DB::OpenForReadOnly() · b947fdc8
      Committed by Igor Canadi
      Summary: When opening DB in read-only mode, client can choose to only specify a subset of column families ("default" column family can't be omitted, though)
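The constraint on the requested subset can be sketched as a validation step. `IsValidReadOnlySubsetSketch` is a hypothetical helper, and rejecting names that do not exist in the DB is an assumption of this sketch rather than something the commit message states.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical check: the requested column families must all exist in the
// DB (assumed) and must include "default" (stated in the summary).
inline bool IsValidReadOnlySubsetSketch(const std::set<std::string>& existing,
                                        const std::vector<std::string>& requested) {
  bool has_default = false;
  for (const std::string& name : requested) {
    if (existing.count(name) == 0) return false;
    if (name == "default") has_default = true;
  }
  return has_default;
}
```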
      
      Test Plan: added a unit test in column_family_test
      
      Reviewers: haobo, sdong, ljin, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D17565
  21. 09 Apr, 2014 3 commits
  22. 08 Apr, 2014 3 commits