1. 05 9月, 2014 3 次提交
    • L
      fix more compile warnings · bb6ae0f8
      liuhuahang 提交于
      N/A
      
      Change-Id: I5b6f9c70aea7d3f3489328834fed323d41106d9f
      Signed-off-by: Nliuhuahang <liuhuahang@zerus.co>
      bb6ae0f8
    • S
      Remove path with arena==nullptr from NewInternalIterator · 45a5e3ed
      Stanislau Hlebik 提交于
      Summary:
      Simply code by removing code path which does not use Arena
      from NewInternalIterator
      
      Test Plan:
      make all check
      make valgrind_check
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D22395
      45a5e3ed
    • L
      introduce ImmutableOptions · 5665e5e2
      Lei Jin 提交于
      Summary:
      As a preparation to support updating some options dynamically, I'd like
      to first introduce ImmutableOptions, which is a subset of Options that
      cannot be changed during the course of a DB lifetime without restart.
      
      ColumnFamily will keep both Options and ImmutableOptions. Any component
      below ColumnFamily should only take ImmutableOptions in their
      constructor. Other options should be taken from APIs, which will be
      allowed to adjust dynamically.
      
      I am yet to make changes to memtable and other related classes to take
      ImmutableOptions in their ctor. That can be done in a seprate diff as
      this one is already pretty big.
      
      Test Plan: make all check
      
      Reviewers: yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D22545
      5665e5e2
  2. 26 8月, 2014 2 次提交
  3. 14 8月, 2014 1 次提交
  4. 16 7月, 2014 1 次提交
  5. 03 7月, 2014 1 次提交
  6. 28 6月, 2014 1 次提交
  7. 17 6月, 2014 1 次提交
    • S
      Refactor: group metadata needed to open an SST file to a separate copyable struct · cadc1adf
      sdong 提交于
      Summary:
      We added multiple fields to FileMetaData recently and are planning to add more.
      This refactoring separate the minimum information for accessing the file. This object is copyable (FileMetaData is not copyable since the ref counter). I hope this refactoring can enable further improvements:
      
      (1) use it to design a more efficient data structure to speed up read queries.
      (2) in the future, when we add information of storage level, we can easily do the encoding, instead of enlarge this structure, which might expand memory work set for file meta data.
      
      The definition is same as current EncodedFileMetaData used in two level iterator, so now the logic in two level iterator is easier to understand.
      
      Test Plan: make all check
      
      Reviewers: haobo, igor, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb, dhruba, yhchiang
      
      Differential Revision: https://reviews.facebook.net/D18933
      cadc1adf
  8. 07 5月, 2014 1 次提交
    • S
      fsync directory after creating current file in NewDB() · 9efbd85a
      sdong 提交于
      Summary: One of our users reported current file corruption. The machine was rebooted during the time. This is the only think I can think of which could cause current file corruption. Just add this paranoid check.
      
      Test Plan: make all check
      
      Reviewers: haobo, igor
      
      Reviewed By: haobo
      
      CC: yhchiang, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D18495
      9efbd85a
  9. 26 4月, 2014 1 次提交
  10. 16 4月, 2014 1 次提交
    • I
      RocksDBLite · 588bca20
      Igor Canadi 提交于
      Summary:
      Introducing RocksDBLite! Removes all the non-essential features and reduces the binary size. This effort should help our adoption on mobile.
      
      Binary size when compiling for IOS (`TARGET_OS=IOS m static_lib`) is down to 9MB from 15MB (without stripping)
      
      Test Plan: compiles :)
      
      Reviewers: dhruba, haobo, ljin, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D17835
      588bca20
  11. 18 3月, 2014 1 次提交
    • I
      Optimize fallocation · f26cb0f0
      Igor Canadi 提交于
      Summary:
      Based on my recent findings (posted in our internal group), if we use fallocate without KEEP_SIZE flag, we get superior performance of fdatasync() in append-only workloads.
      
      This diff provides an option for user to not use KEEP_SIZE flag, thus optimizing his sync performance by up to 2x-3x.
      
      At one point we also just called posix_fallocate instead of fallocate, which isn't very fast: http://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html (tl;dr it manually writes out zero bytes to allocate storage). This diff also fixes that, by first calling fallocate and then posix_fallocate if fallocate is not supported.
      
      Test Plan: make check
      
      Reviewers: dhruba, sdong, haobo, ljin
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16761
      f26cb0f0
  12. 13 3月, 2014 1 次提交
    • I
      [CF] Code cleanup part 1 · fb2346fc
      Igor Canadi 提交于
      Summary:
      I'm cleaning up some code preparing for the big diff review tomorrow. This is the first part of the cleanup.
      
      Changes are mostly cosmetic. The goal is to decrease amount of code difference between columnfamilies and master branch.
      
      This diff also fixes race condition when dropping column family.
      
      Test Plan: Ran db_stress with variety of parameters
      
      Reviewers: dhruba, haobo
      
      Differential Revision: https://reviews.facebook.net/D16833
      fb2346fc
  13. 01 3月, 2014 1 次提交
    • I
      Make Log::Reader more robust · 58ca641d
      Igor Canadi 提交于
      Summary:
      This diff does two things:
      (1) Log::Reader does not report a corruption when the last record in a log or manifest file is truncated (meaning that log writer died in the middle of the write). Inherited the code from LevelDB: https://code.google.com/p/leveldb/source/detail?r=269fc6ca9416129248db5ca57050cd5d39d177c8#
      (2) Turn off mmap writes for all writes to log and manifest files
      
      (2) is necessary because if we use mmap writes, the last record is not truncated, but is actually filled with zeros, making checksum fail. It is hard to recover from checksum failing.
      
      Test Plan:
      Added unit tests from LevelDB
      Actually recovered a "corrupted" MANIFEST file.
      
      Reviewers: dhruba, haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16119
      58ca641d
  14. 13 2月, 2014 1 次提交
    • L
      IOError cleanup · 994c327b
      Lei Jin 提交于
      Summary: Clean up IOErrors so that it only indicates errors talking to device.
      
      Test Plan: make all check
      
      Reviewers: igor, haobo, dhruba, emayanke
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15831
      994c327b
  15. 07 2月, 2014 1 次提交
    • I
      [CF] Propagate correct options to WriteBatch::InsertInto · 8fa8a708
      Igor Canadi 提交于
      Summary:
      WriteBatch can have multiple column families in one batch. Every column family has different options. So we have to add a way for write batch to get options for an arbitrary column family.
      
      This required a bit more acrobatics since lots of interfaces had to be changed.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, sdong, kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15957
      8fa8a708
  16. 06 2月, 2014 2 次提交
    • I
      [CF] Add full_options_ to ColumnFamilyData · 6e56ab57
      Igor Canadi 提交于
      Summary:
      Lots of code expects Options on construction/function call. My original idea was to split Options argument into ColumnFamilyOptions and DBOptions (the latter only if needed). However, this will require huge code changes very deep in the stack.
      
      The better idea is to have ColumnFamilyData hold both ColumnFamilyOptions and Options. ColumnFamilyData::Options would be constructed from DBOptions (same for each column family) and ColumnFamilyOptions (different for each column family)
      
      Now when we construct a class or call any method that requires Options, we can just push him ColumnFamilyData::Options and be sure that it's using column-family-specific settings.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15927
      6e56ab57
    • I
      [CF] Rethink table cache · c24d8c4e
      Igor Canadi 提交于
      Summary:
      Adapting table cache to column families is interesting. We want table cache to be global LRU, so if some column families are use not as often as others, we want them to be evicted from cache. However, current TableCache object also constructs tables on its own. If table is not found in the cache, TableCache automatically creates new table. We want each column family to be able to specify different table factory.
      
      To solve the problem, we still have a single LRU, but we provide the LRUCache object to TableCache on construction. We have one TableCache per column family, but the underyling cache is shared by all TableCache objects.
      
      This allows us to have a global LRU, but still be able to support different table factories for different column families. Also, in the future it will also be able to support different directories for different column families.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15915
      c24d8c4e
  17. 04 2月, 2014 1 次提交
  18. 15 1月, 2014 1 次提交
    • I
      VersionEdit not to take NumLevels() · 055e6df4
      Igor Canadi 提交于
      Summary:
      I will submit a sequence of diffs that are preparing master branch for column families. There are a lot of implicit assumptions in the code that are making column family implementation hard. If I make the change only in column family branch, it will make merging back to master impossible.
      
      Most of the diffs will be simple code refactorings, so I hope we can have fast turnaround time. Feel free to grab me in person to discuss any of them.
      
      This diff removes number of level check from VersionEdit. It is used only when VersionEdit is read, not written, but has to be set when it is written. I believe it is a right thing to make VersionEdit dumb and check consistency on the caller side. This will also make it much easier to implement Column Families, since different column families can have different number of levels.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, sdong, kailiu
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15159
      055e6df4
  19. 11 1月, 2014 1 次提交
    • S
      [Performance Branch] If options.max_open_files set to be -1, cache table... · aa0ef660
      Siying Dong 提交于
      [Performance Branch] If options.max_open_files set to be -1, cache table readers in FileMetadata for Get() and NewIterator()
      
      Summary:
      In some use cases, table readers for all live files should always be cached. In that case, there will be an opportunity to avoid the table cache look-up while Get() and NewIterator().
      
      We define options.max_open_files = -1 to be the mode that table readers for live files will always be kept. In that mode, table readers are cached in FileMetaData (with a reference count hold in table cache). So that when executing table_cache.Get() and table_cache.newInterator(), LRU cache checking can be by-passed, to reduce latency.
      
      Test Plan: add a test case in db_test
      
      Reviewers: haobo, kailiu
      
      Reviewed By: haobo
      
      CC: dhruba, igor, leveldb
      
      Differential Revision: https://reviews.facebook.net/D15039
      aa0ef660
  20. 08 1月, 2014 1 次提交
    • M
      Don't always compress L0 files written by memtable flush · 50994bf6
      Mark Callaghan 提交于
      Summary:
      Code was always compressing L0 files written by a memtable flush
      when compression was enabled. Now this is done when
      min_level_to_compress=0 for leveled compaction and when
      universal_compaction_size_percent=-1 for universal compaction.
      
      Task ID: #3416472
      
      Blame Rev:
      
      Test Plan:
      ran db_bench with compression options
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: dhruba, igor, sdong
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14757
      50994bf6
  21. 04 12月, 2013 1 次提交
    • I
      Get rid of some shared_ptrs · 043fc14c
      Igor Canadi 提交于
      Summary:
      I went through all remaining shared_ptrs and removed the ones that I found not-necessary. Only GenerateCachePrefix() is called fairly often, so don't expect much perf wins.
      
      The ones that are left are accessed infrequently and I think we're fine with keeping them.
      
      Test Plan: make asan_check
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14427
      043fc14c
  22. 29 11月, 2013 1 次提交
  23. 13 11月, 2013 1 次提交
    • K
      Fixing the warning messages captured under mac os # Consider using `git commit... · 21587760
      kailiu 提交于
      Fixing the warning messages captured under mac os # Consider using `git commit -m 'One line title' && arc diff`. # You will save time by running lint and unit in the background.
      
      Summary: The work to make sure mac os compiles rocksdb is not completed yet. But at least we can start cleaning some warnings captured only by g++ from mac os..
      
      Test Plan: ran make in mac os
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14049
      21587760
  24. 01 11月, 2013 1 次提交
    • N
      In-place updates for equal keys and similar sized values · fe250702
      Naman Gupta 提交于
      Summary:
      Currently for each put, a fresh memory is allocated, and a new entry is added to the memtable with a new sequence number irrespective of whether the key already exists in the memtable. This diff is an attempt to update the value inplace for existing keys. It currently handles a very simple case:
      1. Key already exists in the current memtable. Does not inplace update values in immutable memtable or snapshot
      2. Latest value type is a 'put' ie kTypeValue
      3. New value size is less than existing value, to avoid reallocating memory
      
      TODO: For a put of an existing key, deallocate memory take by values, for other value types till a kTypeValue is found, ie. remove kTypeMerge.
      TODO: Update the transaction log, to allow consistent reload of the memtable.
      
      Test Plan: Added a unit test verifying the inplace update. But some other unit tests broken due to invalid sequence number checks. WIll fix them next.
      
      Reviewers: xinyaohu, sumeet, haobo, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12423
      
      Automatic commit by arc
      fe250702
  25. 18 10月, 2013 1 次提交
    • S
      Universal Compaction to Have a Size Percentage Threshold To Decide Whether to Compress · 9edda370
      Siying Dong 提交于
      Summary:
      This patch adds a option for universal compaction to allow us to only compress output files if the files compacted previously did not yet reach a specified ratio, to save CPU costs in some cases.
      
      Compression is always skipped for flushing. This is because the size information is not easy to evaluate for flushing case. We can improve it later.
      
      Test Plan:
      add test
      DBTest.UniversalCompactionCompressRatio1 and DBTest.UniversalCompactionCompressRatio12
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13467
      9edda370
  26. 17 10月, 2013 1 次提交
  27. 06 10月, 2013 1 次提交
  28. 05 10月, 2013 1 次提交
  29. 16 9月, 2013 2 次提交
  30. 24 8月, 2013 1 次提交
  31. 24 7月, 2013 1 次提交
    • J
      Virtualize SkipList Interface · 52d7ecfc
      Jim Paton 提交于
      Summary: This diff virtualizes the skiplist interface so that users can provide their own implementation of a backing store for MemTables. Eventually, the backing store will be responsible for its own synchronization, allowing users (and us) to experiment with different lockless implementations.
      
      Test Plan:
      make clean
      make -j32 check
      ./db_stress
      
      Reviewers: dhruba, emayanke, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11739
      52d7ecfc
  32. 01 7月, 2013 2 次提交
    • D
      Reduce write amplification by merging files in L0 back into L0 · 47c4191f
      Dhruba Borthakur 提交于
      Summary:
      There is a new option called hybrid_mode which, when switched on,
      causes HBase style compactions.  Files from L0 are
      compacted back into L0. This meat of this compaction algorithm
      is in PickCompactionHybrid().
      
      All files reside in L0. That means all files have overlapping
      keys. Each file has a time-bound, i.e. each file contains a
      range of keys that were inserted around the same time. The
      start-seqno and the end-seqno refers to the timeframe when
      these keys were inserted.  Files that have contiguous seqno
      are compacted together into a larger file. All files are
      ordered from most recent to the oldest.
      
      The current compaction algorithm starts to look for
      candidate files starting from the most recent file. It continues to
      add more files to the same compaction run as long as the
      sum of the files chosen till now is smaller than the next
      candidate file size. This logic needs to be debated
      and validated.
      
      The above logic should reduce write amplification to a
      large extent... will publish numbers shortly.
      
      Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).
      
      Differential Revision: https://reviews.facebook.net/D11289
      47c4191f
    • D
      Reduce write amplification by merging files in L0 back into L0 · 554c06dd
      Dhruba Borthakur 提交于
      Summary:
      There is a new option called hybrid_mode which, when switched on,
      causes HBase style compactions.  Files from L0 are
      compacted back into L0. This meat of this compaction algorithm
      is in PickCompactionHybrid().
      
      All files reside in L0. That means all files have overlapping
      keys. Each file has a time-bound, i.e. each file contains a
      range of keys that were inserted around the same time. The
      start-seqno and the end-seqno refers to the timeframe when
      these keys were inserted.  Files that have contiguous seqno
      are compacted together into a larger file. All files are
      ordered from most recent to the oldest.
      
      The current compaction algorithm starts to look for
      candidate files starting from the most recent file. It continues to
      add more files to the same compaction run as long as the
      sum of the files chosen till now is smaller than the next
      candidate file size. This logic needs to be debated
      and validated.
      
      The above logic should reduce write amplification to a
      large extent... will publish numbers shortly.
      
      Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).
      
      Differential Revision: https://reviews.facebook.net/D11289
      554c06dd
  33. 13 6月, 2013 1 次提交
    • H
      [RocksDB] cleanup EnvOptions · bdf10859
      Haobo Xu 提交于
      Summary:
      This diff simplifies EnvOptions by treating it as POD, similar to Options.
      - virtual functions are removed and member fields are accessed directly.
      - StorageOptions is removed.
      - Options.allow_readahead and Options.allow_readahead_compactions are deprecated.
      - Unused global variables are removed: useOsBuffer, useFsReadAhead, useMmapRead, useMmapWrite
      
      Test Plan: make check; db_stress
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11175
      bdf10859
  34. 21 3月, 2013 1 次提交
    • D
      Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database. · ad96563b
      Dhruba Borthakur 提交于
      Summary:
      This patch allows an application to specify whether to use bufferedio,
      reads-via-mmaps and writes-via-mmaps per database. Earlier, there
      was a global static variable that was used to configure this functionality.
      
      The default setting remains the same (and is backward compatible):
       1. use bufferedio
       2. do not use mmaps for reads
       3. use mmap for writes
       4. use readaheads for reads needed for compaction
      
      I also added a parameter to db_bench to be able to explicitly specify
      whether to do readaheads for compactions or not.
      
      Test Plan: make check
      
      Reviewers: sheki, heyongqiang, MarkCallaghan
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9429
      ad96563b