1. 09 Sep 2014, 1 commit
    • Push- instead of pull-model for managing Write stalls · a2bb7c3c
      Igor Canadi committed
      Summary:
      Introducing WriteController, which is the source of truth about per-DB write delays. Let's define a DB epoch as a period with no flushes or compactions (i.e., a new epoch starts when a flush or compaction finishes). Each epoch runs in one of three modes:
      * proceed with all writes without delay
      * delay all writes by a fixed time
      * stop all writes
      
      The three modes are recomputed at each epoch change (flush, compaction), rather than on every write (which is currently the case).
      
      When we have a lot of column families, the current pull behavior adds significant overhead, since we need to loop over every column family for every write. With the new push model, the overhead on the write code path is minimal.
      
      This is just the start. The next step is to also take care of stalls introduced by slow memtable flushes. The final goal is to eliminate MakeRoomForWrite(), which currently needs to be called for every column family on every write.
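      A minimal sketch of the push-model idea, assuming a simplified interface (this is not the actual WriteController API; the class and member names below are illustrative): the flush/compaction path recomputes the mode once per epoch, and the write path only reads it.

          #include <atomic>
          #include <chrono>
          #include <thread>

          // Illustrative sketch only; the real WriteController differs.
          class WriteControllerSketch {
           public:
            enum class Mode { kProceed, kDelay, kStop };

            // Called by the flush/compaction path at each epoch change.
            void SetMode(Mode m, std::chrono::milliseconds delay = {}) {
              delay_ = delay;
              mode_.store(m, std::memory_order_release);
            }

            // Called on the write path; cheap, no per-column-family loop.
            void WaitIfNeeded() {
              while (mode_.load(std::memory_order_acquire) == Mode::kStop) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
              }
              if (mode_.load(std::memory_order_acquire) == Mode::kDelay) {
                std::this_thread::sleep_for(delay_);
              }
            }

           private:
            std::atomic<Mode> mode_{Mode::kProceed};
            std::chrono::milliseconds delay_{0};
          };

      The point of the design is that the per-write cost is a single atomic load in the common case, regardless of the number of column families.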
      
      Test Plan: make check for now. I'll add some unit tests later. Also, perf test.
      
      Reviewers: dhruba, yhchiang, MarkCallaghan, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D22791
  2. 05 Sep 2014, 2 commits
  3. 16 Aug 2014, 1 commit
  4. 14 Aug 2014, 1 commit
  5. 12 Aug 2014, 1 commit
    • Changes to support unity build: · 93e6b5e9
      miguelportilla committed
      * Script for building the unity.cc file via Makefile
      * Unity executable Makefile target for testing builds
      * Source code changes to fix compilation of unity build
  6. 07 Aug 2014, 1 commit
    • Add DB property "rocksdb.estimate-table-readers-mem" · 1242bfca
      sdong committed
      Summary:
      Add a DB property "rocksdb.estimate-table-readers-mem" that returns the estimated memory used by all loaded table readers, excluding memory allocated from the block cache.
      
      Refactor the property code to allow getting a property from a Version without acquiring the DB mutex.
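      A hedged usage sketch (the property name is taken from this commit; the database path is hypothetical): the property is queried through the standard DB::GetProperty() call and returned as a string.

          #include <iostream>
          #include <string>

          #include "rocksdb/db.h"

          int main() {
            rocksdb::DB* db = nullptr;
            rocksdb::Options options;
            options.create_if_missing = true;
            rocksdb::Status s =
                rocksdb::DB::Open(options, "/tmp/estimate_demo", &db);
            if (!s.ok()) return 1;

            std::string value;
            // Estimated memory used by table readers, excluding block cache.
            if (db->GetProperty("rocksdb.estimate-table-readers-mem", &value)) {
              std::cout << "table readers memory (bytes): " << value << std::endl;
            }
            delete db;
            return 0;
          }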
      
      Test Plan: Add several checks of this new property to existing tests covering various cases.
      
      Reviewers: yhchiang, ljin
      
      Reviewed By: ljin
      
      Subscribers: xjin, igor, leveldb
      
      Differential Revision: https://reviews.facebook.net/D20733
  7. 05 Aug 2014, 1 commit
  8. 31 Jul 2014, 1 commit
  9. 29 Jul 2014, 2 commits
    • Add DB property estimated number of keys · f6784766
      sdong committed
      Summary: Add a DB property for the estimated number of live keys, computed by adding the number of entries in all memtables and all files and subtracting the number of deletion entries in all files.
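      A minimal sketch of the arithmetic described above (illustrative only, not RocksDB code; the counters would come from memtable and table properties):

          #include <cstdint>

          // Estimated live keys = total entries minus deletion entries.
          uint64_t EstimateNumKeys(uint64_t memtable_entries,
                                   uint64_t file_entries,
                                   uint64_t file_deletions) {
            uint64_t total = memtable_entries + file_entries;
            return total > file_deletions ? total - file_deletions : 0;
          }

      This is only an estimate: overwritten keys are counted once per version, and a deletion entry may not correspond to a still-live key.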
      
      Test Plan: Add the case in unit tests
      
      Reviewers: hobbymanyp, ljin
      
      Reviewed By: ljin
      
      Subscribers: MarkCallaghan, yoshinorim, leveldb, igor, dhruba
      
      Differential Revision: https://reviews.facebook.net/D20631
    • make statistics forward-able · 40fa8a4c
      Lei Jin committed
      Summary:
      Make StatisticsImpl able to forward stats to a user-provided statistics
      implementation. The main purpose is to allow us to collect internal
      stats in the future even when the user supplies a custom statistics
      implementation, without instrumenting two separate sets of stats
      collection code. One immediate use case is the tuning advisor, which
      needs to collect some internal stats that users may not be interested in.
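      A minimal sketch of the forwarding idea, assuming a simplified stats interface (this is not the actual Statistics API; the class names are illustrative): every recorded tick updates internal counters and is also forwarded to the user-supplied object.

          #include <cstdint>
          #include <memory>
          #include <utility>

          // Simplified stand-in for the Statistics interface.
          class StatsSink {
           public:
            virtual ~StatsSink() = default;
            virtual void RecordTick(uint32_t ticker, uint64_t count) = 0;
          };

          // Sketch of a forwarding implementation: one instrumentation
          // point feeds both internal counters and the user's statistics.
          class ForwardingStats : public StatsSink {
           public:
            explicit ForwardingStats(std::shared_ptr<StatsSink> user)
                : user_(std::move(user)) {}

            void RecordTick(uint32_t ticker, uint64_t count) override {
              internal_[ticker % kNumTickers] += count;      // internal stats
              if (user_) user_->RecordTick(ticker, count);   // forward to user
            }

           private:
            static constexpr int kNumTickers = 256;
            uint64_t internal_[kNumTickers] = {};
            std::shared_ptr<StatsSink> user_;
          };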
      
      Test Plan:
      Ran db_bench and saw stats show up at the end of the run.
      Will run make all check since some tests rely on statistics
      
      Reviewers: yhchiang, sdong, igor
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D20145
  10. 25 Jul 2014, 1 commit
  11. 22 Jul 2014, 1 commit
  12. 18 Jul 2014, 2 commits
    • Add MaxInputLevel() to CompactionPicker · 052ddbe0
      Yueh-Hsuan Chiang committed
      Summary:
      Having if-then branches for different compaction strategies is considered
      hacky and makes CompactionPicker less pluggable.  This diff removes two
      such if-then branches in version_set.cc by adding MaxInputLevel() to
      CompactionPicker.
      
          // Given the current number of levels, returns the lowest allowed level
          // for compaction input.
          virtual int MaxInputLevel(int current_num_levels) const;
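      For illustration, per-strategy overrides might look like the sketch below. The return values are assumptions about the intent (level compaction can take input from any level that still has a level below it; universal compaction only picks from level 0), not code copied from the diff.

          // Sketch only; the real CompactionPicker interface has more members.
          class CompactionPickerSketch {
           public:
            virtual ~CompactionPickerSketch() = default;
            virtual int MaxInputLevel(int current_num_levels) const = 0;
          };

          class LevelCompactionPickerSketch : public CompactionPickerSketch {
           public:
            int MaxInputLevel(int current_num_levels) const override {
              // Leave room for one output level below the input level.
              return current_num_levels - 2;
            }
          };

          class UniversalCompactionPickerSketch : public CompactionPickerSketch {
           public:
            int MaxInputLevel(int /*current_num_levels*/) const override {
              return 0;  // universal compaction picks input from level 0 only
            }
          };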
      
      Test Plan:
      make db_test
      export ROCKSDB_TESTS=Compaction
      ./db_test
      
      Reviewers: igor, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19971
    • Guarding files_ attribute with #ifndef NDEBUG guard in FilePicker class. · 0d57e3ad
      Radheshyam Balasundaram committed
      Summary: Add #ifndef NDEBUG guards around the files_ attribute of the FilePicker class, which is used only in debug mode. This fixes the static_lib build on Mac.
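      A minimal sketch of the pattern (types and names here are placeholders, not the real FilePicker): a member that only backs assertions is compiled out of release (NDEBUG) builds, so it cannot trigger unused-member warnings there.

          #include <cassert>
          #include <vector>

          struct FileMetaDataSketch {};  // placeholder type for the sketch

          class FilePickerSketch {
           public:
            explicit FilePickerSketch(std::vector<FileMetaDataSketch*>* files) {
          #ifndef NDEBUG
              files_ = files;
              assert(files_ != nullptr);
          #else
              (void)files;  // avoid an unused-parameter warning in release
          #endif
            }

           private:
          #ifndef NDEBUG
            std::vector<FileMetaDataSketch*>* files_ = nullptr;  // debug-only
          #endif
          };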
      
      Test Plan:
      make static_lib on Mac
      make check all on a devserver
      
      Reviewers: ljin, igor, sdong
      
      Reviewed By: sdong
      
      Differential Revision: https://reviews.facebook.net/D20163
  13. 17 Jul 2014, 3 commits
    • Add struct CompactionInputFiles to manage compaction input files. · 296e3407
      Yueh-Hsuan Chiang committed
      Summary: Add struct CompactionInputFiles to manage compaction input files.
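      An illustrative sketch of what such a struct bundles together (the field and method names below are assumptions, not necessarily the exact definition in the diff): an input level paired with the files chosen from that level.

          #include <vector>

          struct FileMetaData;  // forward declaration for the sketch

          struct CompactionInputFilesSketch {
            int level = 0;                     // level the inputs come from
            std::vector<FileMetaData*> files;  // files picked at that level
            bool empty() const { return files.empty(); }
          };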
      
      Test Plan:
      export ROCKSDB_TESTS=Compact
      make db_test
      ./db_test
      
      Reviewers: ljin, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D20061
    • Refactoring Version::Get() · 0418e66e
      Radheshyam Balasundaram committed
      Summary: Refactor the Version::Get() method by moving the file-picking logic into a separate class.
      
      Test Plan: make check all
      
      Reviewers: igor, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19713
    • store file_indexer info in sequential memory · c11d604a
      Feng Zhu committed
      Summary:
        Use an arena to allocate space for next_level_index_ and level_rb_,
        thus increasing data locality and making Version::Get faster.
      
      Benchmark detail
      Base version: commit d2a727c1
      
      command used:
      ./db_bench --db=/mnt/db/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=2097152 --max_bytes_for_level_base=1073741824 --disable_wal=0 --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --benchmarks=fillseq, readrandom,readrandom,readrandom --use_existing_db=0 --num=52428800 --threads=1
      
      Result:
      CPU time percentage:
      Version::Get improved from 7.98% to 7.42%.
      FileIndexer::GetNextLevelIndex improved from 1.18% to 0.68%.
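      A minimal bump-allocator sketch of the arena idea (this is not RocksDB's Arena class): carving both index arrays out of one contiguous block is what improves data locality.

          #include <cstddef>
          #include <cstdint>
          #include <vector>

          class ArenaSketch {
           public:
            explicit ArenaSketch(size_t bytes) : buf_(bytes), offset_(0) {}

            // Returns 8-byte aligned memory from the block, or nullptr when full.
            void* Allocate(size_t bytes) {
              size_t aligned = (bytes + 7) & ~size_t{7};
              if (offset_ + aligned > buf_.size()) return nullptr;
              void* p = buf_.data() + offset_;
              offset_ += aligned;
              return p;
            }

           private:
            std::vector<char> buf_;
            size_t offset_;
          };

          // Usage idea: place next_level_index_- and level_rb_-style arrays
          // back to back in the same block.
          //   ArenaSketch arena(1 << 20);
          //   auto* next_level_index =
          //       static_cast<int32_t*>(arena.Allocate(n * sizeof(int32_t)));
          //   auto* level_rb =
          //       static_cast<int32_t*>(arena.Allocate(m * sizeof(int32_t)));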
      
      Test Plan:
        make all check
      
      Reviewers: ljin, haobo, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, igor
      
      Differential Revision: https://reviews.facebook.net/D19845
  14. 16 Jul 2014, 1 commit
  15. 12 Jul 2014, 1 commit
    • use FileLevel in LevelFileNumIterator · 178fd6f9
      Feng Zhu committed
      Summary:
        Use FileLevel in LevelFileNumIterator, and thus the new version of findFile.
        The old version of the findFile function is deleted.
        Add a function in version_set.cc to generate a FileLevel from files_.
        Add GenerateFileLevelTest in version_set_test.cc.
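      For illustration, findFile is essentially a binary search over a level's sorted, non-overlapping files for the first file whose largest key is not smaller than the lookup key. The sketch below uses simplified stand-in types rather than the real FileLevel/FdWithKeyRange definitions.

          #include <cstddef>
          #include <string>
          #include <vector>

          struct FileEntrySketch {
            std::string smallest_key;
            std::string largest_key;
          };

          // Returns the index of the first file that may contain `key`,
          // or files.size() if the key sorts after every file.
          size_t FindFileSketch(const std::vector<FileEntrySketch>& files,
                                const std::string& key) {
            size_t left = 0, right = files.size();
            while (left < right) {
              size_t mid = left + (right - left) / 2;
              if (files[mid].largest_key < key) {
                left = mid + 1;   // key is past this file entirely
              } else {
                right = mid;      // this file could still contain the key
              }
            }
            return left;
          }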
      
      Test Plan:
        make all check
      
      Reviewers: ljin, haobo, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: igor, dhruba
      
      Differential Revision: https://reviews.facebook.net/D19659
  16. 10 Jul 2014, 2 commits
    • create compressed_levels_ in Version, allocate its space using arena. Make Version::Get, Version::FindFile faster · f697cad1
      Feng Zhu committed
      
      Summary:
          Define CompressedFileMetaData, which contains just fd, smallest_slice, and largest_slice. Create compressed_levels_ in Version, with its space allocated from an arena.
          This increases file metadata locality and speeds up "Get" and "FindFile".

      Benchmarked with in-memory tmpfs: about a 4% improvement under "random read" and a 2% improvement under "read while writing".
      
      benchmark command:
      ./db_bench --db=/mnt/db/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --disable_wal=0 --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --benchmarks=readwhilewriting,readwhilewriting,readwhilewriting --use_existing_db=1 --num=52428800 --threads=1 —writes_per_second=81920
      
      Read random:
      improved from 1.8363 ms/op to 1.7587 ms/op.
      Read while writing:
      improved from 2.985 ms/op to 2.924 ms/op.
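      An illustrative sketch of the shape of the per-file record (member types simplified, names assumed): the point is that the lookup data is small and flat, stored contiguously per level in arena memory, instead of reached through pointer-chasing full FileMetaData objects.

          #include <cstddef>
          #include <cstdint>

          struct SliceSketch {          // non-owning view into arena memory
            const char* data = nullptr;
            size_t size = 0;
          };

          struct CompressedFileMetaDataSketch {
            uint64_t fd_number = 0;      // stand-in for the file descriptor
            SliceSketch smallest_slice;  // smallest key of the file
            SliceSketch largest_slice;   // largest key of the file
          };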
      
      Test Plan:
          make all check
      
      Reviewers: ljin, haobo, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, igor
      
      Differential Revision: https://reviews.facebook.net/D19419
    • Some fixes on size compensation logic for deletion entry in compaction · 70828557
      Yueh-Hsuan Chiang committed
      Summary:
      This patch includes two fixes:
      1. A newly created Version now takes the aggregated stats for average value size from the latest Version.
      2. The compensated size of a file is now computed only for newly created / loaded files. This addresses the issue where files already sorted by compensated file size could occasionally appear out of order because of a later update to the compensated file size.
      
      Test Plan:
      export ROCKSDB_TESTS=CompactionDele
      ./db_test
      
      Reviewers: ljin, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19557
  17. 04 Jul 2014, 1 commit
  18. 03 Jul 2014, 1 commit
  19. 01 Jul 2014, 1 commit
    • No need for files_by_size_ in universal compaction · a2e0d890
      Igor Canadi committed
      Summary: files_by_size_ is sorted by time in the case of universal compaction. However, Version::files_ is also sorted by time, so there is no need for files_by_size_.
      
      Test Plan:
      1) make check with the change
      2) make check with `assert(last_index == c->input_version_->files_[level].size() - 1);` in compaction picker
      
      Reviewers: dhruba, haobo, yhchiang, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19125
  20. 25 Jun 2014, 1 commit
    • Allow compaction to reclaim storage more effectively. · e813f5b6
      Yueh-Hsuan Chiang committed
      Summary:
      This diff allows compaction to reclaim storage more effectively.
      In the current design, compactions are mainly triggered based on
      file sizes.  However, since deletion entries do not carry values,
      files with many deletion entries are less likely to be compacted.
      As a result, it may take a while for deletion entries to be
      compacted.

      This diff addresses the issue by compensating for the size of
      deletion entries during the compaction process: the size of each
      deletion entry is augmented by 2x the average value size.  The
      diff applies to both leveled and universal compactions.
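      A minimal sketch of the compensation arithmetic described above (the function and parameter names are illustrative, not the actual FileMetaData fields):

          #include <cstdint>

          // Each deletion entry is counted as if it carried 2x the average
          // value size, so deletion-heavy files look bigger to the picker.
          uint64_t CompensatedFileSize(uint64_t file_size,
                                       uint64_t num_deletions,
                                       uint64_t average_value_size) {
            return file_size + num_deletions * 2 * average_value_size;
          }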
      
      Test Plan:
      develop CompactionDeletionTrigger
      make db_test
      ./db_test
      
      Reviewers: haobo, igor, ljin, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19029
  21. 20 Jun 2014, 2 commits
    • Remove seek compaction · d4a84233
      Igor Canadi committed
      Summary:
      As discussed in our internal group, we don't get much use out of seek compaction at the moment, while it makes the code more complicated and slower in some cases.
      
      This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction.
      
      There is one test case that relied on didIO information. I'll try to find another way to implement it.
      
      Test Plan: make check
      
      Reviewers: sdong, haobo, yhchiang, ljin, dhruba
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19161
    • Use same sorting for all level 0 files · 107e08ba
      Igor Canadi committed
      Summary:
      We decided that one of the long term goals is to unify level and universal compaction.
      
      As a small first step, I'm unifying level 0 sorting methods.
      
      Previously, we used to sort level 0 files in level compaction by file number and in universal compaction by sequence number.
      
      But it turns out that in level compaction, sorting by file number is exactly the same as sorting by sequence number.
      
      Test Plan:
      Ran make check with a bunch of asserts to verify the sorting order is exactly the same.
      Also, make check with this patch
      
      Reviewers: haobo, yhchiang, ljin, dhruba, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19131
  22. 17 Jun 2014, 1 commit
    • Refactor: group metadata needed to open an SST file to a separate copyable struct · cadc1adf
      sdong committed
      Summary:
      We added multiple fields to FileMetaData recently and are planning to add more.
      This refactoring separates out the minimum information needed to access the file. The new object is copyable (FileMetaData is not copyable because of its ref counter). I hope this refactoring can enable further improvements:

      (1) use it to design a more efficient data structure to speed up read queries;
      (2) in the future, when we add storage-level information, we can easily do the encoding there instead of enlarging this structure, which might expand the memory working set for file metadata.

      The definition is the same as the current EncodedFileMetaData used in the two-level iterator, so the logic in the two-level iterator is now easier to understand.
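      A sketch of the idea (field names are illustrative, not the struct defined by this diff): keep only what is needed to open the SST file in a small, freely copyable object, separate from the ref-counted FileMetaData.

          #include <cstdint>

          struct SstFileInfoSketch {
            uint64_t file_number = 0;  // enough to build the file name
            uint64_t file_size = 0;    // needed when opening the table
          };

          // Copies are cheap, so read paths and iterators can hold the
          // struct by value with no ref-count bookkeeping.
          inline SstFileInfoSketch PickFile(const SstFileInfoSketch& meta) {
            return meta;  // plain copy
          }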
      
      Test Plan: make all check
      
      Reviewers: haobo, igor, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb, dhruba, yhchiang
      
      Differential Revision: https://reviews.facebook.net/D18933
  23. 14 Jun 2014, 1 commit
  24. 13 Jun 2014, 1 commit
  25. 03 Jun 2014, 1 commit
    • In DB::NewIterator(), try to allocate the whole iterator tree in an arena · df9069d2
      sdong committed
      Summary:
      In this patch, we try to allocate the whole iterator tree, starting from DBIter, from an arena:
      1. ArenaWrappedDBIter is created to serve as the entry point of an iterator tree, with an arena inside it.
      2. Add an option to create iterators from an arena for the following iterators: DBIter, MergingIterator, MemtableIterator, all memtable iterators, all table reader iterators, and the two-level iterator.
      3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to the memtable list and the version set, which add their iterators to it.

      Limitations:
      (1) Only DB::NewIterator() without tailing uses the arena. Other cases, including read-only DBs and compactions, still allocate from malloc.
      (2) The two-level iterator itself is allocated from the arena, but not the iterators inside it.
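      A minimal sketch of the underlying mechanism, assuming any bump-style allocator with an Allocate(size_t) method (this is not RocksDB's Arena API): iterators are constructed with placement new inside arena memory, so the whole tree shares one allocation region and is released in one shot.

          #include <cstddef>
          #include <new>
          #include <utility>

          template <typename T, typename Arena, typename... Args>
          T* NewInArena(Arena* arena, Args&&... args) {
            void* mem = arena->Allocate(sizeof(T));
            if (mem == nullptr) return nullptr;
            return new (mem) T(std::forward<Args>(args)...);  // placement new
          }

          // Note: objects created this way must never be passed to delete.
          // Their destructors are invoked explicitly (or skipped if trivial)
          // before the arena frees its memory as a whole.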
      
      Test Plan: make all check
      
      Reviewers: ljin, haobo
      
      Reviewed By: haobo
      
      Subscribers: leveldb, dhruba, yhchiang, igor
      
      Differential Revision: https://reviews.facebook.net/D18513
  26. 22 May 2014, 1 commit
    • FIFO compaction style · 6de6a066
      Igor Canadi committed
      Summary:
      Introducing a new compaction style: FIFO.

      The FIFO compaction style has a write amplification of 1 (+1 for the WAL) and deletes the oldest files when the total DB size exceeds a pre-configured value.

      The FIFO compaction style is well suited for storing high-frequency event logs.
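      A configuration sketch. The option names below (compaction_style, compaction_options_fifo.max_table_files_size) are my best recollection of the interface added around this change and the database path is hypothetical; check include/rocksdb/options.h of the matching release to confirm.

          #include "rocksdb/db.h"
          #include "rocksdb/options.h"

          int main() {
            rocksdb::Options options;
            options.create_if_missing = true;
            options.compaction_style = rocksdb::kCompactionStyleFIFO;
            // Drop the oldest SST files once total size exceeds ~1 GB.
            options.compaction_options_fifo.max_table_files_size =
                1024ull * 1024 * 1024;

            rocksdb::DB* db = nullptr;
            rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/fifo_demo", &db);
            if (s.ok()) delete db;
            return s.ok() ? 0 : 1;
          }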
      
      Test Plan: Added a unit test
      
      Reviewers: dhruba, haobo, sdong
      
      Reviewed By: dhruba
      
      Subscribers: alberts, leveldb
      
      Differential Revision: https://reviews.facebook.net/D18765
  27. 15 May 2014, 1 commit
    • Clean up compaction logging · f4574449
      Igor Canadi committed
      Summary: Cleaned up compaction logging a little bit. Now file sizes are easier to read. Also, removed the trailing space.
      
      Test Plan:
      Verified that I'm happy with the logging output:
      
              files_size[#33(seq=101,sz=98KB,0) #31(seq=81,sz=159KB,0) #26(seq=0,sz=637KB,0)]
      
      Reviewers: sdong, haobo, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18549
  28. 07 May 2014, 1 commit
    • fsync directory after creating current file in NewDB() · 9efbd85a
      sdong committed
      Summary: One of our users reported CURRENT file corruption. The machine was rebooted around that time, which is the only thing I can think of that could cause the corruption. Just add this paranoid check.
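      A generic POSIX sketch of the paranoid step (not the RocksDB Env code): after creating or renaming the CURRENT file, fsync its parent directory so the directory entry itself is durable across a reboot.

          #include <fcntl.h>
          #include <unistd.h>

          #include <cstdio>

          // Returns true if the directory's metadata was flushed to disk.
          bool FsyncDir(const char* dir_path) {
            int fd = open(dir_path, O_RDONLY);
            if (fd < 0) {
              std::perror("open dir");
              return false;
            }
            bool ok = (fsync(fd) == 0);
            close(fd);
            return ok;
          }

          // Usage sketch: call FsyncDir("/path/to/dbname") right after the
          // CURRENT file is created or renamed in that directory.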
      
      Test Plan: make all check
      
      Reviewers: haobo, igor
      
      Reviewed By: haobo
      
      CC: yhchiang, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D18495
  29. 01 May 2014, 1 commit
  30. 26 Apr 2014, 3 commits
  31. 25 Apr 2014, 1 commit
    • Column family logging · ad3cd39c
      Igor Canadi committed
      Summary:
      Now that we have column families involved, we need to add extra context to every log message. Each message now starts with "[column family name] log message".

      Also added some logging that I think will be useful, like a level summary after every flush (I often needed that when going through the logs).
      
      Test Plan: make check + ran db_bench to confirm I'm happy with log output
      
      Reviewers: dhruba, haobo, ljin, yhchiang, sdong
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18303