1. 11 Jul 2014 · 5 commits
    • C API: create missing cf's, cleanup · 1fc71a4b
      Reed Allman committed
    • ForwardIterator::status() checks all child iterators · 105c1e09
      Tomislav Novak committed
      Summary:
      Forward iterator only checked `status_` and `mutable_iter_->status()`, which is
      not sufficient. For example, when reading exclusively from cache
      (kBlockCacheTier), `mutable_iter_->status()` may return kOk (e.g. there's
      nothing in the memtable), but one of the immutable iterators could be in
      kIncomplete. In this case, `ForwardIterator::status()` ought to return that
      status instead of kOk.
      
      This diff changes `status()` to also check `imm_iters_`, `l0_iters_`, and
      `level_iters_`.
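
      A minimal sketch of the aggregated check this diff describes (member names
      follow the summary above; treat it as illustrative rather than the exact
      RocksDB implementation):

        Status ForwardIterator::status() const {
          if (!status_.ok()) {
            return status_;
          } else if (!mutable_iter_->status().ok()) {
            return mutable_iter_->status();
          }
          // Also surface errors (e.g. Status::Incomplete under kBlockCacheTier)
          // from immutable memtable, L0 file and level iterators.
          for (auto* it : imm_iters_) {
            if (it != nullptr && !it->status().ok()) return it->status();
          }
          for (auto* it : l0_iters_) {
            if (it != nullptr && !it->status().ok()) return it->status();
          }
          for (auto* it : level_iters_) {
            if (it != nullptr && !it->status().ok()) return it->status();
          }
          return Status::OK();
        }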
      
      Test Plan:
        ROCKSDB_TESTS=TailingIteratorIncomplete ./db_test
      
      Reviewers: ljin, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D19581
    • Add a function to return current perf level · 36de0e53
      sdong committed
      Summary: Add a function to return the current perf level. It allows a DB wrapper to raise the perf level and restore the original level after the wrapped call finishes.
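
      A small usage sketch of the save-and-restore pattern motivated above
      (header and enum names assume the perf_context API of this era; adjust to
      your RocksDB version):

        #include <string>

        #include "rocksdb/db.h"
        #include "rocksdb/perf_context.h"

        // Temporarily raise the perf level for one call, then put it back.
        rocksdb::Status TimedGet(rocksdb::DB* db, const rocksdb::Slice& key,
                                 std::string* value) {
          const rocksdb::PerfLevel saved = rocksdb::GetPerfLevel();
          rocksdb::SetPerfLevel(rocksdb::kEnableTime);
          rocksdb::Status s = db->Get(rocksdb::ReadOptions(), key, value);
          rocksdb::SetPerfLevel(saved);  // restore the caller's setting
          return s;
        }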
      
      Test Plan: Add a verification in db_test
      
      Reviewers: yhchiang, igor, ljin
      
      Reviewed By: ljin
      
      Subscribers: xjin, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D19551
    • Removing NewTotalOrderPlainTableFactory · 30c81e77
      Stanislau Hlebik committed
      Summary:
      It seems NewTotalOrderPlainTableFactory is useless and semantically incorrect.
      The indicator for total order mode is prefix_extractor == nullptr,
      but NewTotalOrderPlainTableFactory doesn't set it to nullptr. That's why some tests
      in plain_table_db_test are incorrect.
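
      A hedged sketch of configuring PlainTable in total order mode after this
      change (the factory arguments rely on their defaults, and the shared_ptr
      option types are assumptions about this era's Options struct):

        #include "rocksdb/options.h"
        #include "rocksdb/slice_transform.h"
        #include "rocksdb/table.h"

        rocksdb::Options MakePlainTableOptions(bool total_order) {
          rocksdb::Options options;
          options.table_factory.reset(rocksdb::NewPlainTableFactory());
          if (total_order) {
            // Total order mode: simply leave prefix_extractor unset (nullptr).
            options.prefix_extractor = nullptr;
          } else {
            // Prefix mode: build the hash index over a fixed-length key prefix.
            options.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(8));
          }
          return options;
        }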
      
      Test Plan: make all check
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19587
    • JSON (Document) API sketch · f0a8be25
      Igor Canadi committed
      Summary:
      This is a rough sketch of our new document API. I'd like to get some thoughts and comments about the high-level architecture and API.
      
      I didn't optimize for performance at all. Leaving some low-hanging fruit so that we can be happy when we fix them! :)
      
      Currently, a bunch of features are not supported at all. Indexes can only be specified when creating the database. There is no query planner whatsoever. This will all be added in due time.
      
      Test Plan: Added a simple unit test
      
      Reviewers: haobo, yhchiang, dhruba, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18747
  2. 10 Jul 2014 · 4 commits
    • change the init parameter for FileDescriptor · 222cf255
      Feng Zhu committed
      Summary:
        fix a bug in improve_file_key_search, change the parameter for FileDescriptor
      
      Test Plan:
        make all check
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Differential Revision: https://reviews.facebook.net/D19611
    • disable rate limiter test · 8a7d1fe6
      Lei Jin committed
      Summary:
      The test is not stable because it relies on disk and only runs for a
      short period of time. So missing a compaction/flush would greatly affect
      the rate. I am disabling it for now. What do you guys think?
      
      Test Plan: make
      
      Reviewers: yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19599
    • create compressed_levels_ in Version, allocate its space using arena. Make... · f697cad1
      Feng Zhu committed
      create compressed_levels_ in Version, allocate its space using arena. Make Version::Get, Version::FindFile faster
      
      Summary:
          Define CompressedFileMetaData, which contains just fd, smallest_slice, and largest_slice. Create compressed_levels_ in Version, with its space allocated from an arena.
          This increases file metadata locality and speeds up "Get" and "FindFile".
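
      A rough sketch of the trimmed-down record named above (the field list comes
      from this summary; the exact types are assumptions, since FileDescriptor is
      an internal RocksDB type):

        // Per-file record kept contiguously, per level, in arena-backed arrays
        // so that Get/FindFile touch less scattered memory.
        struct CompressedFileMetaData {
          FileDescriptor fd;     // file number and size
          Slice smallest_slice;  // smallest user key, data lives in the arena
          Slice largest_slice;   // largest user key, data lives in the arena
        };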
      
          Benchmarked with in-memory tmpfs: about 4% improvement under "random read" and 2% improvement under "read while writing".
      
      benchmark command:
      ./db_bench --db=/mnt/db/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --disable_wal=0 --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --benchmarks=readwhilewriting,readwhilewriting,readwhilewriting --use_existing_db=1 --num=52428800 --threads=1 --writes_per_second=81920
      
      Read Random:
      From 1.8363 ms/op, improved to 1.7587 ms/op.
      Read while writing:
      From 2.985 ms/op, improved to 2.924 ms/op.
      
      Test Plan:
          make all check
      
      Reviewers: ljin, haobo, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, igor
      
      Differential Revision: https://reviews.facebook.net/D19419
    • Some fixes on size compensation logic for deletion entry in compaction · 70828557
      Yueh-Hsuan Chiang committed
      Summary:
      This patch includes two fixes:
      1. A newly created Version now takes the aggregated average-value-size stats from the latest Version.
      2. The compensated size of a file is now computed only for newly created / loaded files. This addresses the issue where files sorted by compensated file size could sometimes appear out of order because of later updates to a file's compensated size.
      
      Test Plan:
      export ROCKSDB_TESTS=CompactionDele
      ./db_test
      
      Reviewers: ljin, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19557
  3. 09 Jul 2014 · 6 commits
  4. 08 Jul 2014 · 7 commits
    • C API: bugfix column_family_compact_range · e9b18b6b
      Reed Allman committed
    • Fix compile issue · 4adf64e0
      Igor Canadi committed
    • Fix valgrind error in c_test · 8a03935f
      Igor Canadi committed
      Summary:
      External contribution caused some valgrind errors: https://github.com/facebook/rocksdb/commit/1a34aaaef0900785c2de7e55b55d8c48d1201300
      
      This diff fixes them
      
      Test Plan: ran valgrind
      
      Reviewers: sdong, yhchiang, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19485
    • C API: Add test for compaction filter factories · 13a130cc
      Evan Shaw committed
      Also refactored the compaction filter tests to share some code and ensure that
      options were getting reset so future test results aren't confused.
    • C API: Allow setting compaction filter factory · 3f7104d7
      Evan Shaw committed
    • C API: Add support for compaction filter factories (v1) · 91bede79
      Evan Shaw committed
    • Adding NUMA support to db_bench tests · f0660d52
      Radheshyam Balasundaram committed
      Summary:
      Changes:
      - Adding numa_aware flag to db_bench.cc
      - Using the numa.h library to bind the memory and CPU of threads to a fixed NUMA node
      Result: There seems to be no significant change in the micros/op time with numa_aware enabled. I also tried other implementations, including a combination of the pthread_setaffinity_np, sched_setaffinity and set_mempolicy methods. It'd be great if someone could point out where I'm going wrong and whether we can achieve better micros/op.
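
      A sketch of the per-thread binding described above, using documented libnuma
      calls (the helper name and node-assignment policy are illustrative, not
      db_bench's actual flag handling):

        #include <numa.h>  // link with -lnuma

        // Pin the calling benchmark thread's CPU scheduling and memory
        // allocations to a single NUMA node.
        void BindThreadToNode(int thread_id) {
          if (numa_available() < 0) {
            return;  // no NUMA support on this machine; run unpinned
          }
          int node = thread_id % (numa_max_node() + 1);
          numa_run_on_node(node);    // restrict the thread's CPUs to this node
          numa_set_preferred(node);  // prefer memory allocations from this node
        }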
      
      Test Plan:
      Ran db_bench tests using the following command:
      ./db_bench --db=/mnt/tmp --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=/mnt/tmp --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --duration=300 --benchmarks=readwhilewriting --use_existing_db=1 --num=157286400 --threads=24 --writes_per_second=10240 --numa_aware=[False/True]
      
      The tests were run on a private devserver with 24 cores, and the db was prepopulated using the filluniquerandom test. The tests resulted in 0.145 us/op with numa_aware=False and 0.161 us/op with numa_aware=True.
      
      Reviewers: sdong, yhchiang, ljin, igor
      
      Reviewed By: ljin, igor
      
      Subscribers: igor, leveldb
      
      Differential Revision: https://reviews.facebook.net/D19353
  5. 07 Jul 2014 · 1 commit
  6. 06 Jul 2014 · 1 commit
  7. 04 Jul 2014 · 5 commits
    • Improve SimpleWriteTimeoutTest to avoid false alarm. · 7b85c1e9
      Yueh-Hsuan Chiang committed
      Summary:
      SimpleWriteTimeoutTest has two parts: 1) insert two large key/values
      to fill the memtable and expect both writes to succeed; 2) insert
      another key / value and expect it to time out.  Previously we also
      set a timeout in the first step, but this could sometimes cause a
      false alarm.

      This diff makes the first two writes run without a timeout setting.
      
      Test Plan:
      export ROCKSDB_TESTS=Time
      make db_test
      
      Reviewers: sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19461
    • Fixed a warning in release mode. · d33657a4
      Yueh-Hsuan Chiang committed
      Summary: Removed a variable that is only used in an assertion check.
      
      Test Plan: make release
      
      Reviewers: ljin, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19455
    • Finer report I/O stats about Flush and Compaction. · 90a6aca4
      Yueh-Hsuan Chiang committed
      Summary:
      This diff allows the I/O stats about Flush and Compaction to be reported
      more accurately.  Instead of measuring the size of a file, it
      measures I/O cost on a per read / write basis.
      
      Test Plan: make all check
      
      Reviewers: sdong, igor, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19383
    • Add timeout_hint_us to WriteOptions and introduce Status::TimeOut. · d4d338de
      Yueh-Hsuan Chiang committed
      Summary:
      This diff adds timeout_hint_us to WriteOptions.  If it's non-zero, then
      1) writes associated with these options MAY be aborted once they have been
        waiting for longer than the specified time.  If an abort happens, the
        associated writes will return Status::TimeOut.
      2) the stall time of the associated write caused by flush or compaction
        will be limited by timeout_hint_us.

      The default value of timeout_hint_us is 0 (i.e., OFF).

      Statistics for timed-out writes will be recorded under WRITE_TIMEDOUT.
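
      A usage sketch of the hint described above (the IsTimedOut() accessor name
      and the 50 ms figure are assumptions for illustration; check status.h in
      your RocksDB version):

        #include "rocksdb/db.h"

        rocksdb::Status PutWithDeadline(rocksdb::DB* db, const rocksdb::Slice& key,
                                        const rocksdb::Slice& value) {
          rocksdb::WriteOptions write_options;
          write_options.timeout_hint_us = 50 * 1000;  // give up after ~50 ms of stalling
          rocksdb::Status s = db->Put(write_options, key, value);
          if (s.IsTimedOut()) {
            // The write was aborted instead of stalling indefinitely; the caller
            // can retry, shed load, or fail the request upstream.
          }
          return s;
        }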
      
      Test Plan:
      export ROCKSDB_TESTS=WriteTimeoutAndDelayTest
      make db_test
      ./db_test
      
      Reviewers: igor, ljin, haobo, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D18837
    • Fix mac os compile error · 4203431e
      Igor Canadi committed
  8. 03 Jul 2014 · 2 commits
    • Support Multiple DB paths (without having an interface to expose to users) · 2459f7ec
      sdong committed
      Summary:
      In this patch, we allow RocksDB to support multiple DB paths internally.
      No user interface is exposed yet, so this patch is invisible to users.
      
      Test Plan: make all check
      
      Reviewers: igor, haobo, ljin, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D18921
    • Centralize compression decision to compaction picker · f146cab2
      Igor Canadi committed
      Summary:
      Before this diff, we decide enable_compression in CompactionPicker and then decide the final compression type in DBImpl. This is kind of confusing.
      
      After the diff, the final compression type will be decided in CompactionPicker.
      
      The reason for this is that I want CompactFiles() to specify output compression type, so that people can mix and match compression styles in their compaction algorithms. This diff makes it much easier to do that.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, sdong, yhchiang, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19137
  9. 02 Jul 2014 · 4 commits
    • Re-commit the correct part (WalDir) of the revision: · 1d050067
      sdong committed
      Commit 6634844d by sdong
      Two small fixes in db_test
      
      Summary:
      Two fixes:
      (1) WalDir picks a directory under TmpDir, to allow two tests to run in parallel without impacting each other
      (2) kBlockBasedTableWithWholeKeyHashIndex was disabled by mistake (I assume). Enable it.
      
      Test Plan: ./db_test
      
      Reviewers: yhchiang, ljin
      
      Reviewed By: ljin
      
      Subscribers: nkg-, igor, dhruba, haobo, leveldb
      
      Differential Revision: https://reviews.facebook.net/D19389
    • Revert "Two small fixes in db_test" · 30b20604
      sdong committed
      This reverts commit 6634844d.
    • HashLinkList memtable switches a bucket to a skip list to reduce performance outliers · 9c332aa1
      sdong committed
      Summary:
      In this patch, we enhance the HashLinkList memtable to reduce performance outliers when a bucket contains too many entries. In that case we switch the bucket to a skip list to enable binary search.

      Add a threshold_use_skiplist parameter to determine when a bucket needs to switch to a skip list.

      The new data structure is documented in comments in the code.
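
      A configuration sketch for the new parameter (the argument order shown for
      NewHashLinkListRepFactory is an assumption; check memtablerep.h for the
      exact signature and defaults in your RocksDB version):

        #include "rocksdb/memtablerep.h"
        #include "rocksdb/options.h"
        #include "rocksdb/slice_transform.h"

        rocksdb::Options MakeHashLinkListOptions() {
          rocksdb::Options options;
          options.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(8));
          options.memtable_factory.reset(rocksdb::NewHashLinkListRepFactory(
              /*bucket_count=*/50000,
              /*huge_page_tlb_size=*/0,
              /*bucket_entries_logging_threshold=*/4096,
              /*if_log_bucket_dist_when_flash=*/true,
              /*threshold_use_skiplist=*/256));  // past this many entries, the
                                                 // bucket becomes a skip list
          return options;
        }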
      
      Test Plan:
      make all check
      set threshold_use_skiplist in several tests
      
      Reviewers: yhchiang, haobo, ljin
      
      Reviewed By: yhchiang, ljin
      
      Subscribers: nkg-, xjin, dhruba, yhchiang, leveldb
      
      Differential Revision: https://reviews.facebook.net/D19299
    • Two small fixes in db_test · 6634844d
      sdong committed
      Summary:
      Two fixes:
      (1) WalDir picks a directory under TmpDir, to allow two tests to run in parallel without impacting each other
      (2) kBlockBasedTableWithWholeKeyHashIndex was disabled by mistake (I assume). Enable it.
      
      Test Plan: ./db_test
      
      Reviewers: yhchiang, ljin
      
      Reviewed By: ljin
      
      Subscribers: nkg-, igor, dhruba, haobo, leveldb
      
      Differential Revision: https://reviews.facebook.net/D19389
  10. 01 Jul 2014 · 3 commits
    • Fix compile error · f5d4df1c
      Igor Canadi committed
    • No need for files_by_size_ in universal compaction · a2e0d890
      Igor Canadi committed
      Summary: files_by_size_ is sorted by time in the case of universal compaction. However, Version::files_ is also sorted by time, so there is no need for files_by_size_.
      
      Test Plan:
      1) make check with the change
      2) make check with `assert(last_index == c->input_version_->files_[level].size() - 1);` in compaction picker
      
      Reviewers: dhruba, haobo, yhchiang, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19125
    • use arena to allocate memtable's bloomfilter and hashskiplist's buckets_ · 56563674
      Feng Zhu committed
      Summary:
          The bloom filter and hashskiplist's buckets_ are now allocated by the memtable's arena.
          DynamicBloom: pass the arena via the constructor; allocate space in SetTotalBits.
          HashSkipListRep: allocate space for buckets_ using the arena, and
             do not delete it in the destructor because the arena takes care of it.
          Several test files are changed.
      
      Test Plan:
          make all check
      
      Reviewers: ljin, haobo, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: igor, dhruba
      
      Differential Revision: https://reviews.facebook.net/D19335
  11. 28 Jun 2014 · 1 commit
  12. 27 Jun 2014 · 1 commit
    • Cache some conditions for DBImpl::MakeRoomForWrite · a3594867
      Stanislau Hlebik committed
      Summary:
      Task 4580155. Some conditions in DBImpl::MakeRoomForWrite can be cached in
      ColumnFamilyData, because their values can change only during compaction,
      when a new memtable is added, and/or when the compaction score is recalculated.
      
      These conditions are:
      
      cfd->imm()->size() ==  cfd->options()->max_write_buffer_number - 1
      cfd->current()->NumLevelFiles(0) >=  cfd->options()->level0_stop_writes_trigger
      cfd->options()->soft_rate_limit > 0.0 &&
          (score = cfd->current()->MaxCompactionScore()) >  cfd->options()->soft_rate_limit
      cfd->options()->hard_rate_limit > 1.0 &&
          (score = cfd->current()->MaxCompactionScore()) >  cfd->options()->hard_rate_limit
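
      A sketch of how those conditions could be cached as booleans (member and
      function names here are illustrative, not the actual ColumnFamilyData
      interface):

        // Recomputed only when a memtable is added, a compaction installs a new
        // Version, or the compaction score is recalculated -- not on every write.
        struct WriteStallConditions {
          bool wait_for_memtable_flush = false;
          bool stop_for_level0_files = false;
          bool soft_rate_limited = false;
          bool hard_rate_limited = false;
        };

        WriteStallConditions Recalculate(ColumnFamilyData* cfd) {
          WriteStallConditions c;
          c.wait_for_memtable_flush =
              cfd->imm()->size() == cfd->options()->max_write_buffer_number - 1;
          c.stop_for_level0_files =
              cfd->current()->NumLevelFiles(0) >=
              cfd->options()->level0_stop_writes_trigger;
          const double score = cfd->current()->MaxCompactionScore();
          c.soft_rate_limited = cfd->options()->soft_rate_limit > 0.0 &&
                                score > cfd->options()->soft_rate_limit;
          c.hard_rate_limited = cfd->options()->hard_rate_limit > 1.0 &&
                                score > cfd->options()->hard_rate_limit;
          return c;
        }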
      
      P.S.
      As this is my first diff, Siying suggested adding everybody as reviewers
      for this diff. Sorry if I forgot someone or added someone by mistake.
      
      Test Plan: make all check
      
      Reviewers: haobo, xjin, dhruba, yhchiang, zagfox, ljin, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D19311