1. 28 Oct 2016, 1 commit
  2. 01 Oct 2016, 1 commit
    • Store range tombstones in memtable · 6009c473
      Committed by Andrew Kryczka
      Summary:
      - Store range tombstones in a separate MemTableRep instantiated with ColumnFamilyOptions::memtable_factory
      - MemTable::NewRangeTombstoneIterator() returns a MemTableIterator over the separate MemTableRep
      - Part of the read path is not implemented yet (i.e., MemTable::Get())
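      For orientation, a hedged usage sketch of how range tombstones are created through the public DeleteRange API this work feeds into (API shape as in later releases):
      
      ```
      #include <cassert>
      #include "rocksdb/db.h"
      
      // Write a range tombstone covering [begin_key, end_key); this change
      // stores such tombstones in a separate MemTableRep inside the memtable.
      void DeleteKeyRange(rocksdb::DB* db) {
        rocksdb::Status s =
            db->DeleteRange(rocksdb::WriteOptions(), db->DefaultColumnFamily(),
                            "begin_key", "end_key");
        assert(s.ok());
      }
      ```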
      
      Test Plan: see unit tests
      
      Reviewers: wanning
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D62217
  3. 28 Sep 2016, 1 commit
    • Add SeekForPrev() to Iterator · f517d9dd
      Committed by Aaron Gao
      Summary:
      Add a new Iterator API, `SeekForPrev`: position the iterator at the last key <= the target key (usage sketch below). It:
      - supports prefix_extractor
      - supports prefix_same_as_start
      - supports upper_bound
      - is not supported in iterators without Prev()
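      A minimal usage sketch (assuming an already-open `db` handle):
      
      ```
      #include <memory>
      #include "rocksdb/db.h"
      
      // Position the iterator at the last key <= "b2". Before SeekForPrev,
      // this required a Seek() plus a Prev() when the exact key was absent.
      void SeekFloor(rocksdb::DB* db) {
        std::unique_ptr<rocksdb::Iterator> it(
            db->NewIterator(rocksdb::ReadOptions()));
        it->SeekForPrev("b2");
        if (it->Valid()) {
          // it->key() is the largest key not greater than "b2".
        }
      }
      ```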
      
      Also add tests in db_iter_test and db_iterator_test
      
      Pass all tests
      Cheers!
      
      Test Plan: make all check -j64
      
      Reviewers: andrewkr, yiwu, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D64149
  4. 27 Jul 2016, 1 commit
    • Change options memtable_prefix_bloom_huge_page_tlb_size =>... · e5b5f12b
      Committed by sdong
      Change options memtable_prefix_bloom_huge_page_tlb_size => memtable_huge_page_size and cover huge page to memtable too
      
      Summary: Extend the option memtable_prefix_bloom_huge_page_tlb_size so that huge pages are used not only for the memtable bloom filter but for the memtable itself.
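      A hedged configuration sketch (the value is an example; huge pages must be configured on the host for this to take effect):
      
      ```
      #include "rocksdb/options.h"
      
      // Allocate the memtable, and its bloom filter, from huge pages when the
      // allocation is at least this size; 0 (the default) disables it.
      rocksdb::ColumnFamilyOptions MakeHugePageOptions() {
        rocksdb::ColumnFamilyOptions cf_opts;
        cf_opts.memtable_huge_page_size = 2 * 1024 * 1024;  // 2 MB pages
        return cf_opts;
      }
      ```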
      
      Test Plan: Run all existing tests.
      
      Reviewers: IslamAbdelRahman, yhchiang, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D60513
  5. 21 Jul 2016, 1 commit
    • Introduce FullMergeV2 (eliminate memcpy from merge operators) · 68a8e6b8
      Committed by Islam AbdelRahman
      Summary:
      This diff updates the code to pin the merge operator operands while the merge operation runs, so that we can eliminate the memcpy cost. To do that we need a new public API for FullMerge that replaces the std::deque<std::string> with std::vector<Slice>.
      
      This diff is stacked on top of D56493 and D56511
      
      In this diff we
      - Update FullMergeV2 arguments to be encapsulated in MergeOperationInput and MergeOperationOutput which will make it easier to add new arguments in the future
      - Replace std::deque<std::string> with std::vector<Slice> to pass operands
      - Replace MergeContext std::deque with std::vector (based on a simple benchmark I ran https://gist.github.com/IslamAbdelRahman/78fc86c9ab9f52b1df791e58943fb187)
      - Allow FullMergeV2 output to be an existing operand
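      A sketch of the new interface, modeled on the "max" merge operator that the benchmarks below use (a minimal illustration, not the diff's exact code):
      
      ```
      #include "rocksdb/merge_operator.h"
      #include "rocksdb/slice.h"
      
      // Operands arrive as std::vector<Slice>; the result can point at an
      // existing operand via existing_operand, avoiding a copy entirely.
      class MaxOperator : public rocksdb::MergeOperator {
       public:
        bool FullMergeV2(const MergeOperationInput& merge_in,
                         MergeOperationOutput* merge_out) const override {
          rocksdb::Slice max;
          if (merge_in.existing_value != nullptr) {
            max = *merge_in.existing_value;
          }
          for (const rocksdb::Slice& operand : merge_in.operand_list) {
            if (max.compare(operand) < 0) {
              max = operand;
            }
          }
          merge_out->existing_operand = max;  // no memcpy into new_value
          return true;
        }
      
        const char* Name() const override { return "MaxOperator"; }
      };
      ```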
      
      ```
      [Everything in Memtable | 10K operands | 10 KB each | 1 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=10000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000
      
      [FullMergeV2]
      readseq      :       0.607 micros/op 1648235 ops/sec; 16121.2 MB/s
      readseq      :       0.478 micros/op 2091546 ops/sec; 20457.2 MB/s
      readseq      :       0.252 micros/op 3972081 ops/sec; 38850.5 MB/s
      readseq      :       0.237 micros/op 4218328 ops/sec; 41259.0 MB/s
      readseq      :       0.247 micros/op 4043927 ops/sec; 39553.2 MB/s
      
      [master]
      readseq      :       3.935 micros/op 254140 ops/sec; 2485.7 MB/s
      readseq      :       3.722 micros/op 268657 ops/sec; 2627.7 MB/s
      readseq      :       3.149 micros/op 317605 ops/sec; 3106.5 MB/s
      readseq      :       3.125 micros/op 320024 ops/sec; 3130.1 MB/s
      readseq      :       4.075 micros/op 245374 ops/sec; 2400.0 MB/s
      ```
      
      ```
      [Everything in Memtable | 10K operands | 10 KB each | 10 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=1000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000
      
      [FullMergeV2]
      readseq      :       3.472 micros/op 288018 ops/sec; 2817.1 MB/s
      readseq      :       2.304 micros/op 434027 ops/sec; 4245.2 MB/s
      readseq      :       1.163 micros/op 859845 ops/sec; 8410.0 MB/s
      readseq      :       1.192 micros/op 838926 ops/sec; 8205.4 MB/s
      readseq      :       1.250 micros/op 800000 ops/sec; 7824.7 MB/s
      
      [master]
      readseq      :      24.025 micros/op 41623 ops/sec;  407.1 MB/s
      readseq      :      18.489 micros/op 54086 ops/sec;  529.0 MB/s
      readseq      :      18.693 micros/op 53495 ops/sec;  523.2 MB/s
      readseq      :      23.621 micros/op 42335 ops/sec;  414.1 MB/s
      readseq      :      18.775 micros/op 53262 ops/sec;  521.0 MB/s
      
      ```
      
      ```
      [Everything in Block cache | 10K operands | 10 KB each | 1 operand per key]
      
      [FullMergeV2]
      $ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
      readseq      :      14.741 micros/op 67837 ops/sec;  663.5 MB/s
      readseq      :       1.029 micros/op 971446 ops/sec; 9501.6 MB/s
      readseq      :       0.974 micros/op 1026229 ops/sec; 10037.4 MB/s
      readseq      :       0.965 micros/op 1036080 ops/sec; 10133.8 MB/s
      readseq      :       0.943 micros/op 1060657 ops/sec; 10374.2 MB/s
      
      [master]
      readseq      :      16.735 micros/op 59755 ops/sec;  584.5 MB/s
      readseq      :       3.029 micros/op 330151 ops/sec; 3229.2 MB/s
      readseq      :       3.136 micros/op 318883 ops/sec; 3119.0 MB/s
      readseq      :       3.065 micros/op 326245 ops/sec; 3191.0 MB/s
      readseq      :       3.014 micros/op 331813 ops/sec; 3245.4 MB/s
      ```
      
      ```
      [Everything in Block cache | 10K operands | 10 KB each | 10 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10-operands-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
      
      [FullMergeV2]
      readseq      :      24.325 micros/op 41109 ops/sec;  402.1 MB/s
      readseq      :       1.470 micros/op 680272 ops/sec; 6653.7 MB/s
      readseq      :       1.231 micros/op 812347 ops/sec; 7945.5 MB/s
      readseq      :       1.091 micros/op 916590 ops/sec; 8965.1 MB/s
      readseq      :       1.109 micros/op 901713 ops/sec; 8819.6 MB/s
      
      [master]
      readseq      :      27.257 micros/op 36687 ops/sec;  358.8 MB/s
      readseq      :       4.443 micros/op 225073 ops/sec; 2201.4 MB/s
      readseq      :       5.830 micros/op 171526 ops/sec; 1677.7 MB/s
      readseq      :       4.173 micros/op 239635 ops/sec; 2343.8 MB/s
      readseq      :       4.150 micros/op 240963 ops/sec; 2356.8 MB/s
      ```
      
      Test Plan: COMPILE_WITH_ASAN=1 make check -j64
      
      Reviewers: yhchiang, andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: lovro, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57075
  6. 09 Jul 2016, 1 commit
    • Concurrent memtable inserter to update counters and flush state after all inserts · 907f24d0
      Committed by sdong
      Summary: In the concurrent memtable insert case, updating counters in MemTable::Add() can account for 5% of CPU usage. By batching all the counters and updating them once at the end of the write batch, the CPU overhead is amortized in use cases where more than one key is updated per write batch.
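      A hedged sketch of the batching shape (struct and field names assumed from this change):
      
      ```
      #include <cstdint>
      
      // Each concurrent writer accumulates deltas locally while inserting and
      // publishes them once at the end of the write batch, instead of touching
      // shared counters on every MemTable::Add().
      struct MemTablePostProcessInfo {
        uint64_t data_size = 0;
        uint64_t num_entries = 0;
        uint64_t num_deletes = 0;
      };
      ```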
      
      Test Plan:
      Write throughput increases 12% with this benchmark setting:
      
      TEST_TMPDIR=/dev/shm/ ./db_bench --benchmarks=fillrandom -disable_auto_compactions -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -num=10000000 --writes=1000000 -max_background_flushes=16 -max_write_buffer_number=16 --threads=64 --batch_size=128   -allow_concurrent_memtable_write -enable_write_thread_adaptive_yield
      
      Reviewers: andrewkr, IslamAbdelRahman, ngbronson, igor
      
      Reviewed By: ngbronson
      
      Subscribers: ngbronson, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D60495
  7. 06 Jul 2016, 1 commit
  8. 18 Jun 2016, 1 commit
    • Deprecate filter_deletes · 7b79238b
      Committed by sdong
      Summary: filter_deletes is not a frequently used feature. Remove it.
      
      Test Plan: Run all test suites.
      
      Reviewers: igor, yhchiang, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D59427
  9. 14 Jun 2016, 1 commit
  10. 11 Jun 2016, 1 commit
    • memtable_prefix_bloom_bits -> memtable_prefix_bloom_bits_ratio and deprecate... · 20699df8
      Committed by sdong
      memtable_prefix_bloom_bits -> memtable_prefix_bloom_bits_ratio and deprecate memtable_prefix_bloom_probes
      
      Summary:
      memtable_prefix_bloom_probes is not a critical option. Remove it to reduce the number of options.
      It's easy for users to make mistakes with memtable_prefix_bloom_bits, so turn it into memtable_prefix_bloom_bits_ratio.
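      A hedged configuration sketch (option name as spelled in this change's title; later releases may spell it differently):
      
      ```
      #include "rocksdb/options.h"
      #include "rocksdb/slice_transform.h"
      
      // Size the memtable prefix bloom filter as a fraction of
      // write_buffer_size instead of an absolute bit count.
      rocksdb::ColumnFamilyOptions MakePrefixBloomOptions() {
        rocksdb::ColumnFamilyOptions cf_opts;
        cf_opts.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(4));
        cf_opts.memtable_prefix_bloom_bits_ratio = 0.05;  // 5% of write buffer
        return cf_opts;
      }
      ```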
      
      Test Plan: Run all existing tests
      
      Reviewers: yhchiang, igor, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: gunnarku, yoshinorim, MarkCallaghan, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D59199
  11. 11 May 2016, 1 commit
    • [rocksdb] Memtable Log Referencing and Prepared Batch Recovery · 1b8a2e8f
      Committed by Reid Horuff
      Summary:
      This diff is built on top of the WriteBatch modification https://reviews.facebook.net/D54093 and adds to the rocksdb core the functionality necessary for rocksdb to support 2PC.
      
      Modification of DBImpl::WriteImpl():
      - added two arguments: uint64_t* log_used = nullptr, uint64_t log_ref = 0
      - *log_used is an output argument which returns the log number that the incoming batch was inserted into, or 0 if no WAL insert took place.
      - log_ref is a supplied log number which all memtables inserted into will reference after the batch insert takes place. This number is tracked by FindMinPrepLogReferencedByMemTable() (sketched below) until all memtables inserted into have flushed.
      
      - The recovery/write path is now aware of prepared batches and commit and rollback markers.
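      A hypothetical, self-contained sketch of that bookkeeping (names and shapes invented for illustration, not the diff's code):
      
      ```
      #include <algorithm>
      #include <cstdint>
      #include <vector>
      
      struct MemTableRef {
        uint64_t prep_log = 0;  // 0: no prepared section referenced
        bool flushed = false;
      };
      
      // The WAL cannot be trimmed past the smallest prepare-section log still
      // referenced by an unflushed memtable.
      uint64_t FindMinPrepLogReferencedByMemTable(
          const std::vector<MemTableRef>& memtables) {
        uint64_t min_log = 0;
        for (const auto& m : memtables) {
          if (!m.flushed && m.prep_log != 0) {
            min_log = (min_log == 0) ? m.prep_log
                                     : std::min(min_log, m.prep_log);
          }
        }
        return min_log;
      }
      ```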
      
      Test Plan: There is currently no test on this diff. All testing of this functionality takes place in the Transaction layer/diff but I will add some testing.
      
      Reviewers: IslamAbdelRahman, sdong
      
      Subscribers: leveldb, santoshb, andrewkr, vasilep, dhruba, hermanlee4
      
      Differential Revision: https://reviews.facebook.net/D56919
  12. 27 Apr 2016, 1 commit
    • Introduce PinnedIteratorsManager (Reduce PinData() overhead / Refactor PinData) · d719b095
      Committed by Islam AbdelRahman
      Summary:
      While trying to reuse PinData() / ReleasePinnedData() to optimize away some memcpys, I realized that there is significant overhead to using PinData() / ReleasePinnedData() if they are called many times.
      This diff refactors the pinning logic by introducing PinnedIteratorsManager, a centralized component that is created once and notified whenever we need to pin an iterator. This implementation has much less overhead than the original.
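      A hedged sketch of the centralized shape (member names assumed; the real class lives in RocksDB internals):
      
      ```
      #include <utility>
      #include <vector>
      
      // Iterators register a pointer plus a release callback with one shared
      // manager; everything is released in a single pass when pinning ends,
      // instead of each iterator maintaining its own pinned-block list.
      class PinnedIteratorsManager {
       public:
        using ReleaseFunction = void (*)(void* arg);
      
        void StartPinning() { pinning_ = true; }
        bool PinningEnabled() const { return pinning_; }
      
        void PinPtr(void* ptr, ReleaseFunction release) {
          if (pinning_) {
            pinned_.emplace_back(ptr, release);
          }
        }
      
        void ReleaseAll() {
          for (auto& entry : pinned_) {
            entry.second(entry.first);
          }
          pinned_.clear();
          pinning_ = false;
        }
      
       private:
        bool pinning_ = false;
        std::vector<std::pair<void*, ReleaseFunction>> pinned_;
      };
      ```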
      
      Test Plan:
      make check -j64
      COMPILE_WITH_ASAN=1 make check -j64
      
      Reviewers: yhchiang, sdong, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56493
  13. 10 Feb 2016, 1 commit
  14. 07 Jan 2016, 1 commit
    • Optimize GetLatestSequenceForKey · da032495
      Committed by Reid Horuff
      Summary: DBImpl::GetLatestSequenceForKey() can do memcpys to load a value that will never be used. This can be optimized by changing all the Get() functions it calls to optionally not fetch the value (and only fetch the sequence number).
      
      Test Plan: optimistic_transaction_test and transaction_test
      
      Reviewers: anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba, hermanlee4
      
      Differential Revision: https://reviews.facebook.net/D52227
  15. 26 Dec 2015, 1 commit
    • support for concurrent adds to memtable · 7d87f027
      Committed by Nathan Bronson
      Summary:
      This diff adds support for concurrent adds to the skiplist memtable
      implementations.  Memory allocation is made thread-safe by the addition of
      a spinlock, with small per-core buffers to avoid contention.  Concurrent
      memtable writes are made via an additional method and don't impose a
      performance overhead on the non-concurrent case, so parallelism can be
      selected on a per-batch basis.
      
      Write thread synchronization is an increasing bottleneck for higher levels
      of concurrency, so this diff adds --enable_write_thread_adaptive_yield
      (default off).  This feature causes threads joining a write batch
      group to spin for a short time (default 100 usec) using sched_yield,
      rather than going to sleep on a mutex.  If the timing of the yield calls
      indicates that another thread has actually run during the yield then
      spinning is avoided.  This option improves performance for concurrent
      situations even without parallel adds, although it has the potential to
      increase CPU usage (and the heuristic adaptation is not yet mature).
      
      Parallel writes are not currently compatible with
      inplace updates, update callbacks, or delete filtering.
      Enable it with --allow_concurrent_memtable_write (and
      --enable_write_thread_adaptive_yield).  Parallel memtable writes
      are performance neutral when there is no actual parallelism, and in
      my experiments (SSD server-class Linux and varying contention and key
      sizes for fillrandom) they are always a performance win when there is
      more than one thread.
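      The db_bench flags above map onto Options fields; a minimal sketch:
      
      ```
      #include "rocksdb/options.h"
      
      // Opt in to parallel memtable inserts (requires a skip-list based
      // memtable) plus the adaptive-yield write-thread behavior.
      rocksdb::Options MakeConcurrentWriteOptions() {
        rocksdb::Options options;
        options.allow_concurrent_memtable_write = true;
        options.enable_write_thread_adaptive_yield = true;
        return options;
      }
      ```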
      
      Statistics are updated earlier in the write path, dropping the number
      of DB mutex acquisitions from 2 to 1 for almost all cases.
      
      This diff was motivated and inspired by Yahoo's cLSM work.  It is more
      conservative than cLSM: RocksDB's write batch group leader role is
      preserved (along with all of the existing flush and write throttling
      logic) and concurrent writers are blocked until all memtable insertions
      have completed and the sequence number has been advanced, to preserve
      linearizability.
      
      My test config is "db_bench -benchmarks=fillrandom -threads=$T
      -batch_size=1 -memtablerep=skip_list -value_size=100 --num=1000000/$T
      -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999
      -disable_auto_compactions --max_write_buffer_number=8
      -max_background_flushes=8 --disable_wal --write_buffer_size=160000000
      --block_size=16384 --allow_concurrent_memtable_write" on a two-socket
      Xeon E5-2660 @ 2.2Ghz with lots of memory and an SSD hard drive.  With 1
      thread I get ~440Kops/sec.  Peak performance for 1 socket (numactl
      -N1) is slightly more than 1Mops/sec, at 16 threads.  Peak performance
      across both sockets happens at 30 threads, and is ~900Kops/sec, although
      with fewer threads there is less performance loss when the system has
      background work.
      
      Test Plan:
      1. concurrent stress tests for InlineSkipList and DynamicBloom
      2. make clean; make check
      3. make clean; DISABLE_JEMALLOC=1 make valgrind_check; valgrind db_bench
      4. make clean; COMPILE_WITH_TSAN=1 make all check; db_bench
      5. make clean; COMPILE_WITH_ASAN=1 make all check; db_bench
      6. make clean; OPT=-DROCKSDB_LITE make check
      7. verify no perf regressions when disabled
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: MarkCallaghan, IslamAbdelRahman, anthony, yhchiang, rven, sdong, guyg8, kradhakrishnan, dhruba
      
      Differential Revision: https://reviews.facebook.net/D50589
  16. 17 Dec 2015, 1 commit
    • Introduce ReadOptions::pin_data (support zero copy for keys) · aececc20
      Committed by Islam AbdelRahman
      Summary:
      This patch updates the Iterator API to introduce new functions that allow users to keep the Slices returned by key() valid as long as the Iterator is not deleted.
      
      ReadOptions::pin_data: if true, keep loaded blocks in memory as long as the iterator is not deleted.
      Iterator::IsKeyPinned(): if true, the Slice returned by key() is valid as long as the iterator is not deleted.
      
      Also add a new option, BlockBasedTableOptions::use_delta_encoding, to allow users to disable delta encoding if needed.
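      A minimal usage sketch (assuming an already-open `db` handle):
      
      ```
      #include <memory>
      #include "rocksdb/db.h"
      
      // With pin_data, key() Slices can be held without copying for as long
      // as the iterator itself stays alive.
      void ScanPinned(rocksdb::DB* db) {
        rocksdb::ReadOptions read_options;
        read_options.pin_data = true;
        std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(read_options));
        for (it->SeekToFirst(); it->Valid(); it->Next()) {
          if (it->IsKeyPinned()) {
            rocksdb::Slice key = it->key();  // valid until `it` is deleted
            (void)key;
          }
        }
      }
      ```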
      
      Benchmark results (using https://phabricator.fb.com/P20083553)
      
      ```
      // $ du -h /home/tec/local/normal.4K.Snappy/db10077
      // 6.1G    /home/tec/local/normal.4K.Snappy/db10077
      
      // $ du -h /home/tec/local/zero.8K.LZ4/db10077
      // 6.4G    /home/tec/local/zero.8K.LZ4/db10077
      
      // Benchmarks for shard db10077
      // _build/opt/rocks/benchmark/rocks_copy_benchmark \
      //      --normal_db_path="/home/tec/local/normal.4K.Snappy/db10077" \
      //      --zero_db_path="/home/tec/local/zero.8K.LZ4/db10077"
      
      // First run
      // ============================================================================
      // rocks/benchmark/RocksCopyBenchmark.cpp          relative  time/iter  iters/s
      // ============================================================================
      // BM_StringCopy                                                 1.73s  576.97m
      // BM_StringPiece                                   103.74%      1.67s  598.55m
      // ============================================================================
      // Match rate : 1000000 / 1000000
      
      // Second run
      // ============================================================================
      // rocks/benchmark/RocksCopyBenchmark.cpp          relative  time/iter  iters/s
      // ============================================================================
      // BM_StringCopy                                              611.99ms     1.63
      // BM_StringPiece                                   203.76%   300.35ms     3.33
      // ============================================================================
      // Match rate : 1000000 / 1000000
      ```
      
      Test Plan: Unit tests
      
      Reviewers: sdong, igor, anthony, yhchiang, rven
      
      Reviewed By: rven
      
      Subscribers: dhruba, lovro, adsharma
      
      Differential Revision: https://reviews.facebook.net/D48999
  17. 14 Oct 2015, 1 commit
    • Separate InternalIterator from Iterator · 35ad531b
      Committed by sdong
      Summary:
      Separate out a new class, InternalIterator, from class Iterator for use when the look-up is done internally, which also means it operates on keys with sequence ID and type.
      
      This change will enable potential future optimizations, but for now InternalIterator's functions are the same as Iterator's.
      At the same time, move the cleanup function into a separate class and let both InternalIterator and Iterator inherit from it.
      
      Test Plan: Run all existing tests.
      
      Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48549
  18. 08 Oct 2015, 1 commit
    • bloom hit/miss stats for SST and memtable · a065cdb3
      Committed by dyniusz
      Summary:
      - hit and miss bloom filter stats for memtable and SST
      - stats added to the perf_context struct
      - key matches and prefix matches combined into one stat
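      A hedged read-side sketch (counter names as added here; the thread-local accessor shown is the one in current releases and may differ in this revision):
      
      ```
      #include <cstdint>
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/perf_context.h"
      #include "rocksdb/perf_level.h"
      
      // Inspect the new bloom filter counters after a read.
      void CheckBloomStats(rocksdb::DB* db) {
        rocksdb::SetPerfLevel(rocksdb::PerfLevel::kEnableCount);
        rocksdb::get_perf_context()->Reset();
        std::string value;
        rocksdb::Status s = db->Get(rocksdb::ReadOptions(), "some_key", &value);
        (void)s;
        uint64_t memtable_hits =
            rocksdb::get_perf_context()->bloom_memtable_hit_count;
        uint64_t sst_misses =
            rocksdb::get_perf_context()->bloom_sst_miss_count;
        (void)memtable_hits;
        (void)sst_misses;
      }
      ```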
      
      Test Plan: unit test verifying the added functionality; see BloomStatsTest in db_test.cc for details
      
      Reviewers: yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D47859
  19. 18 Sep 2015, 1 commit
    • Support for SingleDelete() · 014fd55a
      Committed by Andres Noetzli
      Summary:
      This patch fixes #7460559. It introduces SingleDelete as a new database
      operation. This operation can be used to delete keys that were never
      overwritten (no put following another put of the same key). If an overwritten
      key is single deleted the behavior is undefined. Single deletion of a
      non-existent key has no effect but multiple consecutive single deletions are
      not allowed (see limitations).
      
      In contrast to the conventional Delete() operation, the deletion entry is
      removed along with the value when the two are lined up in a compaction. Note:
      The semantics are similar to @igor's prototype that allowed having this
      behavior at the granularity of a column family (
      https://reviews.facebook.net/D42093 ). This new patch, however, is more
      aggressive when it comes to removing tombstones: It removes the SingleDelete
      together with the value whenever there is no snapshot between them while the
      older patch only did this when the sequence number of the deletion was older
      than the earliest snapshot.
      
      Most of the complex additions are in the Compaction Iterator, all other changes
      should be relatively straightforward. The patch also includes basic support for
      single deletions in db_stress and db_bench.
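      A minimal usage sketch of the contract (assuming an already-open `db` handle):
      
      ```
      #include <cassert>
      #include "rocksdb/db.h"
      
      // SingleDelete is only valid for keys written exactly once and never
      // overwritten; the tombstone and value can then cancel out in compaction.
      void WriteOnceThenRemove(rocksdb::DB* db) {
        rocksdb::WriteOptions wo;
        rocksdb::Status s = db->Put(wo, "session:123", "payload");
        assert(s.ok());
        // ... the key is read but never overwritten ...
        s = db->SingleDelete(wo, "session:123");
        assert(s.ok());
      }
      ```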
      
      Limitations:
      - Not compatible with cuckoo hash tables
      - Single deletions cannot be used in combination with merges and normal
        deletions on the same key (other keys are not affected by this)
      - Consecutive single deletions are currently not allowed (and older version of
        this patch supported this so it could be resurrected if needed)
      
      Test Plan: make all check
      
      Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor
      
      Reviewed By: igor
      
      Subscribers: maykov, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43179
  20. 09 Sep 2015, 1 commit
    • Added Equal method to Comparator interface · 6bdc484f
      Committed by Andres Noetzli
      Summary:
      In some cases, equality comparisons can be done more efficiently than three-way
      comparisons. There are quite a few places in the code where we only care about
      equality. This patch adds an Equal() method that defaults to using the
      Compare() method.
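      A hedged sketch of overriding it (a bytewise example; by default Equal() simply returns Compare(a, b) == 0):
      
      ```
      #include <string>
      #include "rocksdb/comparator.h"
      #include "rocksdb/slice.h"
      
      class FastEqualComparator : public rocksdb::Comparator {
       public:
        const char* Name() const override { return "FastEqualComparator"; }
      
        int Compare(const rocksdb::Slice& a,
                    const rocksdb::Slice& b) const override {
          return a.compare(b);
        }
      
        // Equality can short-circuit on length before comparing bytes, which
        // a full three-way comparison cannot.
        bool Equal(const rocksdb::Slice& a,
                   const rocksdb::Slice& b) const override {
          return a == b;
        }
      
        // No-op implementations keep the sketch self-contained.
        void FindShortestSeparator(std::string* /*start*/,
                                   const rocksdb::Slice& /*limit*/) const override {}
        void FindShortSuccessor(std::string* /*key*/) const override {}
      };
      ```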
      
      Test Plan: make clean all check
      
      Reviewers: rven, anthony, yhchiang, igor, sdong
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D46233
  21. 01 Sep 2015, 1 commit
  22. 18 Aug 2015, 1 commit
    • Simplify querying of merge results · f32a5720
      Committed by Andres Notzli
      Summary:
      While working on supporting mixing merge operators with
      single deletes ( https://reviews.facebook.net/D43179 ),
      I realized that returning and dealing with merge results
      can be made simpler. Submitting this as a separate diff
      because it is not directly related to single deletes.
      
      Before, callers of merge helper had to retrieve the merge
      result in one of two ways depending on whether the merge
      was successful or not (success = result of merge was single
      kTypeValue). For successful merges, the caller could query
      the resulting key/value pair and for unsuccessful merges,
      the result could be retrieved in the form of two deques of
      keys and values. However, with single deletes, a successful merge
      does not return a single key/value pair (if merge
      operands are merged with a single delete, we have to generate
      a value and keep the original single delete around to make
      sure that we are not accidentally producing a key overwrite).
      In addition, the two existing call sites of the merge
      helper were taking the same actions independently of whether
      the merge was successful or not, so this patch simplifies that.
      
      Test Plan: make clean all check
      
      Reviewers: rven, sdong, yhchiang, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43353
  23. 17 Jun 2015, 1 commit
  24. 30 May 2015, 1 commit
    • Optimistic Transactions · dc9d70de
      Committed by agiardullo
      Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented is a way of ensuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported.
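      A begin/commit usage sketch (header and class names as in current releases, which may differ slightly from this revision):
      
      ```
      #include <cassert>
      #include "rocksdb/utilities/optimistic_transaction_db.h"
      #include "rocksdb/utilities/transaction.h"
      
      // Commit fails if a conflicting write to the same key happened after
      // BeginTransaction(), detected by checking the memtable.
      void RunOptimisticTxn() {
        rocksdb::Options options;
        options.create_if_missing = true;
        rocksdb::OptimisticTransactionDB* txn_db = nullptr;
        rocksdb::Status s = rocksdb::OptimisticTransactionDB::Open(
            options, "/tmp/txn_db", &txn_db);
        assert(s.ok());
      
        rocksdb::Transaction* txn =
            txn_db->BeginTransaction(rocksdb::WriteOptions());
        txn->Put("key", "value");
        s = txn->Commit();  // may return Busy on conflict
        delete txn;
        delete txn_db;
      }
      ```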
      
      Test Plan: Added a new test, but still need to look into stress testing.
      
      Reviewers: yhchiang, igor, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: adamretter, MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D33435
  25. 25 Mar 2015, 1 commit
    • Adding stats for the merge and filter operation · 3d1a924f
      Committed by Anurag Indu
      Summary:
      We have added new stats and perf_context entries for measuring the time consumed by the merge and filter operations.
      We have bounded all the merge operations within the GUARD statement and collected the total time for these operations in the DB.
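      A hedged read-side sketch (field name as added by this change; accessor as in current releases):
      
      ```
      #include <cstdint>
      #include "rocksdb/perf_context.h"
      #include "rocksdb/perf_level.h"
      
      // Timing counters require kEnableTime; read the merge operator's total
      // time after performing reads that trigger merges.
      uint64_t MergeTimeNanos() {
        rocksdb::SetPerfLevel(rocksdb::PerfLevel::kEnableTime);
        // ... perform reads that invoke the merge operator ...
        return rocksdb::get_perf_context()->merge_operator_time_nanos;
      }
      ```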
      
      Test Plan: WIP
      
      Reviewers: rven, yhchiang, kradhakrishnan, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D34377
  26. 19 Mar 2015, 1 commit
  27. 28 Feb 2015, 1 commit
  28. 27 Feb 2015, 1 commit
    • rocksdb: Add missing override · 62247ffa
      Committed by Igor Sugak
      Summary:
      When using the latest clang (3.6 or 3.7/trunk), rocksdb fails to build with many errors. Almost all of them are missing-override errors. This diff adds the missing override keywords. No manual changes.
      
      Prerequisites: bear and clang 3.5 build with extra tools
      
      ```lang=bash
      % USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
      % clang-modernize -p . -include . -add-override
      % make format
      ```
      
      Test Plan:
      Make sure all tests are passing.
      ```lang=bash
      % #Use default fb code clang.
      % make check
      ```
      Verify fewer errors and no missing-override errors.
      ```lang=bash
      % # Have trunk clang present in path.
      % ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
      ```
      
      Reviewers: igor, kradhakrishnan, rven, meyering, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D34077
  29. 03 Dec 2014, 1 commit
    • Enforce write buffer memory limit across column families · a14b7873
      Committed by Jonah Cohen
      Summary:
      Introduces a new class for managing write buffer memory across column
      families.  We supplement ColumnFamilyOptions::write_buffer_size with
      ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer
      instance that enforces memory limits before flushing out to disk.
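      A hedged configuration sketch: the WriteBuffer class itself is internal, but db_write_buffer_size is the DBOptions knob that exposes the shared limit:
      
      ```
      #include "rocksdb/options.h"
      
      // Cap total memtable memory across all column families; reaching the
      // cap forces a flush before further writes build up more memory.
      rocksdb::Options MakeSharedWriteBufferOptions() {
        rocksdb::Options options;
        options.db_write_buffer_size = 256 << 20;  // 256 MB across all CFs
        return options;
      }
      ```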
      
      Test Plan: Added SharedWriteBuffer unit test to db_test.cc
      
      Reviewers: sdong, rven, ljin, igor
      
      Reviewed By: igor
      
      Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D22581
  30. 12 Nov 2014, 1 commit
    • Turn on -Wshorten-64-to-32 and fix all the errors · 767777c2
      Committed by Igor Canadi
      Summary:
      We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details.
      
      This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables.
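      An illustration of the class of bug this warning catches (a made-up example, not code from this diff):
      
      ```
      #include <cstddef>
      #include <cstdint>
      
      uint32_t BlockCount(uint64_t file_size, size_t block_size) {
        // uint32_t n = file_size / block_size;  // warns: 64-bit -> 32-bit
        // The explicit cast documents that the narrowing was reviewed.
        return static_cast<uint32_t>(file_size / block_size);
      }
      ```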
      
      Test Plan: compiles
      
      Reviewers: ljin, rven, yhchiang, sdong
      
      Reviewed By: yhchiang
      
      Subscribers: bobbaldwin, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28689
  31. 28 Oct 2014, 1 commit
    • dynamic inplace_update options · f1841985
      Committed by Lei Jin
      Summary:
      Make inplace_update_support and inplace_update_num_locks dynamic.
      inplace_callback becomes immutable
      We are almost free of references to cfd->options() in db_impl
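      With the options now dynamic, they can be changed at runtime through DB::SetOptions; a usage sketch:
      
      ```
      #include <cassert>
      #include <unordered_map>
      #include "rocksdb/db.h"
      
      // Adjust the number of in-place-update locks on a live DB.
      void TuneInplaceLocks(rocksdb::DB* db) {
        rocksdb::Status s = db->SetOptions({{"inplace_update_num_locks", "16"}});
        assert(s.ok());
      }
      ```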
      
      Test Plan: unit test
      
      Reviewers: igor, yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D25293
  32. 01 Oct 2014, 1 commit
  33. 18 Sep 2014, 1 commit
  34. 11 Sep 2014, 1 commit
    • Push model for flushing memtables · 3d9e6f77
      Committed by Igor Canadi
      Summary:
      When a memtable is full it calls the registered callback. That callback then registers the column family as needing a flush. Every write checks if there are column families that need to be flushed. This completely eliminates the need for the MakeRoomForWrite() function and simplifies our write code-path.
      
      There is some complexity with the concurrency when the column family is dropped. I made it a bit less complex by dropping the column family from the write thread in https://reviews.facebook.net/D22965. Let me know if you want to discuss this.
      
      Test Plan: make check works. I'll also run db_stress with creating and dropping column families for a while.
      
      Reviewers: yhchiang, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D23067
  35. 10 Sep 2014, 1 commit
  36. 09 Sep 2014, 2 commits
    • MemTableOptions · 52311463
      Committed by Lei Jin
      Summary: removed references to options in WriteBatch and DBImpl::Get()
      
      Test Plan: make all check
      
      Reviewers: yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D23049
    • DB::Flush() Do not wait for background threads when there is nothing in mem table · 011241bb
      Committed by sdong
      Summary:
      When we have multiple column families, users can issue Flush() on every column family to make sure everything is flushed, even if some of them might be empty. By skipping the waiting in the empty cases, this can be greatly sped up.
      
      Still waiting for people's comments before writing unit tests for it.
      
      Test Plan: Will write a unit test to make sure it is correct.
      
      Reviewers: ljin, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D22953
  37. 05 Sep 2014, 1 commit
  38. 03 Sep 2014, 1 commit
    • Refactor PerfStepTimer to stop on destruct · 6614a484
      Committed by Torrie Fischer
      This eliminates the need to remember to call PERF_TIMER_STOP when a section has
      been timed. It allows a more useful design with the perf timers and enables
      possible return-value optimizations. Simplistic example:
      
      class Foo {
       public:
        Foo() : m_v(0) {}
        explicit Foo(int v) : m_v(v) {}
      
       private:
        int m_v;
      };
      
      // `err` instead of `errno`, which is a reserved macro name.
      Foo makeFrobbedFoo(int* err) {
        *err = 0;
        return Foo();
      }
      
      Foo bar(int* err) {
        // Times everything until the guard goes out of scope, including
        // construction of the return value.
        PERF_TIMER_GUARD(some_timer);
      
        return makeFrobbedFoo(err);
      }
      
      int main(int argc, char** argv) {
        int err;
        Foo f = bar(&err);
      
        if (err) {
          return -1;
        }
        return 0;
      }
      
      After bar() is called, perf_context.some_timer would be incremented as if
      Stop(&perf_context.some_timer) was called at the end, and the compiler is still
      able to produce optimizations on the return value from makeFrobbedFoo() through
      to main().
  39. 27 Aug 2014, 1 commit