1. 25 Aug, 2015 3 commits
    • Fixing race condition in DBTest.DynamicMemtableOptions · 20508329
      Andres Noetzli committed
      Summary:
      This patch fixes a race condition in DBTest.DynamicMemtableOptions. In rare cases,
      it was possible that the main thread would fill up both memtables before the flush
      job acquired its work. Then, the flush job was flushing both memtables together,
      producing only one L0 file while the test expected two. Now, the test waits for
      flushes to finish earlier, to make sure that the memtables are flushed in separate
      flush jobs.
      
      Test Plan:
      Insert "usleep(10000);" after "IOSTATS_SET_THREAD_POOL_ID(Env::Priority::HIGH);" in BGWorkFlush()
      to make the issue more likely. Then test with:
      make db_test && time while ./db_test --gtest_filter=*DynamicMemtableOptions; do true; done
      
      Reviewers: rven, sdong, yhchiang, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45429
    • Remove an extra 's' from cur-size-all-mem-tabless · e46bcc08
      Igor Canadi committed
      Summary: As title
      
      Test Plan: make check
      
      Reviewers: yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45447
    • Smarter purging during flush · 4ab26c5a
      Igor Canadi committed
      Summary:
      Currently, we only purge duplicate keys and deletions during flush if `earliest_seqno_in_memtable <= newest_snapshot`. This means that the newest snapshot happened before we first created the memtable. This is almost never true for MyRocks and MongoRocks.
      
      This patch makes purging during flush able to understand snapshots. The main logic is copied from compaction_job.cc, although the logic over there is much more complicated and extensive. However, we should try to merge the common functionality at some point.
      
      I need this patch to implement no_overwrite_i_promise functionality for flush. We'll also need this to support SingleDelete() during Flush(). @yoshinorim requested the feature.
      
      Test Plan:
      make check
      I had to adjust some unit tests to understand this new behavior
      
      Reviewers: yhchiang, yoshinorim, anthony, sdong, noetzli
      
      Reviewed By: noetzli
      
      Subscribers: yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42087
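The snapshot-aware purging described in this commit can be illustrated with a minimal sketch. It is not the code from the patch or from compaction_job.cc: `Entry`, `EarliestVisibleSnapshot` and `PurgeForFlush` are hypothetical names, and deletion handling is left out. The assumption is that an older entry for a key can be dropped when a newer entry for the same key is visible to exactly the same earliest snapshot, so no snapshot can tell the two apart.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical, simplified memtable entry (not a RocksDB type).
struct Entry {
  std::string user_key;
  uint64_t seqno;
  std::string value;
};

// Returns the earliest snapshot that can see an entry written at `seqno`
// (a snapshot s sees entries with seqno <= s). `snapshots` must be sorted
// in ascending order. If no snapshot can see the entry, the maximum value
// is returned; it stands for "visible only to reads of the latest state".
uint64_t EarliestVisibleSnapshot(uint64_t seqno,
                                 const std::vector<uint64_t>& snapshots) {
  auto it = std::lower_bound(snapshots.begin(), snapshots.end(), seqno);
  return it == snapshots.end() ? std::numeric_limits<uint64_t>::max() : *it;
}

// `entries` must be sorted by user_key ascending, then seqno descending.
// Keeps, for every key, the newest entry in each snapshot "stripe".
std::vector<Entry> PurgeForFlush(const std::vector<Entry>& entries,
                                 const std::vector<uint64_t>& snapshots) {
  std::vector<Entry> kept;
  const Entry* prev = nullptr;  // last entry that was kept
  uint64_t prev_snapshot = 0;
  for (const Entry& e : entries) {
    const uint64_t snapshot = EarliestVisibleSnapshot(e.seqno, snapshots);
    const bool same_key = prev != nullptr && prev->user_key == e.user_key;
    if (same_key && snapshot == prev_snapshot) {
      continue;  // A newer entry for this key is visible to the same snapshot.
    }
    kept.push_back(e);
    prev = &e;
    prev_snapshot = snapshot;
  }
  return kept;
}
```

With no snapshots, only the newest entry per key survives; each snapshot that falls between two versions of a key forces the older version to be kept.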
  2. 23 Aug, 2015 1 commit
    • Fix benchmark report script · 4c81ac0c
      Mark Callaghan committed
      Summary:
      db_bench output now displays Percentile many times with --statistics after
      read IO latency histograms were added. So I only need the last one in the report output.
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run run_flash_bench.sh
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D45093
  3. 22 Aug, 2015 2 commits
  4. 21 Aug, 2015 11 commits
    • Add options.new_table_reader_for_compaction_inputs · 9130873a
      sdong committed
      Summary: Currently compaction inputs share the same file descriptor and table reader as other foreground threads. This makes fadvise behave less predictably. Add options.new_table_reader_for_compaction_inputs to force the creation of a new file descriptor and a new table reader for compaction inputs.
      
      Test Plan: Add the option.
      
      Reviewers: rven, anthony, kradhakrishnan, IslamAbdelRahman, igor, yhchiang
      
      Reviewed By: igor
      
      Subscribers: igor, MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43311
    • Add a counter about estimated pending compaction bytes · 07d2d341
      sdong committed
      Summary:
      Add a counter of estimated bytes the DB needs to compact for all the compactions to finish. Expose it as a DB Property.
      In the future, we can use a threshold on this counter to replace the soft rate limit and hard rate limit. A single threshold of estimated compaction debt in bytes will be easier for users to reason about, when deciding when writes should slow down or stop, than the more abstract soft and hard rate limits.
      
      Test Plan: Add unit tests
      
      Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44205
    • Improve defaults for benchmarks · 41a0e281
      Mark Callaghan committed
      Summary:
      Changes include:
      * don't sync-on-commit for single writer thread in readwhile... tests
      * make default block size 8kb rather than 4kb to avoid too small blocks after compression
      * use snappy instead of zlib to avoid stalls from compression latency
      * disable statistics
      * use bytes_per_sync=8M to reduce throughput loss on disk
      * use open_files=-1 to reduce mutex contention
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run benchmark
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44961
    • Fixed a rare deadlock in DBTest.ThreadStatusFlush · a203b913
      Yueh-Hsuan Chiang committed
      Summary:
      Currently, ThreadStatusFlush uses two sync-points to ensure
      there's a flush currently running when calling GetThreadList().
      However, one of the sync-points is inside the db-mutex, which could
      cause a deadlock in case there's a DB::Get() call.
      
      This patch fixes this issue by moving the sync-point to a better
      place where the flush job does not hold the mutex.
      
      Test Plan: db_test
      
      Reviewers: igor, sdong, anthony, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45045
    • Merge pull request #695 from yuslepukhin/address_windows_build · 962aa642
      Siying Dong committed
      Address windows build issues caused by introducing Subcompaction
    • More indent adjustment. · 5bf89076
      Dmitri Smirnov committed
    • Adjust indent · e2a9f43d
      Dmitri Smirnov committed
    • Merge branch 'address_windows_build' of https://github.com/yuslepukhin/rocksdb... · 6e9a260b
      Dmitri Smirnov committed
      Merge branch 'address_windows_build' of https://github.com/yuslepukhin/rocksdb into address_windows_build
    • Address windows build issues · 1cac89c9
      Dmitri Smirnov committed
       Intro SubCompactionState move functionality
       =delete copy functionality
       #ifdef SyncPoint in tests for Windows Release builds
    • Address windows build issues · f25f06dd
      Dmitri Smirnov committed
        Intro SubCompactionState move functionality
        =delete copy functionality
        #ifdef SyncPoint in tests for Windows Release builds
    • Total SST files size DB Property · 027ca5b2
      Islam AbdelRahman committed
      Summary: Add a new DB property that calculates the total size of files used by all RocksDB Versions
      
      Test Plan: Unittests for the new property
      
      Reviewers: igor, yhchiang, anthony, rven, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44799
  5. 20 Aug, 2015 5 commits
  6. 19 Aug, 2015 7 commits
    • Removing variables used only in assertions to prevent build error · 137c3766
      Ari Ekmekji committed
      Summary:
      A couple of variables were declared but used only in assertions,
      which causes issues when building in fbcode.
      
      Test Plan: make dbg  and   make release
      
      Reviewers: yhchiang, sdong, igor, anthony, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44937
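The kind of problem this commit describes can be sketched as follows. This is a generic illustration, not the code that was changed: `LastValue` is a hypothetical function, and the assumption is that the build treats unused-variable warnings as errors. When `NDEBUG` is defined, `assert` expands to nothing, so a variable that is read only inside an assertion becomes unused.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pattern that can break a warnings-as-errors release build:
//
//   const std::size_t n = values.size();
//   assert(n > 0);  // compiled out under NDEBUG, leaving `n` unused
//
// One way to avoid it: evaluate the condition directly inside the assertion,
// so that no helper variable exists in release builds.
int LastValue(const std::vector<int>& values) {
  assert(!values.empty());
  return values.back();
}
```

The function behaves the same in debug and release builds for valid input; only the precondition check disappears when assertions are disabled.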
    • Bounding Number of Subcompactions · b47cc585
      Ari Ekmekji committed
      Summary:
      In D43239 (https://reviews.facebook.net/D43239) the number
      of subcompactions is set based on the number of L1 files with
      unique starting keys. In certain cases when this number is very large
      this causes issues, particularly with the overlap between files since
      very small output files can be generated. This diff bounds the number
      of subcompactions to the user option DBOption.num_subcompactions.
      
      Test Plan: ./db_test ./db_compaction_test
      
      Reviewers: sdong, igor, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44883
    • Make tailing iterator show new entries in memtable. · e58e1b18
      Venkatesh Radhakrishnan committed
      Summary:
      Reseek mutable_iter if it is invalid in Next and immutable_iter
      is invalid.
      
      Test Plan: DBTestTailingIterator.TailingIteratorSeekToNext
      
      Reviewers: tnovak, march, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D44865
    • DBOptions serialization and deserialization · 9ec95715
      Yueh-Hsuan Chiang committed
      Summary:
      This patch implements DBOptions deserialization and improves
      the current implementation of DBOptions serialization by
      using a static structure that stores the offset of each
      DBOptions member variable to perform serialization and
      deserialization, instead of using many if-then branches
      to determine the mapping between strings and variables.
      
      Test Plan: Added test in options_test.cc
      
      Reviewers: igor, anthony, sdong, IslamAbdelRahman
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44097
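The offset-table approach described in this commit can be sketched as below. This is an illustration under stated assumptions, not the RocksDB implementation: `DemoOptions`, `OptionType`, `OptionTypeInfo`, `kDemoOptionsTypeInfo`, `ParseOne` and `Serialize` are all hypothetical names, the struct has only three fields, and the `name=value;` output format is chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for an options struct (standard-layout, so
// offsetof is well defined).
struct DemoOptions {
  bool create_if_missing = false;
  int max_open_files = 5000;
  uint64_t bytes_per_sync = 0;
};

enum class OptionType { kBoolean, kInt, kUInt64 };

struct OptionTypeInfo {
  std::size_t offset;  // byte offset of the member inside DemoOptions
  OptionType type;     // how to convert the member to and from a string
};

// One static table replaces a chain of per-option if/else branches.
static const std::map<std::string, OptionTypeInfo> kDemoOptionsTypeInfo = {
    {"create_if_missing",
     {offsetof(DemoOptions, create_if_missing), OptionType::kBoolean}},
    {"max_open_files",
     {offsetof(DemoOptions, max_open_files), OptionType::kInt}},
    {"bytes_per_sync",
     {offsetof(DemoOptions, bytes_per_sync), OptionType::kUInt64}},
};

// Sets one option by name. Returns false for unknown names.
// Note: std::stoi / std::stoull throw on malformed numbers.
bool ParseOne(DemoOptions* opts, const std::string& name,
              const std::string& value) {
  auto it = kDemoOptionsTypeInfo.find(name);
  if (it == kDemoOptionsTypeInfo.end()) return false;
  char* addr = reinterpret_cast<char*>(opts) + it->second.offset;
  switch (it->second.type) {
    case OptionType::kBoolean:
      *reinterpret_cast<bool*>(addr) = (value == "true");
      break;
    case OptionType::kInt:
      *reinterpret_cast<int*>(addr) = std::stoi(value);
      break;
    case OptionType::kUInt64:
      *reinterpret_cast<uint64_t*>(addr) = std::stoull(value);
      break;
  }
  return true;
}

// Serializes all options as "name=value;" pairs in table (name) order.
std::string Serialize(const DemoOptions& opts) {
  std::string out;
  for (const auto& kv : kDemoOptionsTypeInfo) {
    const char* addr =
        reinterpret_cast<const char*>(&opts) + kv.second.offset;
    std::string value;
    switch (kv.second.type) {
      case OptionType::kBoolean:
        value = *reinterpret_cast<const bool*>(addr) ? "true" : "false";
        break;
      case OptionType::kInt:
        value = std::to_string(*reinterpret_cast<const int*>(addr));
        break;
      case OptionType::kUInt64:
        value = std::to_string(*reinterpret_cast<const uint64_t*>(addr));
        break;
    }
    out += kv.first + "=" + value + ";";
  }
  return out;
}
```

Adding an option in this scheme means adding one table row; the parsing and serialization code stays unchanged, which is the maintainability gain the summary points at.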
    • Make HashCuckooRep::ApproximateMemoryUsage() return reasonable estimation. · b2df20a8
      Yueh-Hsuan Chiang committed
      Summary:
      HashCuckooRep::ApproximateMemoryUsage() previously returned
      std::numeric_limits<size_t>::max() when it could not accept more
      entries.  This patch makes it return a more reasonable estimate.
      
      This change is necessary in order to make GetIntProperty("rocksdb.cur-size-all-mem-tables")
      handle HashCuckooRep properly in diff https://reviews.facebook.net/D44229.
      
      Test Plan: db_test
      
      Reviewers: sdong, anthony, IslamAbdelRahman, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44241
    • Fixing Failed Assertion in Subcompaction State Diff · 601b1aac
      Ari Ekmekji committed
      Summary:
      In D43239 (https://reviews.facebook.net/D43239) there is an
      assertion to make sure a subcompaction's output is never empty at the
      end of execution. This assertion however breaks the build because some
      tests lead to exactly that scenario. So I have altered the logic
      to handle this case instead of just failing the assertion.
      
      The reason that it is possible for a subcompaction's output to be empty is
      that during a sequential execution of subcompactions, if a user aborts the
      compaction job then some of the later subcompactions to be executed may
      have yet to process any keys and therefore have yet to generate output files.
      This becomes very rare once the subcompactions are executed in parallel,
      but for now they are still sequential so the case is possible when there is an
      early termination, as in some of the tests.
      
      Test Plan: ./db_test  ./db_compaction_test
      
      Reviewers: sdong, igor, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44877
    • [Parallel L0-L1 Compaction Prep]: Giving Subcompactions Their Own State · f0da6977
      Ari Ekmekji committed
      Summary:
      In preparation for running multiple threads at the same time during
      a compaction job, this patch assigns each subcompaction its own state
      (instead of sharing the one global CompactionState). Each subcompaction then
      uses this state to update its statistics, keep track of its snapshots, etc.
      during the course of execution. Then at the end of all the executions the
      statistics are aggregated across the subcompactions so that the final result
      is the same as if only one larger compaction had run.
      
      Test Plan: ./db_test  ./db_compaction_test  ./compaction_job_test
      
      Reviewers: sdong, anthony, igor, noetzli, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43239
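The per-subcompaction state and the final aggregation step can be sketched roughly as follows. The types and field names here (`SubcompactionStats`, `SubcompactionState`, `AggregateStats`) are hypothetical simplifications for illustration and do not reproduce the structures introduced by the patch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-subcompaction counters.
struct SubcompactionStats {
  uint64_t num_input_records = 0;
  uint64_t num_output_records = 0;
  uint64_t bytes_written = 0;
};

// Each subcompaction owns its state, so nothing is shared while it runs.
struct SubcompactionState {
  SubcompactionStats stats;
};

// After all subcompactions have finished, their counters are summed so the
// totals match what a single larger compaction would have reported.
SubcompactionStats AggregateStats(
    const std::vector<SubcompactionState>& subcompactions) {
  SubcompactionStats total;
  for (const SubcompactionState& sub : subcompactions) {
    total.num_input_records += sub.stats.num_input_records;
    total.num_output_records += sub.stats.num_output_records;
    total.bytes_written += sub.stats.bytes_written;
  }
  return total;
}
```

Because each worker only writes to its own state, no locking is needed during execution; the only cross-subcompaction step is the single-threaded sum at the end.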
  7. 18 Aug, 2015 1 commit
    • Simplify querying of merge results · f32a5720
      Andres Notzli committed
      Summary:
      While working on supporting mixing merge operators with
      single deletes ( https://reviews.facebook.net/D43179 ),
      I realized that returning and dealing with merge results
      can be made simpler. Submitting this as a separate diff
      because it is not directly related to single deletes.
      
      Before, callers of merge helper had to retrieve the merge
      result in one of two ways depending on whether the merge
      was successful or not (success = result of merge was single
      kTypeValue). For successful merges, the caller could query
      the resulting key/value pair and for unsuccessful merges,
      the result could be retrieved in the form of two deques of
      keys and values. However, with single deletes, a successful merge
      does not return a single key/value pair (if merge
      operands are merged with a single delete, we have to generate
      a value and keep the original single delete around to make
      sure that we are not accidentally producing a key overwrite).
      In addition, the two existing call sites of the merge
      helper were taking the same actions regardless of whether
      the merge was successful or not, so this patch simplifies that.
      
      Test Plan: make clean all check
      
      Reviewers: rven, sdong, yhchiang, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43353
  8. 15 Aug, 2015 2 commits
    • Measure file read latency histogram per level · 72613657
      sdong committed
      Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled.
      
      Test Plan: Manually run db_bench, print out "rocksdb.dbstats" by hand, and make sure it prints as expected
      
      Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44193
    • reduce db mutex contention for write batch groups · b7198c3a
      Nathan Bronson committed
      Summary:
      This diff allows a Writer to join the next write batch group
      without acquiring any locks. Waiting is performed via a per-Writer mutex,
      so all of the non-leader writers never need to acquire the db mutex.
      It is now possible to join a write batch group after the leader has been
      chosen but before the batch has been constructed. This diff doesn't
      increase parallelism, but reduces synchronization overheads.
      
      For some CPU-bound workloads (no WAL, RAM-sized working set) this can
      substantially reduce contention on the db mutex in a multi-threaded
      environment.  With T=8 N=500000 in a CPU-bound scenario (see the test
      plan) this is good for a 33% perf win.  Not all scenarios see such a
      win, but none show a loss.  This code is slightly faster even for the
      single-threaded case (about 2% for the CPU-bound scenario below).
      
      Test Plan:
      1. unit tests
      2. COMPILE_WITH_TSAN=1 make check
      3. stress high-contention scenarios with db_bench -benchmarks=fillrandom -threads=$T -batch_size=1 -memtablerep=skip_list -value_size=0 --num=$N -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000
      
      Reviewers: sdong, igor, rven, ljin, yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D43887
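Joining a write group without taking a lock can be sketched with an atomic compare-and-swap on the head of a linked list of writers. This is a minimal illustration of the general technique, not the code from this diff: `Writer`, `WriteQueue` and `LinkOne` are hypothetical names here, and the per-Writer mutex used for waiting is omitted.

```cpp
#include <atomic>

// Hypothetical writer node; real writers would also carry a batch and
// a per-writer mutex/condition variable for waiting.
struct Writer {
  Writer* link_older = nullptr;  // next-older writer in the list
};

class WriteQueue {
 public:
  // Pushes `w` onto the list of pending writers without any lock.
  // Returns true if the list was empty, in which case `w` acts as the
  // leader of the next write batch group.
  bool LinkOne(Writer* w) {
    Writer* head = newest_.load(std::memory_order_relaxed);
    do {
      w->link_older = head;
      // On failure, compare_exchange_weak reloads `head` with the current
      // value, so the loop retries against the latest list head.
    } while (!newest_.compare_exchange_weak(head, w,
                                            std::memory_order_release,
                                            std::memory_order_relaxed));
    return head == nullptr;
  }

 private:
  std::atomic<Writer*> newest_{nullptr};
};
```

Only the writer that observes an empty list becomes the leader; all later writers simply link themselves in and wait, which is why they would not need to contend on a shared mutex to join.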
  9. 14 Aug, 2015 4 commits
    • Add options.compaction_measure_io_stats to print write I/O stats in compactions · 603b6da8
      sdong committed
      Summary:
      Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs:
      
      2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]}
      
      Add two more counters in iostats_context.
      
      Also add a parameter of db_bench.
      
      Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44115
    • Change master to 3.14 · dc9d5634
      Islam AbdelRahman committed
      Summary: Change master version to 3.14
      
      Test Plan: simple change
      
      Reviewers: sdong, yhchiang, kradhakrishnan, rven, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44187
    • Merge pull request #689 from msb-at-yahoo/add-tools-target · b78c8e07
      Igor Canadi committed
      Add a 'tools' target.
    • Add a 'tools' target. · 9f0dd222
      maurice barnum committed
      My use case is to build the rocksdb static library and tools, and
      ideally I'd like to not spend time building the shared library and other
      targets that I won't use.
  10. 13 Aug, 2015 2 commits
    • Add test case to repro the mispositional iterator in a low-chance data race case · 46372071
      sdong committed
      Summary: Iterator has a bug: if a child iterator reaches its end, the user issues a Prev(), and some extra rows are added at the end just before SeekToLast() of the child iterator is called, the iterator can end up mispositioned.
      
      Test Plan: Run the tests with or without valgrind
      
      Reviewers: rven, yhchiang, IslamAbdelRahman, anthony
      
      Reviewed By: anthony
      
      Subscribers: tnovak, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43671
    • [Cleanup] Remove RandomRWFile · 3bd9db42
      Islam AbdelRahman committed
      Summary: RandomRWFile is not used anywhere in our code base; this patch removes RandomRWFile
      
      Test Plan:
      make check -j64
      USE_CLANG=1 make all -j64
      OPT=-DROCKSDB_LITE make release -j64
      
      Reviewers: sdong, yhchiang, anthony, kradhakrishnan, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44091
  11. 12 Aug, 2015 2 commits