1. 21 Aug, 2015 10 commits
    • S
      Add a counter about estimated pending compaction bytes · 07d2d341
      sdong committed
      Summary:
      Add a counter of estimated bytes the DB needs to compact for all the compactions to finish. Expose it as a DB Property.
      In the future, we can use a threshold on this counter to replace the soft rate limit and hard rate limit. A single threshold of estimated compaction debt in bytes will be easier for users to reason about when to slow down and when to stop than the more abstract soft and hard rate limits.
      
      Test Plan: Add unit tests
      
      Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44205
      07d2d341
    • M
      Improve defaults for benchmarks · 41a0e281
      Mark Callaghan committed
      Summary:
      Changes include:
      * don't sync-on-commit for single writer thread in readwhile... tests
      * make default block size 8kb rather than 4kb to avoid too small blocks after compression
      * use snappy instead of zlib to avoid stalls from compression latency
      * disable statistics
      * use bytes_per_sync=8M to reduce throughput loss on disk
      * use open_files=-1 to reduce mutex contention
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run benchmark
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44961
      41a0e281
    • Y
      Fixed a rare deadlock in DBTest.ThreadStatusFlush · a203b913
      Yueh-Hsuan Chiang committed
      Summary:
      Currently, ThreadStatusFlush uses two sync-points to ensure
      there's a flush currently running when calling GetThreadList().
      However, one of the sync-points is inside db-mutex, which could
      cause deadlock in case there's a DB::Get() call.
      
      This patch fixes this issue by moving the sync-point to a better
      place where the flush job does not hold the mutex.
      
      Test Plan: db_test
      
      Reviewers: igor, sdong, anthony, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45045
      a203b913
    • S
      Merge pull request #695 from yuslepukhin/address_windows_build · 962aa642
      Siying Dong committed
      Address windows build issues caused by introducing Subcompaction
      962aa642
    • D
      More indent adjustment. · 5bf89076
      Dmitri Smirnov committed
      5bf89076
    • D
      Adjust indent · e2a9f43d
      Dmitri Smirnov committed
      e2a9f43d
    • D
      Merge branch 'address_windows_build' of https://github.com/yuslepukhin/rocksdb... · 6e9a260b
      Dmitri Smirnov committed
      Merge branch 'address_windows_build' of https://github.com/yuslepukhin/rocksdb into address_windows_build
      6e9a260b
    • D
      Address windows build issues · 1cac89c9
      Dmitri Smirnov committed
       Intro SubCompactionState move functionality
       =delete copy functionality
       #ifdef SyncPoint in tests for Windows Release builds
      1cac89c9
    • D
      Address windows build issues · f25f06dd
      Dmitri Smirnov committed
        Intro SubCompactionState move functionality
        =delete copy functionality
        #ifdef SyncPoint in tests for Windows Release builds
      f25f06dd
    • I
      Total SST files size DB Property · 027ca5b2
      Islam AbdelRahman committed
      Summary: Add a new DB property that calculates the total size of files used by all RocksDB Versions
      
      Test Plan: Unittests for the new property
      
      Reviewers: igor, yhchiang, anthony, rven, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44799
      027ca5b2
  2. 20 Aug, 2015 5 commits
  3. 19 Aug, 2015 7 commits
    • A
      Removing variables used only in assertions to prevent build error · 137c3766
      Ari Ekmekji committed
      Summary:
      A couple of variables were declared but only used in assertions
      which causes issues when building in fbcode.
      
      Test Plan: make dbg  and   make release
      
      Reviewers: yhchiang, sdong, igor, anthony, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44937
      137c3766
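The build problem described in the commit above can be illustrated with a small sketch. When `NDEBUG` is defined, `assert` expands to nothing, so a variable that is referenced only inside an assertion becomes unused, and a build that treats warnings as errors may reject it. The helper name `SumFirstTwo` is hypothetical and is not taken from the RocksDB source.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, for illustration only.
//
// Problematic form:
//   size_t n = v.size();
//   assert(n >= 2);  // compiled out under NDEBUG, leaving `n` unused
//
// Form used below: the checked expression lives inside the assertion,
// so no leftover variable remains in release builds.
int SumFirstTwo(const std::vector<int>& v) {
  assert(v.size() >= 2);
  return v[0] + v[1];
}
```

Whether the actual patch inlined the expressions or removed the assertions altogether is not stated in the commit message; this only shows one common way to avoid the warning.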
    • A
      Bounding Number of Subcompactions · b47cc585
      Ari Ekmekji committed
      Summary:
      In D43239 (https://reviews.facebook.net/D43239) the number
      of subcompactions is set based on the number of L1 files with
      unique starting keys. In certain cases when this number is very large
      this causes issues, particularly with the overlap between files since
      very small output files can be generated. This diff bounds the number
      of subcompactions to the user option DBOption.num_subcompactions.
      
      Test Plan: ./db_test ./db_compaction_test
      
      Reviewers: sdong, igor, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44883
      b47cc585
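As a rough sketch of the bounding described in the commit above: the number of candidate ranges is clamped to a user-supplied limit. The function name, signature, and the choice to run at least one subcompaction are assumptions made for illustration, not the code from the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: neither the name nor the signature comes from RocksDB.
// `candidate_ranges` stands for the count derived from L1 files with unique
// starting keys; `max_subcompactions` stands for the user-configured limit.
std::size_t BoundSubcompactions(std::size_t candidate_ranges,
                                std::size_t max_subcompactions) {
  assert(max_subcompactions >= 1);
  // Run at least one subcompaction, and never more than the limit.
  return std::min(std::max<std::size_t>(candidate_ranges, 1),
                  max_subcompactions);
}
```

With a limit of 4, a compaction that would otherwise be split into 1000 pieces is capped at 4, which avoids the very small output files the commit message mentions.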
    • V
      Make tailing iterator show new entries in memtable. · e58e1b18
      Venkatesh Radhakrishnan committed
      Summary:
      Reseek mutable_iter if it is invalid in Next and immutable_iter
      is invalid.
      
      Test Plan: DBTestTailingIterator.TailingIteratorSeekToNext
      
      Reviewers: tnovak, march, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D44865
      e58e1b18
    • Y
      DBOptions serialization and deserialization · 9ec95715
      Yueh-Hsuan Chiang committed
      Summary:
      This patch implements DBOptions deserialization and improves
      the current implementation of DBOptions serialization by
      using a static structure that stores the offset of each
      DBOptions member variable to perform serialization and
      deserialization instead of using tons of if-then branches
      to determine the mapping between strings and variables.
      
      Test Plan: Added test in options_test.cc
      
      Reviewers: igor, anthony, sdong, IslamAbdelRahman
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44097
      9ec95715
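The offset-table idea in the commit above can be sketched as follows. This is a minimal illustration and not the RocksDB implementation: `DemoOptions`, `OptionInfo`, `kDemoOptionTable`, `ParseOption`, and `SerializeOption` are all hypothetical names, and only three member types are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an options struct (standard layout, so offsetof
// is well defined).
struct DemoOptions {
  bool create_if_missing = false;
  int max_open_files = 5000;
  uint64_t bytes_per_sync = 0;
};

enum class OptionType { kBool, kInt, kUInt64 };

struct OptionInfo {
  std::size_t offset;  // byte offset of the member inside DemoOptions
  OptionType type;     // how to parse and print the member
};

// One static table replaces a chain of if/else branches keyed on the name.
static const std::unordered_map<std::string, OptionInfo> kDemoOptionTable = {
    {"create_if_missing",
     {offsetof(DemoOptions, create_if_missing), OptionType::kBool}},
    {"max_open_files",
     {offsetof(DemoOptions, max_open_files), OptionType::kInt}},
    {"bytes_per_sync",
     {offsetof(DemoOptions, bytes_per_sync), OptionType::kUInt64}},
};

// Deserialize one "name=value" pair. Returns false for unknown names.
// std::stoi / std::stoull throw on malformed numbers; a real parser would
// need to handle that.
bool ParseOption(DemoOptions* opts, const std::string& name,
                 const std::string& value) {
  assert(opts != nullptr);
  auto it = kDemoOptionTable.find(name);
  if (it == kDemoOptionTable.end()) return false;
  char* addr = reinterpret_cast<char*>(opts) + it->second.offset;
  switch (it->second.type) {
    case OptionType::kBool:
      *reinterpret_cast<bool*>(addr) = (value == "true");
      return true;
    case OptionType::kInt:
      *reinterpret_cast<int*>(addr) = std::stoi(value);
      return true;
    case OptionType::kUInt64:
      *reinterpret_cast<uint64_t*>(addr) = std::stoull(value);
      return true;
  }
  return false;
}

// Serialize one member by name. Returns false for unknown names.
bool SerializeOption(const DemoOptions& opts, const std::string& name,
                     std::string* out) {
  assert(out != nullptr);
  auto it = kDemoOptionTable.find(name);
  if (it == kDemoOptionTable.end()) return false;
  const char* addr = reinterpret_cast<const char*>(&opts) + it->second.offset;
  switch (it->second.type) {
    case OptionType::kBool:
      *out = *reinterpret_cast<const bool*>(addr) ? "true" : "false";
      return true;
    case OptionType::kInt:
      *out = std::to_string(*reinterpret_cast<const int*>(addr));
      return true;
    case OptionType::kUInt64:
      *out = std::to_string(*reinterpret_cast<const uint64_t*>(addr));
      return true;
  }
  return false;
}
```

Adding a new option then means adding one table row rather than touching both the serializer and the parser, which is presumably the maintenance benefit the commit is after.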
    • Y
      Make HashCuckooRep::ApproximateMemoryUsage() return reasonable estimation. · b2df20a8
      Yueh-Hsuan Chiang committed
      Summary:
      HashCuckooRep::ApproximateMemoryUsage() previously returned
      std::numeric_limits<size_t>::max() when it could not accept more
      entries.  This patch makes it return a more reasonable estimation.
      
      This change is necessary in order to make GetIntProperty("rocksdb.cur-size-all-mem-tables")
      handle HashCuckooRep properly in diff https://reviews.facebook.net/D44229.
      
      Test Plan: db_test
      
      Reviewers: sdong, anthony, IslamAbdelRahman, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44241
      b2df20a8
    • A
      Fixing Failed Assertion in Subcompaction State Diff · 601b1aac
      Ari Ekmekji committed
      Summary:
      In D43239 (https://reviews.facebook.net/D43239) there is an
      assertion to make sure a subcompaction's output is never empty at the
      end of execution. This assertion however breaks the build because some
      tests lead to exactly that scenario. So I have altered the logic
      to handle this case instead of just failing the assertion.
      
      The reason that it is possible for a subcompaction's output to be empty is
      that during a sequential execution of subcompactions, if a user aborts the
      compaction job then some of the later subcompactions to be executed may
      have yet to process any keys and therefore have yet to generate output files.
      This becomes very rare once the subcompactions are executed in parallel,
      but for now they are still sequential so the case is possible when there is an
      early termination, as in some of the tests.
      
      Test Plan: ./db_test  ./db_compaction_test
      
      Reviewers: sdong, igor, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44877
      601b1aac
    • A
      [Parallel L0-L1 Compaction Prep]: Giving Subcompactions Their Own State · f0da6977
      Ari Ekmekji committed
      Summary:
      In preparation for running multiple threads at the same time during
      a compaction job, this patch assigns each subcompaction its own state
      (instead of sharing the one global CompactionState). Each subcompaction then
      uses this state to update its statistics, keep track of its snapshots, etc.
      during the course of execution. Then at the end of all the executions the
      statistics are aggregated across the subcompactions so that the final result
      is the same as if only one larger compaction had run.
      
      Test Plan: ./db_test  ./db_compaction_test  ./compaction_job_test
      
      Reviewers: sdong, anthony, igor, noetzli, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43239
      f0da6977
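The per-subcompaction state and final aggregation described in the commit above might look roughly like this. The struct and its fields are hypothetical and far simpler than whatever the real state holds; the point is only that each subcompaction updates its own counters and the totals are summed once at the end.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-subcompaction counters (illustrative field names).
struct DemoSubcompactionStats {
  uint64_t num_input_records = 0;
  uint64_t num_output_records = 0;
  uint64_t bytes_written = 0;
};

// Sum the per-subcompaction counters so the final result matches what a
// single larger compaction would have reported.
DemoSubcompactionStats AggregateStats(
    const std::vector<DemoSubcompactionStats>& parts) {
  DemoSubcompactionStats total;
  for (const auto& p : parts) {
    total.num_input_records += p.num_input_records;
    total.num_output_records += p.num_output_records;
    total.bytes_written += p.bytes_written;
  }
  return total;
}
```

Because no counter is shared while the subcompactions run, this layout needs no locking once they execute on separate threads.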
  4. 18 Aug, 2015 1 commit
    • A
      Simplify querying of merge results · f32a5720
      Andres Notzli committed
      Summary:
      While working on supporting mixing merge operators with
      single deletes ( https://reviews.facebook.net/D43179 ),
      I realized that returning and dealing with merge results
      can be made simpler. Submitting this as a separate diff
      because it is not directly related to single deletes.
      
      Before, callers of merge helper had to retrieve the merge
      result in one of two ways depending on whether the merge
      was successful or not (success = result of merge was single
      kTypeValue). For successful merges, the caller could query
      the resulting key/value pair and for unsuccessful merges,
      the result could be retrieved in the form of two deques of
      keys and values. However, with single deletes, a successful merge
      does not return a single key/value pair (if merge
      operands are merged with a single delete, we have to generate
      a value and keep the original single delete around to make
      sure that we are not accidentally producing a key overwrite).
      In addition, the two existing call sites of the merge
      helper were taking the same actions independently from whether
      the merge was successful or not, so this patch simplifies that.
      
      Test Plan: make clean all check
      
      Reviewers: rven, sdong, yhchiang, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43353
      f32a5720
  5. 15 Aug, 2015 2 commits
    • S
      Measure file read latency histogram per level · 72613657
      sdong committed
      Summary: In internal stats, record a read latency histogram per level if statistics is enabled. It can be retrieved from DB::GetProperty() with the "rocksdb.dbstats" property.
      
      Test Plan: Manually run db_bench, print out "rocksdb.dbstats" by hand, and make sure it prints out as expected
      
      Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44193
      72613657
    • N
      reduce db mutex contention for write batch groups · b7198c3a
      Nathan Bronson committed
      Summary:
      This diff allows a Writer to join the next write batch group
      without acquiring any locks. Waiting is performed via a per-Writer mutex,
      so all of the non-leader writers never need to acquire the db mutex.
      It is now possible to join a write batch group after the leader has been
      chosen but before the batch has been constructed. This diff doesn't
      increase parallelism, but reduces synchronization overheads.
      
      For some CPU-bound workloads (no WAL, RAM-sized working set) this can
      substantially reduce contention on the db mutex in a multi-threaded
      environment.  With T=8 N=500000 in a CPU-bound scenario (see the test
      plan) this is good for a 33% perf win.  Not all scenarios see such a
      win, but none show a loss.  This code is slightly faster even for the
      single-threaded case (about 2% for the CPU-bound scenario below).
      
      Test Plan:
      1. unit tests
      2. COMPILE_WITH_TSAN=1 make check
      3. stress high-contention scenarios with db_bench -benchmarks=fillrandom -threads=$T -batch_size=1 -memtablerep=skip_list -value_size=0 --num=$N -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000
      
      Reviewers: sdong, igor, rven, ljin, yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D43887
      b7198c3a
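One way a writer can join a pending group without taking a lock is an atomic compare-and-swap push onto a linked list, sketched below. This is an assumption about the general technique, not a copy of the diff: `DemoWriter` and `LinkOne` are hypothetical names, and the per-Writer mutex used for waiting is left out.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical writer node; a real one would also carry the batch and the
// per-writer mutex / condition variable used for waiting.
struct DemoWriter {
  DemoWriter* link_older = nullptr;  // next-older writer in the list
};

// Push `w` as the newest writer using compare-and-swap only.
// Returns true if the list was empty, i.e. `w` is the first writer to arrive
// and could act as the group leader.
bool LinkOne(DemoWriter* w, std::atomic<DemoWriter*>* newest) {
  assert(w != nullptr && newest != nullptr);
  DemoWriter* head = newest->load(std::memory_order_relaxed);
  while (true) {
    w->link_older = head;
    // On failure, compare_exchange_weak reloads `head` with the current
    // value, so the loop retries against the latest list head.
    if (newest->compare_exchange_weak(head, w)) {
      return head == nullptr;
    }
  }
}
```

A writer that gets `false` back would then block on its own synchronization primitive until the leader finishes the batch, so only the leader ever needs the db mutex.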
  6. 14 Aug, 2015 4 commits
    • S
      Add options.compaction_measure_io_stats to print write I/O stats in compactions · 603b6da8
      sdong committed
      Summary:
      Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs:
      
      2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]}
      
      Add two more counters in iostats_context.
      
      Also add a parameter to db_bench.
      
      Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44115
      603b6da8
    • I
      Change master to 3.14 · dc9d5634
      Islam AbdelRahman committed
      Summary: Change master version to 3.14
      
      Test Plan: simple change
      
      Reviewers: sdong, yhchiang, kradhakrishnan, rven, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44187
      dc9d5634
    • I
      Merge pull request #689 from msb-at-yahoo/add-tools-target · b78c8e07
      Igor Canadi committed
      Add a 'tools' target.
      b78c8e07
    • M
      Add a 'tools' target. · 9f0dd222
      maurice barnum committed
      My use case is to build the rocksdb static library and tools, and
      ideally I'd like to not spend time building the shared library and other
      targets that I won't use.
      9f0dd222
  7. 13 Aug, 2015 2 commits
    • S
      Add test case to repro the mispositional iterator in a low-chance data race case · 46372071
      sdong committed
      Summary: Iterator has a bug: if a child iterator reaches its end, the user issues a Prev(), and just before SeekToLast() of the child iterator is called some extra rows are added at the end, the position of the iterator can be misplaced.
      
      Test Plan: Run the tests with or without valgrind
      
      Reviewers: rven, yhchiang, IslamAbdelRahman, anthony
      
      Reviewed By: anthony
      
      Subscribers: tnovak, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43671
      46372071
    • I
      [Cleanup] Remove RandomRWFile · 3bd9db42
      Islam AbdelRahman committed
      Summary: RandomRWFile is not used anywhere in our code base; this patch removes RandomRWFile
      
      Test Plan:
      make check -j64
      USE_CLANG=1 make all -j64
      OPT=-DROCKSDB_LITE make release -j64
      
      Reviewers: sdong, yhchiang, anthony, kradhakrishnan, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44091
      3bd9db42
  8. 12 Aug, 2015 9 commits
    • A
      Have Transactions use WriteBatch::RollbackToSavePoint · c3466eab
      agiardullo committed
      Summary:
      Clean up transactions to use the new RollbackToSavePoint api in WriteBatchWithIndex.
      
      Note, this diff depends on Pessimistic Transactions diff and ManagedSnapshot diff (D40869 and D43293).
      
      Test Plan: unit tests
      
      Reviewers: rven, yhchiang, kradhakrishnan, spetrunia, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43371
      c3466eab
    • A
      Transaction error statuses · 0db807ec
      agiardullo committed
      Summary:
      Based on feedback from spetrunia, we should better differentiate error statuses for transaction failures.
      
      https://github.com/MySQLOnRocksDB/mysql-5.6/issues/86#issuecomment-124605954
      
      Test Plan: unit tests
      
      Reviewers: rven, kradhakrishnan, spetrunia, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43323
      0db807ec
    • A
      Pessimistic Transactions · c2f2cb02
      agiardullo committed
      Summary:
      Initial implementation of Pessimistic Transactions.  This diff contains the api changes discussed in D38913.  This diff is pretty large, so let me know if people would prefer to meet up to discuss it.
      
      MyRocks folks:  please take a look at the API in include/rocksdb/utilities/transaction[_db].h and let me know if you have any issues.
      
      Also, you'll notice a couple of TODOs in the implementation of RollbackToSavePoint().  After chatting with Siying, I'm going to send out a separate diff for an alternate implementation of this feature that implements the rollback inside of WriteBatch/WriteBatchWithIndex.  We can then decide which route is preferable.
      
      Next, I'm planning on doing some perf testing and then integrating this diff into MongoRocks for further testing.
      
      Test Plan: Unit tests, db_bench parallel testing.
      
      Reviewers: igor, rven, sdong, yhchiang, yoshinorim
      
      Reviewed By: sdong
      
      Subscribers: hermanlee4, maykov, spetrunia, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D40869
      c2f2cb02
    • I
      Use manual_compaction for compaction_job_test · c2868cbc
      Islam AbdelRahman committed
      Summary:
      Under certain conditions (compression disabled), the compactions that are created in compaction_job_test will pass the trivial_move conditions.
      This causes problems since we assert that we don't run a compaction if it's a trivial move:
      https://github.com/facebook/rocksdb/blob/master/db/compaction_job.cc#L144-L147
      
      For example, when we disable compression, compactions become a valid trivial move and the assert fails:
      https://ci-builds.fb.com/view/rocksdb/job/rocksdb_no_compression/180/console
      
      Test Plan: compaction_job_test
      
      Reviewers: sdong, yhchiang, noetzli, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D43983
      c2868cbc
    • A
      Fix Windows build by adding snapshot_impl to CMakeLists · 6b2d5703
      agiardullo committed
      Test Plan: untested
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D44049
      6b2d5703
    • Y
      Fixed clang-build error in util/thread_local.cc · e61fafbe
      Yueh-Hsuan Chiang committed
      Summary:
      This patch fixes the following clang-build error in util/thread_local.cc by using a cleaner macro blocker:
      
      12:26:31 util/thread_local.cc:157:19: error: declaration shadows a static data member of 'rocksdb::ThreadLocalPtr::StaticMeta' [-Werror,-Wshadow]
      12:26:31       ThreadData* tls_ =
      12:26:31                   ^
      12:26:31 util/thread_local.cc:19:66: note: previous declaration is here
      12:26:31 __thread ThreadLocalPtr::ThreadData* ThreadLocalPtr::StaticMeta::tls_ = nullptr;
      12:26:31                                                                  ^
      
      Test Plan: db_test
      
      Reviewers: sdong, anthony, IslamAbdelRahman, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44043
      e61fafbe
    • I
      Parallelize LoadTableHandlers · cee1e8a0
      Islam AbdelRahman committed
      Summary: Add a new option that allows LoadTableHandlers to use multiple threads to load files on DB Open and Recover
      
      Test Plan:
      make check -j64
      COMPILE_WITH_TSAN=1 make check -j64
      DISABLE_JEMALLOC=1 make all valgrind_check -j64 (still running)
      
      Reviewers: yhchiang, anthony, rven, kradhakrishnan, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D43755
      cee1e8a0
    • A
      Removing duplicate code in db_bench/db_stress, fixing typos · 4249f159
      Andres Notzli committed
      Summary:
      While working on single delete support for db_bench, I realized that
      db_bench/db_stress contain a bunch of duplicate code related to
      compression and found some typos. This patch removes duplicate code,
      typos and a redundant #ifndef in internal_stats.cc.
      
      Test Plan: make db_stress && make db_bench && ./db_bench --benchmarks=compress,uncompress
      
      Reviewers: yhchiang, sdong, rven, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43965
      4249f159
    • I
      Fix linters on non-fb machines · a03085b5
      Igor Canadi committed
      Summary:
      Our linters assume that clang-format is installed at /mnt/vol/engshare/admin/scripts/clang-format and flint is installed at /home/engshare/tools/flint. This makes them fail on non-fb machines. This change will:
      * if clang-format is not on a specified path, it will try running generic clang-format. Linters will still fail if clang-format is not installed, but this shouldn't be a big issue, since it's pretty easy to install it.
      * flint will not be run if /home/engshare/tools/flint is not present
      
      Test Plan: Made a change on a mac machine. Ran `arc lint`. No failures observed.
      
      Reviewers: aekmekji, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D44031
      a03085b5