1. 02 Jul, 2015 (1 commit)
    • Windows Port from Microsoft · 18285c1e
      Committed by Dmitri Smirnov
      Summary: Make RocksDB build and run on Windows, functionally complete
      and performant. All existing test cases run with no regressions.
      Performance numbers are in the pull request.
      
       Test plan: make all of the existing unit tests pass, obtain perf numbers.
      
       Co-authored-by: Praveen Rao praveensinghrao@outlook.com
       Co-authored-by: Sherlock Huang baihan.huang@gmail.com
       Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com
       Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com
  2. 24 Jun, 2015 (2 commits)
    • Bottommost level compaction option · 674b1181
      Committed by Islam AbdelRahman
      Summary: Replace force_bottommost_level_compaction in CompactRangeOptions with an option that allows the user to choose among (always skip, always compact, compact only if a compaction filter is present) for the bottommost level in level-based compaction.
      
      Test Plan: make check
      
      Reviewers: sdong, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40527
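      A minimal usage sketch of the new option (illustrative only; the enum spelling BottommostLevelCompaction::kForce and the path are assumptions based on the summary):

      ```
      #include <cassert>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::DB* db;
        rocksdb::Options options;
        options.create_if_missing = true;
        assert(rocksdb::DB::Open(options, "/tmp/bottommost_demo", &db).ok());

        rocksdb::CompactRangeOptions cro;
        // Always recompact the bottommost level, even without a compaction filter.
        cro.bottommost_level_compaction = rocksdb::BottommostLevelCompaction::kForce;
        db->CompactRange(cro, nullptr, nullptr);  // nullptr/nullptr = full key range

        delete db;
        return 0;
      }
      ```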
    • Implement a table-level row cache · 782a1590
      Committed by Giuseppe Ottaviano
      Summary:
      Implementation of a table-level row cache.
      It only caches point queries done through the `DB::Get` interface; queries done through the `Iterator` interface completely skip the cache.
      
      Supports snapshots and merge operations.
      
      Test Plan: Ran `make valgrind_check commit-prereq`
      
      Reviewers: igor, philipp, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D39849
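      A minimal usage sketch (the row_cache option and NewLRUCache are the public pieces involved; the cache size and path are illustrative):

      ```
      #include <cassert>
      #include <string>
      #include "rocksdb/cache.h"
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        // Table-level row cache: only DB::Get() consults it; iterators bypass it.
        options.row_cache = rocksdb::NewLRUCache(64 << 20 /* 64 MB */);

        rocksdb::DB* db;
        assert(rocksdb::DB::Open(options, "/tmp/rowcache_demo", &db).ok());

        db->Put(rocksdb::WriteOptions(), "k", "v");
        std::string value;
        db->Get(rocksdb::ReadOptions(), "k", &value);  // eligible for the row cache

        delete db;
        return 0;
      }
      ```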
  3. 23 Jun, 2015 (2 commits)
    • Introduce WAL recovery consistency levels · de85e4ca
      Committed by krad
      Summary:
      The "one size fits all" approach with WAL recovery will only introduce inconvenience for our varied clients as we go forward. The current recovery is a bit heuristic. We introduce the following levels of consistency while replaying the WAL.
      
      1. RecoverAfterRestart (kTolerateCorruptedTailRecords)
      
      This mirrors the current recovery mode.
      
      2. RecoverAfterCleanShutdown (kAbsoluteConsistency)
      
      This is ideal for unit tests and cases where the store is shut down cleanly. We tolerate no corruption or incomplete writes.
      
      3. RecoverPointInTime (kPointInTimeRecovery)
      
      This is ideal when using devices with a controller cache or file systems which can lose data on restart. We recover up to the point where there is no corruption or incomplete write.
      
      4. RecoverAfterDisaster (kSkipAnyCorruptRecord)
      
      This is the mode that recovers the most data. We tolerate corruption and incomplete writes, and we hop over the sections we cannot make sense of, salvaging as many records as possible.
      
      Test Plan:
      (1) Run added unit test to cover all levels.
      (2) Run make check.
      
      Reviewers: leveldb, sdong, igor
      
      Subscribers: yoshinorim, dhruba
      
      Differential Revision: https://reviews.facebook.net/D38487
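      A sketch of selecting one of these levels, assuming they are exposed through a WALRecoveryMode option as the names above suggest (path illustrative):

      ```
      #include <cassert>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        // Replay the WAL up to the last consistent point and drop the rest --
        // suited to devices or file systems that can lose data on restart.
        options.wal_recovery_mode = rocksdb::WALRecoveryMode::kPointInTimeRecovery;

        rocksdb::DB* db;
        assert(rocksdb::DB::Open(options, "/tmp/walmode_demo", &db).ok());
        delete db;
        return 0;
      }
      ```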
    • Fix trivial move merge · 530534fc
      Committed by Islam AbdelRahman
      Summary: Fixing bad merge
      
      Test Plan: make -j64 check (this is not enough to verify the fix)
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40521
  4. 20 Jun, 2015 (1 commit)
    • Add wal files to Checkpoint for multiple column families. · 04251e1e
      Committed by Venkatesh Radhakrishnan
      Summary:
      When there are multiple column families, the flush in GetLiveFiles is
      not atomic, so there are entries in the WAL files which are needed to
      get a consistent RocksDB. We now add the log files to the checkpoint.
      
      Test Plan:
      CheckpointCF - This test forces more data to be written to
      the other column families after the flush of the first column family but
      before the second.
      
      Reviewers: igor, yhchiang, IslamAbdelRahman, anthony, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40323
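      For reference, a sketch of taking a checkpoint through the Checkpoint utility; with this change the checkpoint directory also carries the needed WAL files (paths illustrative):

      ```
      #include <cassert>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"
      #include "rocksdb/utilities/checkpoint.h"

      int main() {
        rocksdb::DB* db;
        rocksdb::Options options;
        options.create_if_missing = true;
        assert(rocksdb::DB::Open(options, "/tmp/source_db", &db).ok());

        rocksdb::Checkpoint* checkpoint = nullptr;
        assert(rocksdb::Checkpoint::Create(db, &checkpoint).ok());
        // The checkpoint now also includes live WAL files, so column families
        // whose memtables were flushed at different times open consistently.
        assert(checkpoint->CreateCheckpoint("/tmp/checkpoint_db").ok());

        delete checkpoint;
        delete db;
        return 0;
      }
      ```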
  5. 19 Jun, 2015 (4 commits)
    • Disable CompressLevelCompaction() if Zlib is not supported · bf03f59c
      Committed by Igor Canadi
      Summary: CompressLevelCompaction() depends on Zlib. We should skip it when Zlib is not present.
      
      Test Plan: `make check` without zlib
      
      Reviewers: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40401
    • Fail DB::Open() when the requested compression is not available · 760e9a94
      Committed by Igor Canadi
      Summary:
      Currently RocksDB silently ignores this issue and doesn't compress the data. Based on discussion, we agree that this is pretty bad because it can cause confusion for our users.
      
      This patch fails DB::Open() if we don't support the compression that is specified in the options.
      
      Test Plan: make check with LZ4 not present. If Snappy is not present, all tests will just fail because Snappy is our default library. We should make Snappy a requirement, since without it our default DB::Open() fails.
      
      Reviewers: sdong, MarkCallaghan, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39687
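      A sketch of the new behavior from the caller's side: requesting a compression library that was not compiled in now surfaces as a failed Status instead of silently uncompressed data (path illustrative):

      ```
      #include <iostream>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.compression = rocksdb::kLZ4Compression;  // needs LZ4 at build time

        rocksdb::DB* db;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/lz4_demo", &db);
        if (!s.ok()) {
          // With this patch, Open() fails here when LZ4 support is absent.
          std::cerr << "Open failed: " << s.ToString() << std::endl;
          return 1;
        }
        delete db;
        return 0;
      }
      ```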
    • Skip bottommost level compaction if possible · 4eabbdb7
      Committed by Islam AbdelRahman
      Summary:
      This is https://reviews.facebook.net/D39999 but after introducing an option to force compacting the bottommost level.

      Changes in this patch
      - Introduce force_bottommost_level_compaction in CompactRangeOptions, which forces compacting the bottommost level during compaction
      - Skip bottommost level compaction if we don't have a compaction filter and the force_bottommost_level_compaction option is not set

      Although tests pass on my machine, I suspect there may be some tests that I am not aware of that should use force_bottommost_level_compaction to pass in a deterministic way
      
      Test Plan:
      make check
      adding new tests
      
      Reviewers: igor, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40059
    • Don't dump DBOptions for each column family · 4b8bb62f
      Committed by Igor Canadi
      Summary: Currently we dump DBOptions alongside each set of column family options we dump. This leads to duplicate lines in our LOG file. This diff fixes that.
      
      Test Plan: Check out the LOG
      
      Reviewers: sdong, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: IslamAbdelRahman, yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39729
  6. 18 Jun, 2015 (6 commits)
    • Fixed a bug of CompactionStats in multi-level universal compaction case · bb1c74ce
      Committed by Yueh-Hsuan Chiang
      Summary:
      Universal compaction can involve multiple levels. However,
      the current implementation of bytes_readn and bytes_readnp1
      (and some other stats with suffix `n` and `np1`) assumes a compaction
      can only involve two levels.
      
      This patch fixes this bug and redefines bytes_readn and bytes_readnp1:
      * bytes_readnp1: the number of bytes read in the compaction output level.
      * bytes_readn: the total number of bytes read minus bytes_readnp1
      
      Test Plan: Add a test in compaction_job_stats_test
      
      Reviewers: igor, sdong, rven, anthony, kradhakrishnan, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40239
    • Replace %llu with format macros in ParsedInternalKey::DebugString() · f06be62f
      Committed by Poornima Chozhiyath Raman
      Test Plan: successfully compiled the code
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40233
    • Add --benchmark_write_rate_limit option to db_bench · 2dc3910b
      Committed by Igor Canadi
      Summary:
      So far, we benchmarked RocksDB by writing as fast as possible. With this change, we're able to limit our write throughput, which should help us better understand how RocksDB performs under varying write workloads.
      
      Specifically, I'm currently interested in the shape of the graph that has write throughput on one axis and write rate on another. This should help us with designing our stall system, as we have started to do with D36351.
      
      Test Plan:
          $ ./db_bench --benchmarks=fillrandom --benchmark_write_rate_limit=1000000
          fillrandom   :     118.523 micros/op 8437 ops/sec;    0.9 MB/s
          $ ./db_bench --benchmarks=fillrandom --benchmark_write_rate_limit=2000000
          fillrandom   :      59.136 micros/op 16910 ops/sec;    1.9 MB/s
      
      Reviewers: MarkCallaghan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39759
    • Use CompactRangeOptions for CompactRange · 12e030a9
      Committed by Islam AbdelRahman
      Summary:
      This diff updates DB::CompactRange to use CompactRangeOptions instead of multiple parameters.
      The old CompactRange is still available but deprecated.
      
      Test Plan:
      make all check
      make rocksdbjava
      USE_CLANG=1 make all
      OPT=-DROCKSDB_LITE make release
      
      Reviewers: sdong, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40209
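      A before/after sketch of a call site (the deprecated parameter list in the comment is approximate; path illustrative):

      ```
      #include <cassert>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::DB* db;
        rocksdb::Options options;
        options.create_if_missing = true;
        assert(rocksdb::DB::Open(options, "/tmp/cro_demo", &db).ok());

        rocksdb::Slice begin("a"), end("z");
        // Deprecated form (roughly): db->CompactRange(&begin, &end,
        //                                             false /*reduce_level*/);
        // New form: all knobs travel in one options struct.
        db->CompactRange(rocksdb::CompactRangeOptions(), &begin, &end);

        delete db;
        return 0;
      }
      ```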
    • Clean up InstallSuperVersion · 25d60056
      Committed by Igor Canadi
      Summary:
      We go to great lengths to make sure MaybeScheduleFlushOrCompaction() is called outside of the write thread. But it's still called under the DB mutex anyway, so it's not that much cheaper.
      
      This diff removes the "optimization" and cleans up the code a bit.
      
      Test Plan: make check
      
      Reviewers: rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40113
    • Block c_test in ROCKSDB_LITE · 1a08d0be
      Committed by Yueh-Hsuan Chiang
      Summary: Block c_test in ROCKSDB_LITE as it's not supported in ROCKSDB_LITE.
      
      Test Plan: c_test
      
      Reviewers: sdong, rven, anthony, kradhakrishnan, IslamAbdelRahman, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40257
  7. 17 Jun, 2015 (1 commit)
  8. 13 Jun, 2015 (4 commits)
  9. 12 Jun, 2015 (5 commits)
    • Slow down writes by bytes written · 7842920b
      Committed by sdong
      Summary:
      With this patch we slow down writes into the database to the rate of options.delayed_write_rate (a new option).
      
      The thread synchronization approach I take is to still synchronize the write controller with the DB mutex, and GetDelay() is inside the DB mutex. I try to minimize the frequency of getting the time in GetDelay(). I verified it through db_bench and it seems to work.
      
      hard_rate_limit is deprecated.
      
      options.delayed_write_rate is still not dynamically changeable. Need to work on it as a follow-up.
      
      Test Plan: Add new unit tests in db_test
      
      Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor
      
      Reviewed By: igor
      
      Subscribers: ikabiljo, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D36351
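      A minimal sketch of the new option; the rate value and path are illustrative:

      ```
      #include <cassert>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        // Once RocksDB decides writes must be delayed, feed them through at
        // ~2 MB/s rather than stopping them with the old hard_rate_limit.
        options.delayed_write_rate = 2 * 1024 * 1024;

        rocksdb::DB* db;
        assert(rocksdb::DB::Open(options, "/tmp/delayed_demo", &db).ok());
        delete db;
        return 0;
      }
      ```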
    • Don't let two L0->L1 compactions run in parallel · a84df655
      Committed by Igor Canadi
      Summary: With the experimental feature SuggestCompactRange() we don't restrict running two L0->L1 compactions in parallel. This diff fixes this.
      
      Test Plan: Added a unit test to reproduce the failure; fixed the unit test.
      
      Reviewers: yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39981
    • Add largest sequence to FlushJobInfo · d6ce0f7c
      Committed by Islam AbdelRahman
      Summary:
      Adding the largest sequence number to FlushJobInfo,
      and passing the flushed file's metadata to NotifyOnFlushCompleted, which includes a lot of other values that we may want to expose in FlushJobInfo.
      
      Test Plan: make check
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D39927
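      A hedged listener sketch; the largest_seqno and file_path field names and the OnFlushCompleted(DB*, const FlushJobInfo&) signature are assumed from the summary:

      ```
      #include <iostream>
      #include "rocksdb/db.h"
      #include "rocksdb/listener.h"

      // Register via options.listeners.push_back(std::make_shared<FlushLogger>());
      class FlushLogger : public rocksdb::EventListener {
       public:
        void OnFlushCompleted(rocksdb::DB* /*db*/,
                              const rocksdb::FlushJobInfo& info) override {
          std::cout << "flushed " << info.file_path
                    << ", largest seqno " << info.largest_seqno << std::endl;
        }
      };
      ```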
    • Add Env::GetThreadID(), which returns the ID of the current thread. · 3eddd1ab
      Committed by Yueh-Hsuan Chiang
      Summary:
      Add Env::GetThreadID(), which returns the ID of the current thread.
      
      In addition, make GetThreadList() and InfoLog use the same unique ID for the same thread.
      
      Test Plan:
      db_test
      listener_test
      
      Reviewers: igor, rven, IslamAbdelRahman, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D39735
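      A minimal sketch of the new call:

      ```
      #include <iostream>
      #include "rocksdb/env.h"

      int main() {
        rocksdb::Env* env = rocksdb::Env::Default();
        // The same unique ID that GetThreadList() and the info log now report.
        std::cout << "current thread id: " << env->GetThreadID() << std::endl;
        return 0;
      }
      ```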
    • Handling edge cases for ReFitLevel · 73faa3d4
      Committed by Islam AbdelRahman
      Summary:
      Right now the level we pass to ReFitLevel is the maximum level with files (before compaction). There are multiple cases where this maximum level has changed after compaction:
      - all files were in L0 (now the maximum level is L1)
      - using kCompactionStyleUniversal (now the maximum level is the last level)
      - level_compaction_dynamic_level_bytes ??

      We can handle each of these cases individually, but I felt it's safer to recalculate max_level_with_files if we want to do a ReFitLevel
      
      Test Plan:
      adding some tests
      make -j64 check
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: ott, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39663
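      For context, a sketch of the manual-compaction path that reaches ReFitLevel, using the change_level/target_level fields of CompactRangeOptions (path illustrative):

      ```
      #include <cassert>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::DB* db;
        rocksdb::Options options;
        options.create_if_missing = true;
        assert(rocksdb::DB::Open(options, "/tmp/refit_demo", &db).ok());

        rocksdb::CompactRangeOptions cro;
        cro.change_level = true;  // after compacting, ReFitLevel moves the output
        cro.target_level = 1;     // ... down to L1, if it fits
        db->CompactRange(cro, nullptr, nullptr);

        delete db;
        return 0;
      }
      ```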
  10. 10 Jun, 2015 (4 commits)
    • C: add WriteBatch.PutLogData support · 735df665
      Committed by Reed Allman
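      A minimal sketch through the C API (hedged; the binding name is assumed to mirror the C++ WriteBatch::PutLogData method):

      ```
      #include "rocksdb/c.h"

      int main() {
        rocksdb_writebatch_t* batch = rocksdb_writebatch_create();
        // Log data goes to the WAL only; it is never applied to the memtable,
        // which makes it useful for replication markers.
        rocksdb_writebatch_put_log_data(batch, "marker", 6);
        rocksdb_writebatch_destroy(batch);
        return 0;
      }
      ```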
    • Make "make all" work for CYGWIN · e409d3d7
      Committed by sdong
      Summary: Some test and benchmark code doesn't build for CYGWIN. Fix it.
      
      Test Plan: Build "make all" with TARGET_OS=Cygwin on cygwin and make sure it passes.
      
      Reviewers: rven, yhchiang, anthony, igor, kradhakrishnan
      
      Reviewed By: igor, kradhakrishnan
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39711
    • Print info message about files needing compaction for debugging purposes · 75d7075a
      Committed by sdong
      Summary:
      When there are files marked for compaction after compactions, print extra messages to help debugging. Example:
      
      2015/06/08-23:12:55.212855 7ff5013ff700 [default] [JOB 121] Generated table #75: 54 keys, 4807 bytes (need compaction)
      
      2015/06/08-23:12:55.556194 7ff5013ff700 (Original Log Time 2015/06/08-23:12:55.556160) [default] compacted to: base level 1 max bytes base
      10240 files[0 1 9 32 12 0 0 0] max score 0.96 (2 files need compaction), MB/sec: 0.0 rd, 0.1 wr, level 2, files in(1, 3) out(5) MB in(0.0,
      0.0) out(0.0), read-write-amplify(11.3) write-amplify(5.7) OK, records in: 40, records dropped: 0
      
      Test Plan:
      Run test and see LOG files.
      
      valgrind test DBTest.TablePropertiesNeedCompactTest
      
      Reviewers: rven, yhchiang, kradhakrishnan, IslamAbdelRahman, igor
      
      Reviewed By: igor
      
      Subscribers: yoshinorim, maykov, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39771
    • Fix hang when closing a DB after doing loads with WAL disabled. · 406a5682
      Committed by Venkatesh Radhakrishnan
      Summary:
      There is a hang during DB close in the following scenario:
      a) a load with WAL disabled was done,
      b) CancelAllBackgroundWork was called,
      c) DB Close was called
      This was because close waits for a flush, but we cannot do a background
      flush because we have called CancelAllBackgroundWork, which marks the DB
      as shutting down.
      
      Test Plan: Added DBTest FlushOnDestroy
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: yoshinorim, hermanlee4, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39747
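      The scenario from the summary as a sketch (CancelAllBackgroundWork comes from the convenience header; the exact header path and DB path are illustrative):

      ```
      #include <cassert>
      #include "rocksdb/convenience.h"
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::DB* db;
        rocksdb::Options options;
        options.create_if_missing = true;
        assert(rocksdb::DB::Open(options, "/tmp/nowal_demo", &db).ok());

        rocksdb::WriteOptions wo;
        wo.disableWAL = true;                        // a) load with WAL disabled
        db->Put(wo, "key", "value");

        rocksdb::CancelAllBackgroundWork(db, true);  // b) stop background work
        delete db;                                   // c) close -- used to hang
        return 0;
      }
      ```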
  11. 09 Jun, 2015 (3 commits)
    • GetSnapshot() and ReleaseSnapshot() to move new and free out of DB mutex · d8c8f08c
      Committed by sdong
      Summary: We currently issue malloc and free inside the DB mutex in GetSnapshot() and ReleaseSnapshot(). Move them out.
      
      Test Plan:
      Go through all tests
      make valgrind_check
      
      Reviewers: yhchiang, rven, IslamAbdelRahman, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: maykov, hermanlee4, MarkCallaghan, yoshinorim, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39753
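      For reference, the calls whose allocation and deallocation this diff moves out of the DB mutex, in ordinary use (path illustrative):

      ```
      #include <cassert>
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::DB* db;
        rocksdb::Options options;
        options.create_if_missing = true;
        assert(rocksdb::DB::Open(options, "/tmp/snap_demo", &db).ok());

        db->Put(rocksdb::WriteOptions(), "k", "v1");
        const rocksdb::Snapshot* snap = db->GetSnapshot();  // new: outside mutex
        db->Put(rocksdb::WriteOptions(), "k", "v2");

        rocksdb::ReadOptions ro;
        ro.snapshot = snap;
        std::string value;
        db->Get(ro, "k", &value);   // still sees "v1"
        db->ReleaseSnapshot(snap);  // free: also outside the mutex

        delete db;
        return 0;
      }
      ```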
    • Use nullptr for default compaction_filter_factory · 643bbbf0
      Committed by Islam AbdelRahman
      Summary:
      Replacing the default value for compaction_filter_factory and compaction_filter_factory_v2 with nullptr instead of DefaultCompactionFilterFactory / DefaultCompactionFilterFactoryV2.
      The reason for this is to be able to easily determine whether we have a compaction filter factory, without depending on RTTI.
      
      Test Plan: make check
      
      Reviewers: yoshinorim, ott, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D39693
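      The check this enables, as a sketch:

      ```
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;
        // With the default now nullptr, detecting a user-supplied factory is a
        // plain pointer comparison -- no RTTI or dynamic_cast needed.
        bool has_factory = options.compaction_filter_factory != nullptr;
        (void)has_factory;
        return 0;
      }
      ```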
    • Fix ASAN errors in c_test · f02ce0c6
      Committed by Igor Canadi
      Summary: key_sizes claims that the 3rd key is of length 8, but it's really only 3. This diff makes it length 8.
      
      Test Plan: asan c_test works again.
      
      Reviewers: sdong, yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39699
  12. 06 Jun, 2015 (5 commits)
  13. 05 Jun, 2015 (2 commits)
    • Fix compile · b2785472
      Committed by Igor Canadi
      Summary:
      This commit broke the compile: https://github.com/facebook/rocksdb/commit/3ce3bb3da2486c2c18a332128dda7c05a91abb85
      As evidenced here: https://evergreen.mongodb.com/task/mongodb_mongo_master_ubuntu1404_rocksdb_compile_ce2b1d11d42de93f7b375f7e6c41fb709f66e969_15_06_04_23_09_36
      
      This should fix it
      
      Test Plan: make check
      
      Reviewers: IslamAbdelRahman
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D39627
    • Allowing L0 -> L1 trivial move on sorted data · 3ce3bb3d
      Committed by Islam AbdelRahman
      Summary:
      This diff updates the logic of how we do trivial move; now trivial move can run on any number of files in the input level as long as they are not overlapping.
      
      The conditions for trivial move have been updated
      
      Introduced conditions:
        - Trivial move cannot happen if we have a compaction filter (except if the compaction is not manual)
        - Input level files cannot be overlapping
      
      Removed conditions:
        - Trivial move only ran when the compaction was not manual
        - The input level could contain only 1 file
      
      More context on what tests failed because of Trivial move
      ```
      DBTest.CompactionsGenerateMultipleFiles
      This test is expecting compaction on a file in L0 to generate multiple files in L1, this test will fail with trivial move because we end up with one file in L1
      ```
      
      ```
      DBTest.NoSpaceCompactRange
      This test expects compaction to fail when we force the environment to report running out of space. Of course this is not valid in the trivial move situation,
      because a trivial move does not need any extra space and does not check for it
      ```
      
      ```
      DBTest.DropWrites
      Similar to DBTest.NoSpaceCompactRange
      ```
      
      ```
      DBTest.DeleteObsoleteFilesPendingOutputs
      This test expects that a file in L2 is deleted after it's moved to L3. This is not valid with trivial move, because although the file was moved, it is now used by L3
      ```
      
      ```
      CuckooTableDBTest.CompactionIntoMultipleFiles
      Same as DBTest.CompactionsGenerateMultipleFiles
      ```
      
      This diff is based on a work by @sdong https://reviews.facebook.net/D34149
      
      Test Plan: make -j64 check
      
      Reviewers: rven, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: yhchiang, ott, march, dhruba, sdong
      
      Differential Revision: https://reviews.facebook.net/D34797
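      A hedged sketch of a workload that can now take the trivial-move path: flush several non-overlapping sorted files, compact, and observe files arriving in L1 without being rewritten (paths and sizes illustrative):

      ```
      #include <cassert>
      #include <cstdio>
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      int main() {
        rocksdb::DB* db;
        rocksdb::Options options;
        options.create_if_missing = true;
        assert(rocksdb::DB::Open(options, "/tmp/trivial_demo", &db).ok());

        char key[16];
        for (int file = 0; file < 4; ++file) {
          for (int i = 0; i < 100; ++i) {
            std::snprintf(key, sizeof(key), "%06d", file * 100 + i);
            db->Put(rocksdb::WriteOptions(), key, "value");
          }
          db->Flush(rocksdb::FlushOptions());  // one sorted, non-overlapping file
        }

        // Non-overlapping input files can now be moved instead of rewritten.
        db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);

        std::string n;
        db->GetProperty("rocksdb.num-files-at-level1", &n);
        std::printf("files at L1: %s\n", n.c_str());

        delete db;
        return 0;
      }
      ```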