1. 05 Mar, 2016 1 commit
    • Y
      Fix a bug where flush does not happen when a manual compaction is running · a7d4eb2f
      Yueh-Hsuan Chiang committed
      Summary:
      Currently, when RocksDB runs a manual compaction to refit data into a level,
      the ReFitLevel() step requires that no bg work is running.
      When RocksDB plans to ReFitLevel(), it will do the following:
      
       1. pause scheduling new bg work.
       2. wait until all bg work finished
       3. do the ReFitLevel()
       4. unpause scheduling new bg work.
      
      However, because it pauses scheduling new bg work in step 1 and waits for all bg work
      to finish in step 2, RocksDB will stop flushing until all bg work is done (which
      could take a long time).
      
      This patch fixes this issue by changing the way ReFitLevel() pauses the background work:
      
      1. pause scheduling compaction.
      2. wait until all bg work finished.
      3. pause scheduling flush
      4. do ReFitLevel()
      5. unpause both flush and compaction.
      
      The major difference is that we only pause scheduling compaction in step 1 and wait
      for all bg work to finish in step 2. This prevents flush from being blocked for a long time.
      There is a very rare case in which ReFitLevel() might starve in step 2,
      but that is unlikely because flushes typically finish very quickly.
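      The revised ordering can be illustrated with a toy model. This is only a sketch: the class `BgScheduler` and the function `refit_level` are hypothetical names invented for illustration, not RocksDB code, and "waiting" is simulated by draining a list.

```python
class BgScheduler:
    """Toy model of background-work scheduling (hypothetical, not RocksDB code)."""

    def __init__(self):
        self.compaction_paused = False
        self.flush_paused = False
        self.running = []   # kinds of bg jobs currently running
        self.log = []       # ordered record of events, for inspection

    def try_schedule(self, kind):
        """Start a job of the given kind unless that kind is paused."""
        paused = self.compaction_paused if kind == "compaction" else self.flush_paused
        if paused:
            return False
        self.running.append(kind)
        self.log.append("start " + kind)
        return True

    def finish_all(self):
        """Stand-in for 'wait until all bg work finished'."""
        while self.running:
            self.log.append("finish " + self.running.pop(0))


def refit_level(sched, incoming_flushes=0):
    sched.compaction_paused = True        # 1. pause scheduling compaction only
    for _ in range(incoming_flushes):     # flushes may still be scheduled here
        sched.try_schedule("flush")
    sched.finish_all()                    # 2. wait until all bg work finished
    sched.flush_paused = True             # 3. pause scheduling flush
    sched.log.append("refit")             # 4. do ReFitLevel()
    sched.compaction_paused = False       # 5. unpause both flush and compaction
    sched.flush_paused = False
```

      In this model, flushes requested while a long compaction is still running are accepted rather than refused, which is the behavior the patch aims for.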
      
      Test Plan: existing test.
      
      Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55029
  2. 04 Mar, 2016 1 commit
  3. 03 Mar, 2016 3 commits
  4. 02 Mar, 2016 3 commits
  5. 01 Mar, 2016 3 commits
    • S
      Recompute compaction score after scheduling manual compaction · b5b1db16
      sdong committed
      Summary: Now that manual compaction can run concurrently with automatic compaction, we need to run ComputeCompactionScore() to prepare for an upcoming compaction-picking call before the compaction finishes.
      
      Test Plan: Run existing tests.
      
      Reviewers: yhchiang, IslamAbdelRahman, andrewkr, kradhakrishnan, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D54891
    • S
      Introduce Iterator::GetProperty() and replace Iterator::IsKeyPinned() · 1f595414
      sdong committed
      Summary:
      Add Iterator::GetProperty(), a way for users to communicate with the iterator, and replace Iterator::IsKeyPinned() with it.
      As a follow-up, I'll add a property for the version number attached to the iterator.
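      The idea of replacing a dedicated boolean method with a string-keyed property lookup can be sketched as follows. The class `ToyIterator` and the property name `toy.iterator.is-key-pinned` are hypothetical and chosen only for this illustration; they are not claimed to match the RocksDB API.

```python
class ToyIterator:
    """Toy iterator exposing string-keyed properties (hypothetical, not the RocksDB API)."""

    # Hypothetical property name used only in this sketch.
    IS_KEY_PINNED = "toy.iterator.is-key-pinned"

    def __init__(self, pinned):
        self._pinned = pinned

    def get_property(self, name):
        """Return (ok, value); ok is False for an unknown property name."""
        if name == self.IS_KEY_PINNED:
            return True, "1" if self._pinned else "0"
        return False, ""
```

      New properties can then be added without widening the iterator interface, and an unknown name gives a clean negative result (the kind of case a negative test would cover).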
      
      Test Plan: Rerun existing tests and add a negative test case.
      
      Reviewers: yhchiang, andrewkr, kradhakrishnan, anthony, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D54783
    • A
      Handle concurrent manifest update and backup creation · 69c471bd
      Andrew Kryczka committed
      Summary:
      Fixed two related race conditions in backup creation.
      
      (1) CreateNewBackup() uses DB::DisableFileDeletions() to prevent table files
      from being deleted while it is copying; however, the MANIFEST file could still
      rotate during this time. The fix is to stop deleting the old manifest in the
      rotation logic. It will be deleted safely later when PurgeObsoleteFiles() runs
      (can only happen when file deletions are enabled).
      
      (2) CreateNewBackup() did not account for the CURRENT file being mutable.
      This is significant because the files returned by GetLiveFiles() contain a
      particular manifest filename, but the manifest to which CURRENT refers can
      change at any time. This causes problems when CURRENT changes between the call
      to GetLiveFiles() and when it's copied to the backup directory. To work around this, I
      manually forge a CURRENT file referring to the manifest filename returned in
      GetLiveFiles().
      
      (2) also applies to the checkpointing code, so let me know if this approach is
      good and I'll make the same change there.
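      The workaround in (2) can be sketched as below, assuming a CURRENT file holds the manifest filename followed by a newline. The function `forge_current` and the sample file names are hypothetical; the real change lives in the C++ backup engine.

```python
import os
import tempfile


def forge_current(live_files, backup_dir):
    """Write a CURRENT file naming the manifest found in live_files (sketch)."""
    manifests = [os.path.basename(f) for f in live_files
                 if os.path.basename(f).startswith("MANIFEST-")]
    if len(manifests) != 1:
        raise ValueError("expected exactly one manifest in the live file list")
    path = os.path.join(backup_dir, "CURRENT")
    with open(path, "w") as f:
        f.write(manifests[0] + "\n")
    return path


# Hypothetical live-file list; the on-disk CURRENT is deliberately not copied.
backup_dir = tempfile.mkdtemp()
current_path = forge_current(
    ["/000012.sst", "/MANIFEST-000005", "/CURRENT"], backup_dir)
```

      Because the forged file is derived from the same snapshot as the rest of the live-file list, it cannot point at a manifest that was rotated in afterwards.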
      
      Test Plan:
      new test for rolling the manifest during backup creation.
      
      running the test before this change:
      
        $ ./backupable_db_test --gtest_filter=BackupableDBTest.ChangeManifestDuringBackupCreation
        ...
        IO error: /tmp/rocksdbtest-9383/backupable_db/MANIFEST-000001: No such file or directory
      
      running the test after this change:
      
        $ ./backupable_db_test --gtest_filter=BackupableDBTest.ChangeManifestDuringBackupCreation
        ...
        [ RUN      ] BackupableDBTest.ChangeManifestDuringBackupCreation
        [       OK ] BackupableDBTest.ChangeManifestDuringBackupCreation (2836 ms)
      
      Reviewers: IslamAbdelRahman, anthony, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D54711
  6. 27 Feb, 2016 1 commit
    • S
      Make DBTestUniversalCompaction.IncreaseUniversalCompactionNumLevels more robust · 8800975f
      sdong committed
      Summary:
      Depending on thread scheduling, DBTestUniversalCompaction.IncreaseUniversalCompactionNumLevels can fail to flush enough files to trigger the expected compactions. Fix it by waiting for flush after inserting each key.
      A failure has been reported:
      
      db/db_universal_compaction_test.cc:1134: Failure
      Expected: (NumTableFilesAtLevel(options.num_levels - 1, 1)) > (0), actual: 0 vs 0
      
      but I can't repro it. Try to fix the bug and see whether it goes away.
      
      Test Plan: Run the test multiple times.
      
      Reviewers: IslamAbdelRahman, anthony, andrewkr, kradhakrishnan, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D54747
  7. 25 Feb, 2016 1 commit
  8. 24 Feb, 2016 1 commit
    • S
      Fix assert failure when DBImpl::SyncWAL() conflicts with log rolling · 38201b35
      sdong committed
      Summary: DBImpl::SyncWAL() releases the db mutex before calling DBImpl::MarkLogsSynced(), while inside DBImpl::MarkLogsSynced() we assert that there is at most one outstanding log file. However, a memtable switch can happen in between, leaving two or more outstanding logs and failing the assert. The diff adds a unit test that repros the issue and fixes the assert so that the unit test passes.
      
      Test Plan: Run the new tests.
      
      Reviewers: anthony, kolmike, yhchiang, IslamAbdelRahman, kradhakrishnan, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D54621
  9. 23 Feb, 2016 2 commits
    • A
      Redo SyncPoints for flush while rolling test · b0469166
      Andrew Kryczka committed
      Summary:
      There was a race condition in the test where the rolling thread
      acquired the mutex before the flush thread pinned the logger. Rather than add
      more complicated synchronization to fix it, I followed Siying's suggestion to
      use SyncPoint in the test code.
      
      Comments in the LoadDependency() invocation explain the reason for each of the
      sync points.
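      The dependency-based ordering that sync points provide can be approximated with a small model. `ToySyncPoint` and the point names `flush:pinned` and `roll:start` are hypothetical; this is a sketch of the idea, not the RocksDB SyncPoint implementation.

```python
import threading


class ToySyncPoint:
    """Toy dependency-ordered sync points (hypothetical, not RocksDB's SyncPoint)."""

    def __init__(self):
        self._cv = threading.Condition()
        self._preds = {}        # point -> points that must be processed first
        self._cleared = set()   # points already processed

    def load_dependency(self, pairs):
        """Each (pred, succ) pair means succ blocks until pred has been processed."""
        with self._cv:
            self._preds = {}
            self._cleared = set()
            for pred, succ in pairs:
                self._preds.setdefault(succ, []).append(pred)

    def process(self, point):
        with self._cv:
            self._cv.wait_for(
                lambda: all(p in self._cleared for p in self._preds.get(point, [])))
            self._cleared.add(point)
            self._cv.notify_all()


sync = ToySyncPoint()
# The rolling thread must not start until the flush thread has pinned the logger.
sync.load_dependency([("flush:pinned", "roll:start")])
order = []


def flush_thread():
    order.append("pin logger")
    sync.process("flush:pinned")


def roll_thread():
    sync.process("roll:start")
    order.append("roll log")


t_roll = threading.Thread(target=roll_thread)
t_flush = threading.Thread(target=flush_thread)
t_roll.start()
t_flush.start()
t_roll.join()
t_flush.join()
```

      Even though the rolling thread is started first, it records its step only after the flush thread has passed its point, so the interleaving is deterministic.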
      
      Test Plan:
      Ran test 1000 times for tsan/asan. Will wait for all sandcastle tests
      to finish before committing since this is a tricky test.
      
      Reviewers: IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D54615
    • M
      Fixed CompactFiles() spuriously failing or corrupting DB · eef63ef8
      Mike Kolupaev committed
      Summary:
      We started getting two kinds of crashes since we started using `DB::CompactFiles()`:
      (1) `CompactFiles()` fails saying something like "/data/logdevice/4440/shard12/012302.sst: No such file or directory", and presumably makes DB read-only,
      (2) DB fails to open saying "Corruption: Can't access /267000.sst: IO error: /data/logdevice/4440/shard1/267000.sst: No such file or directory".
      
      AFAICT, both can be explained by background thread deleting compaction output as "obsolete" while it's being written, before it's committed to manifest. If it ends up committed to the manifest, we get (2); if compaction notices the disappearance and fails, we get (1). The internal tasks t10068021 and t10134177 have some details about the investigation that led to this.
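      The summary does not spell out the fix, so the following is only a sketch of one plausible guard, under the assumption that in-progress compaction outputs are tracked in a set that the obsolete-file scan consults. The function `find_obsolete_files` and the shortened file names are hypothetical.

```python
def find_obsolete_files(files_on_disk, files_in_manifest, pending_outputs):
    """Return files safe to delete: neither referenced nor still being written."""
    protected = set(files_in_manifest) | set(pending_outputs)
    return sorted(f for f in files_on_disk if f not in protected)


on_disk = ["000010.sst", "000011.sst", "012302.sst"]
in_manifest = ["000011.sst"]

# Without tracking, the compaction output still being written looks obsolete.
unsafe = find_obsolete_files(on_disk, in_manifest, pending_outputs=[])
# With tracking, only the genuinely unreferenced file is selected.
safe = find_obsolete_files(on_disk, in_manifest, pending_outputs=["012302.sst"])
```

      The first call reproduces the hazard described above, while the second leaves the uncommitted output alone until it is either committed to the manifest or abandoned.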
      
      Test Plan: `make -j check`; the new test fails to reopen the DB without the fix
      
      Reviewers: yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, sdong
      
      Differential Revision: https://reviews.facebook.net/D54561
  10. 20 Feb, 2016 1 commit
  11. 19 Feb, 2016 1 commit
    • A
      Use condition variable in log roller test · d825fc70
      Andrew Kryczka committed
      Summary:
      Previously I just slept until the flush_thread was "probably" ready
      since proper synchronization in test cases seemed like overkill. But then tsan
      complained about it, so I did the synchronization (mostly) properly now.
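      The difference between sleeping and proper signalling can be sketched with a condition variable. The names `state`, `flush_thread`, and `wait_until_ready` are hypothetical; the actual test is written in C++.

```python
import threading

state = {"ready": False}
cv = threading.Condition()
events = []


def flush_thread():
    # Publish readiness under the lock, then wake any waiter.
    with cv:
        state["ready"] = True
        events.append("flush ready")
        cv.notify_all()


def wait_until_ready():
    # Block until the predicate holds instead of sleeping for a guessed duration.
    with cv:
        cv.wait_for(lambda: state["ready"])
        events.append("main proceeds")


t = threading.Thread(target=flush_thread)
t.start()
wait_until_ready()
t.join()
```

      The waiter proceeds exactly when the flag is set, so the test neither races nor wastes time on a conservative sleep.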
      
      Test Plan:
        $ COMPILE_WITH_TSAN=1 make -j32 auto_roll_logger_test
        $ ./auto_roll_logger_test
      
      Reviewers: anthony, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D54399
  12. 18 Feb, 2016 2 commits
    • I
      Introduce SstFileManager::SetMaxAllowedSpaceUsage() to cap disk space usage · df9ba6df
      Islam AbdelRahman committed
      Summary:
      Introduce SstFileManager::SetMaxAllowedSpaceUsage(), which can be used to limit the maximum space usage allowed for RocksDB.
      When this limit is exceeded, WriteImpl() will fail and return Status::Aborted().
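      The behavior can be modeled roughly as follows. `ToySpaceTracker` and `write` are hypothetical stand-ins, statuses are plain strings, and treating a limit of 0 as "no limit" is an assumption of this sketch.

```python
class ToySpaceTracker:
    """Toy tracker of total SST size against a cap (hypothetical model)."""

    def __init__(self):
        self.total_size = 0
        self.max_allowed = 0   # assumption in this sketch: 0 means "no limit"

    def set_max_allowed_space_usage(self, limit):
        self.max_allowed = limit

    def on_add_file(self, size):
        self.total_size += size

    def is_max_allowed_space_reached(self):
        return self.max_allowed > 0 and self.total_size >= self.max_allowed


def write(tracker, batch):
    """Return 'Aborted' once the cap is reached, otherwise 'OK'."""
    if tracker.is_max_allowed_space_reached():
        return "Aborted"
    return "OK"
```

      A caller would set the cap once and then see writes rejected as soon as tracked file sizes reach it.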
      
      Test Plan: unit testing
      
      Reviewers: yhchiang, anthony, andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D53763
    • A
      Fix race conditions in auto-rolling logger · 3943d167
      Andrew Kryczka committed
      Summary:
      GetLogFileSize() and Flush() previously did not follow the
      synchronization pattern for accessing logger_. This meant ResetLogger() could
      cause logger_ destruction while the unsynchronized functions were accessing it,
      causing a segfault.
      
      Also made the mutex instance variable mutable so we can preserve
      GetLogFileSize()'s const-ness.
      
      Test Plan:
      new test case, it's quite ugly because both threads need to access
      one of the functions with SyncPoints (PosixLogger::Flush()), and also special
      handling is needed to prevent the mutex and sync points from conflicting.
      
      Reviewers: kradhakrishnan, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D54237
  13. 17 Feb, 2016 3 commits
    • R
      Improve write_with_callback_test to sync WAL · a7b6f074
      reid horuff committed
      Summary: Currently write_with_callback_test does not test with WAL syncing enabled. This addresses that.
      
      Test Plan: write_with_callback_test
      
      Reviewers: anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba, hermanlee4
      
      Differential Revision: https://reviews.facebook.net/D54255
    • R
      Fix WriteImpl empty batch hanging issue · 5bcf952a
      reid horuff committed
      Summary: There is an issue in DBImpl::WriteImpl: if an empty writebatch comes in with sync=true, the logs are marked as being synced, yet the sync never actually happens because there is no data in the writebatch. This causes the next incoming batch to hang while waiting for the logs to finish syncing. This fix syncs logs even if the writebatch is empty.
      
      Test Plan: DoubleEmptyBatch unit test in transaction_test.
      
      Reviewers: yoshinorim, hermanlee4, sdong, ngbronson, anthony
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D54057
    • M
      Fixed a segfault when compaction fails · 44371501
      Mike Kolupaev committed
      Summary: We've hit it today.
      
      Test Plan: `make -j check`; didn't reproduce the issue
      
      Reviewers: yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D54219
  14. 16 Feb, 2016 1 commit
    • J
      Separate main from bench functionality to allow customizations · 7bd284c3
      Jonathan Wiepert committed
      Summary: Isolate db_bench functionality from main so custom benchmark code can be written and managed
      
      Test Plan:
      Tested commands
      ./build_tools/regression_build_test.sh
      ./db_bench --db=/tmp/rocksdbtest-12321/dbbench --stats_interval_seconds=1 --num=1000
      ./db_bench --db=/tmp/rocksdbtest-12321/dbbench --stats_interval_seconds=1 --num=1000 --reads=500 --writes=500
      ./db_bench --db=/tmp/rocksdbtest-12321/dbbench --stats_interval_seconds=1 --num=1000 --merge_keys=100 --numdistinct=100 --num_column_families=3 --num_hot_column_families=1
      ./db_bench --stats_interval_seconds=1 --num=1000 --bloom_locality=1 --seed=5 --threads=5
      ./db_bench --duration=60 --value_size=50 --seek_nexts=10 --reverse_iterator=true --usee_uint64_comparator=true --batch-size=5
      ./db_bench --duration=60 --value_size=50 --seek_nexts=10 --reverse_iterator=true --use_uint64_comparator=true --batch_size=5
      ./db_bench --duration=60 --value_size=50 --seek_nexts=10 --reverse_iterator=true --usee_uint64_comparator=true --batch-size=5
      
      Test Results - https://phabricator.fb.com/P56130387
      
      Additional tests for:
      ./db_bench --duration=60 --value_size=50 --seek_nexts=10 --reverse_iterator=true --use_uint64_comparator=true --batch_size=5 --key_size=8 --merge_operator=put
      ./db_bench --stats_interval_seconds=1 --num=1000 --bloom_locality=1 --seed=5 --threads=5 --merge_operator=uint64add
      
      Results: https://phabricator.fb.com/P56130607
      
      Reviewers: yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D53991
  15. 12 Feb, 2016 1 commit
    • S
      Add a new compaction priority that picks file whose overlapping ratio is smallest · 92a9ccf1
      sdong committed
      Summary:
      Add a new compaction priority as follows:
      For every file, we calculate the total size of the files overlapping with it in the next level, divided by the file's own size. The file with the smallest ratio will be picked first.
      My "db_bench --fillrandom" run shows about 5% less compaction than kOldestSmallestSeqFirst when --hard_pending_compaction_bytes_limit is set to a value that keeps the LSM tree in shape. Without limiting hard_pending_compaction_bytes_limit, the improvement is only 1% or 2%.
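      The ratio described above can be sketched as follows. Files are modeled as dictionaries with hypothetical `smallest`, `largest`, and `size` fields, and key ranges are compared as plain strings; none of these names come from the RocksDB source.

```python
def overlaps(a, b):
    """True when the key ranges of files a and b intersect."""
    return a["smallest"] <= b["largest"] and b["smallest"] <= a["largest"]


def overlap_ratio(f, next_level_files):
    """Total size of overlapping next-level files divided by the file's own size."""
    overlap_bytes = sum(n["size"] for n in next_level_files if overlaps(f, n))
    return overlap_bytes / f["size"]


def pick_min_overlap_file(level_files, next_level_files):
    """Pick the file whose overlapping ratio is smallest."""
    return min(level_files, key=lambda f: overlap_ratio(f, next_level_files))


level = [
    {"name": "A", "smallest": "a", "largest": "f", "size": 10},
    {"name": "B", "smallest": "g", "largest": "m", "size": 10},
]
next_level = [
    {"name": "N1", "smallest": "a", "largest": "c", "size": 30},
    {"name": "N2", "smallest": "d", "largest": "h", "size": 30},
    {"name": "N3", "smallest": "k", "largest": "z", "size": 5},
]
picked = pick_min_overlap_file(level, next_level)
```

      In this example file A overlaps 60 bytes of next-level data (ratio 6.0) while B overlaps 35 bytes (ratio 3.5), so B is picked because compacting it rewrites less data per byte moved down.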
      
      Test Plan: Add a unit test
      
      Reviewers: andrewkr, kradhakrishnan, anthony, IslamAbdelRahman, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D54075
  16. 11 Feb, 2016 1 commit
  17. 10 Feb, 2016 2 commits
  18. 06 Feb, 2016 4 commits
    • S
      Fix LITE db_test build broken by previous commit · a76e9093
      sdong committed
      Summary: The previous commit introduced a test that is not supported in LITE. Fix it.
      
      Test Plan: Build the test with ROCKSDB_LITE.
      
      Reviewers: kradhakrishnan, IslamAbdelRahman, anthony, yhchiang, andrewkr
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D53901
    • S
      Explicitly fail when memtable doesn't support concurrent insert · b1887c5d
      sdong committed
      Summary: If users turn on concurrent insert but the memtable doesn't support it, they might see an unexpected crash. Fix it by failing explicitly.
      
      Test Plan:
      Run different settings of stress_test and make sure it fails correctly.
      Will add a unit test too.
      
      Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, andrewkr, ngbronson
      
      Reviewed By: ngbronson
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D53895
    • R
      Improve perf of Pessimistic Transaction expirations (and optimistic transactions) · 6f71d3b6
      reid horuff committed
      Summary:
      copy from task 8196669:
      
      1) Optimistic transactions do not support batching writes from different threads.
      2) Pessimistic transactions do not support batching writes if an expiration time is set.
      
      In these 2 cases, we currently do not do any write batching in DBImpl::WriteImpl() because there is a WriteCallback that could decide at the last minute to abort the write.  But we could support batching write operations with callbacks if we make sure to process the callbacks correctly.
      
      To do this, we would first need to modify write_thread.cc to stop preventing writes with callbacks from being batched together.  Then we would need to change DBImpl::WriteImpl() to call all WriteCallbacks in a batch, write only the batches that succeed, and correctly set the state of each batch's WriteThread::Writer.
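      The plan above can be sketched with a toy write group. Writers are dictionaries with hypothetical `batch`, `callback`, and `status` fields, and the WAL is a plain list; this models the control flow only and is not the DBImpl::WriteImpl() code.

```python
def write_group(writers, wal):
    """Run every callback, write only approved batches together, set each status."""
    approved = []
    for w in writers:
        cb = w.get("callback")
        w["approved"] = cb is None or bool(cb())
        if w["approved"]:
            approved.append(w)
    if approved:
        # One combined WAL record for all batches whose callbacks succeeded.
        wal.append([op for w in approved for op in w["batch"]])
    for w in writers:
        w["status"] = "ok" if w["approved"] else "aborted"
    return writers


wal = []
group = [
    {"batch": ["put a 1"], "callback": lambda: True},
    {"batch": ["put b 2"], "callback": lambda: False},   # aborts at the last minute
    {"batch": ["put c 3"]},                              # no callback at all
]
write_group(group, wal)
```

      The aborting writer is skipped without forcing the other two to be written separately, which is where the batching gain comes from.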
      
      Test Plan: Added test WriteWithCallbackTest to write_callback_test.cc which creates multiple client threads and verifies that writes are batched and executed properly.
      
      Reviewers: hermanlee4, anthony, ngbronson
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D52863
    • I
      Add BlockBasedTableOptions::index_block_restart_interval · 8e6172bc
      Islam AbdelRahman committed
      Summary: Add a new option to BlockBasedTableOptions that will allow us to change the restart interval for the index block
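      For context, a restart interval controls how often a block stores a full key rather than a suffix relative to the previous key. The sketch below illustrates that general scheme with hypothetical helpers `encode_block` and `decode_block`; it is not the RocksDB block builder and omits the binary layout.

```python
def encode_block(keys, restart_interval):
    """Prefix-compress sorted keys, storing a full key at every restart point."""
    entries, restarts, prev = [], [], ""
    for i, key in enumerate(keys):
        shared = 0
        if i % restart_interval == 0:
            restarts.append(i)   # full key stored here; a seek can start from it
        else:
            limit = min(len(prev), len(key))
            while shared < limit and prev[shared] == key[shared]:
                shared += 1
        entries.append((shared, key[shared:]))
        prev = key
    return entries, restarts


def decode_block(entries):
    """Rebuild the keys from (shared_prefix_length, suffix) pairs."""
    keys, prev = [], ""
    for shared, suffix in entries:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys


keys = ["apple", "apply", "apricot", "banana"]
entries, restarts = encode_block(keys, restart_interval=2)
```

      A smaller interval stores more full keys (larger block, less decoding per lookup), while a larger interval compresses better at the cost of longer scans from the nearest restart point.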
      
      Test Plan: unit tests
      
      Reviewers: yhchiang, anthony, andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: march, dhruba
      
      Differential Revision: https://reviews.facebook.net/D53721
  19. 04 Feb, 2016 1 commit
    • N
      always invalidate sequential-insertion cache for concurrent skiplist adds · 2c1db5ea
      Nathan Bronson committed
      Summary:
      InlineSkipList::InsertConcurrently should invalidate the
      sequential-insertion cache prev_[] for all inserts of multi-level nodes,
      not just those that increase the height of the skip list.  The invariant
      for prev_ is that prev_[i] (i > 0) is supposed to be the predecessor of
      prev_[0] at level i.  Before this diff InsertConcurrently could violate
      this constraint when inserting a multi-level node after prev_[i] but
      before prev_[0].
      
      This diff also reenables kConcurrentSkipList as db_test's
      MultiThreaded/MultiThreadedDBTest.MultiThreaded/29.
      
      Test Plan:
      1. unit tests
      2. temporarily hack kConcurrentSkipList timing so that it is fast but has a 1.5% failure rate on my dev box (1ms stagger on thread launch, 1s test duration, failure rate baseline over 1000 runs)
      3. observe 1000 passes post-fix
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: MarkCallaghan, dhruba
      
      Differential Revision: https://reviews.facebook.net/D53751
  20. 03 Feb, 2016 2 commits
    • A
      Eliminate duplicated property constants · 284aa613
      Andrew Kryczka committed
      Summary:
      Before this diff, there were duplicated constants to refer to properties (user-
      facing API had strings and InternalStats had an enum). I noticed these were
      inconsistent in terms of which constants are provided, names of constants, and
      documentation of constants. Overall it seemed annoying/error-prone to maintain
      these duplicated constants.
      
      So, this diff gets rid of InternalStats's constants and replaces them with a map
      keyed on the user-facing constant. The value in that map contains a function
      pointer to get the property value, so we don't need to do string matching while
      holding db->mutex_. This approach has a side benefit of making many small
      handler functions rather than a giant switch-statement.
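      The map-of-handlers approach can be sketched as below. `ToyStats`, the property names, and `get_property` are hypothetical and only illustrate the dispatch pattern; they do not mirror the InternalStats code.

```python
class ToyStats:
    """Toy stats object with one small handler per property (hypothetical)."""

    def __init__(self, num_files, mem_bytes):
        self.num_files = num_files
        self.mem_bytes = mem_bytes

    def handle_num_files(self):
        return str(self.num_files)

    def handle_mem_bytes(self):
        return str(self.mem_bytes)


# Single source of truth: user-facing name -> handler (names are hypothetical).
PROPERTY_HANDLERS = {
    "toy.num-files": ToyStats.handle_num_files,
    "toy.mem-bytes": ToyStats.handle_mem_bytes,
}


def get_property(stats, name):
    """Look up the handler first; only the handler call needs the stats object."""
    handler = PROPERTY_HANDLERS.get(name)
    if handler is None:
        return None
    return handler(stats)
```

      Since the name is resolved to a handler before the stats object is touched, the lookup could happen outside a lock and only the handler call inside it, which mirrors the motivation given above.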
      
      Test Plan: db_properties_test passes, running "make commit-prereq -j32"
      
      Reviewers: sdong, yhchiang, kradhakrishnan, IslamAbdelRahman, rven, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D53253
    • N
      disable kConcurrentSkipList multithreaded test · 5fcd1ba3
      Nathan Bronson committed
      Summary: Disable test that is intermittently failing
      
      Test Plan: unit tests
      
      Reviewers: igor, andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D53715
  21. 02 Feb, 2016 5 commits