1. 19 Mar, 2014: 2 commits
  2. 18 Mar, 2014: 8 commits
    • Optimize fallocation · f26cb0f0
      Igor Canadi committed
      Summary:
      Based on my recent findings (posted in our internal group), if we use fallocate without KEEP_SIZE flag, we get superior performance of fdatasync() in append-only workloads.
      
      This diff provides an option for users to not use the KEEP_SIZE flag, improving their sync performance by up to 2x-3x.
      
      At one point we also just called posix_fallocate instead of fallocate, which isn't very fast: http://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html (tl;dr it manually writes out zero bytes to allocate storage). This diff also fixes that, by first calling fallocate and then posix_fallocate if fallocate is not supported.
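      A minimal sketch of the allocation order described above, assuming Linux with glibc. It tries fallocate(2) first, with or without FALLOC_FL_KEEP_SIZE, and falls back to posix_fallocate(3) only when fallocate reports that it is unsupported. PreallocateFile and its keep_size parameter are hypothetical names, not RocksDB's Env API.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // fallocate() is a GNU extension (g++ usually defines this already)
#endif
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: preallocates [offset, offset + len) for fd.
// keep_size == true passes FALLOC_FL_KEEP_SIZE, so the visible file size is
// unchanged; keep_size == false lets the size grow to cover the range.
// Returns 0 on success or an errno value on failure.
int PreallocateFile(int fd, off_t offset, off_t len, bool keep_size) {
  const int mode = keep_size ? FALLOC_FL_KEEP_SIZE : 0;
  if (fallocate(fd, mode, offset, len) == 0) {
    return 0;
  }
  const int err = errno;
  if (err != EOPNOTSUPP && err != ENOSYS) {
    return err;  // a real failure such as ENOSPC, not "unsupported"
  }
  // Fallback only when fallocate() is unsupported. posix_fallocate() may write
  // zeros by hand on such filesystems, and it always extends the file size.
  return posix_fallocate(fd, offset, len);  // returns the error number directly
}
```

      Calling it with keep_size=false corresponds to the option this diff adds. The fdatasync() speedup is the commit's measured claim; the sketch only shows the call order.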
      
      Test Plan: make check
      
      Reviewers: dhruba, sdong, haobo, ljin
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16761
    • Fix race condition in manifest roll · ae25742a
      Igor Canadi committed
      Summary:
      When the manifest is getting rolled the following happens:
      1) manifest_file_number_ is assigned to a new manifest number (even though the old one is still current)
      2) mutex is unlocked
      3) SetCurrentFile() creates temporary file manifest_file_number_.dbtmp
      4) SetCurrentFile() renames manifest_file_number_.dbtmp to CURRENT
      5) mutex is locked
      
      If FindObsoleteFiles happens between (3) and (4) it will:
      1) Delete manifest_file_number_.dbtmp (because it's not in pending_outputs_)
      2) Delete old manifest (because the manifest_file_number_ already points to a new one)
      
      I introduce the concept of prev_manifest_file_number_ that will avoid the race condition.
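      The race and its fix can be sketched as a simplified obsolete-file check. This is a hypothetical model, not RocksDB's FindObsoleteFiles(). PurgeState, ShouldKeep and the rule for the .dbtmp file are illustrative assumptions; only the idea of a prev_manifest_file_number comes from the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

enum class FileKind { kDescriptor, kTemp };

// Hypothetical snapshot of the state FindObsoleteFiles would consult.
struct PurgeState {
  uint64_t manifest_file_number;       // new MANIFEST being installed
  uint64_t prev_manifest_file_number;  // MANIFEST that CURRENT may still name
  std::set<uint64_t> pending_outputs;  // files still being written
};

bool ShouldKeep(FileKind kind, uint64_t number, const PurgeState& s) {
  switch (kind) {
    case FileKind::kDescriptor:
      // Comparing only against manifest_file_number would delete the old
      // MANIFEST while CURRENT still points at it; keep it until the roll ends.
      return number >= s.prev_manifest_file_number;
    case FileKind::kTemp:
      // <manifest_file_number>.dbtmp is about to be renamed to CURRENT.
      return number == s.manifest_file_number ||
             s.pending_outputs.count(number) > 0;
  }
  return true;  // unknown kinds are never deleted
}
```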
      
      However, we should discuss the future of MANIFEST file rolling. We found some race conditions with it last week and who knows how many more are there. Nobody is using it in production because we don't trust the implementation. Should we even support it?
      
      Test Plan: make check
      
      Reviewers: ljin, dhruba, haobo, sdong
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16929
    • Check starts_with(prefix) in MultiPrefixIterate · 5601bc46
      Igor Canadi committed
      Summary: We switched to the prefix_seek method of seeking. This means that any time we check Valid(), we also need to check starts_with(prefix).
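      A sketch of the check, assuming a sorted std::map stands in for the database and lower_bound() stands in for a prefix seek (both are illustrative substitutes, not RocksDB's iterator API). The loop condition tests the prefix as well as validity.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

bool StartsWith(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Collects all keys under `prefix`. "Valid" alone (it != end) is not enough:
// the iterator stays valid after it has moved past the prefix.
std::vector<std::string> CollectPrefix(
    const std::map<std::string, std::string>& db, const std::string& prefix) {
  std::vector<std::string> keys;
  for (auto it = db.lower_bound(prefix);
       it != db.end() && StartsWith(it->first, prefix); ++it) {
    keys.push_back(it->first);
  }
  return keys;
}
```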
      
      Test Plan: ran db_stress
      
      Reviewers: ljin
      
      Reviewed By: ljin
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16953
    • keep_log_files option in BackupableDB · 9caeff51
      Igor Canadi committed
      Summary:
      Added an option to the BackupableDB implementation that allows users to persist in-memory databases. When a restore happens with keep_log_files = true, it will:
      *) Not delete existing log files in wal_dir
      *) Move log files from archive directory to wal_dir, so that DB can replay them if necessary
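      A rough sketch of the restore step for keep_log_files = true, using C++17 std::filesystem. The helper name RestoreArchivedLogs, the "archive" subdirectory name and the .log extension are assumptions for illustration, not BackupableDB's code. Existing logs in wal_dir are left in place, and archived logs are moved back.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>  // only used by the usage example to create files
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper; returns the number of archived logs moved to wal_dir.
int RestoreArchivedLogs(const fs::path& wal_dir) {
  const fs::path archive = wal_dir / "archive";  // assumed archive location
  if (!fs::is_directory(archive)) return 0;
  // Collect first, then move, so the directory is not modified mid-iteration.
  std::vector<fs::path> logs;
  for (const auto& entry : fs::directory_iterator(archive)) {
    if (entry.is_regular_file() && entry.path().extension() == ".log") {
      logs.push_back(entry.path());
    }
  }
  int moved = 0;
  for (const auto& src : logs) {
    const fs::path dst = wal_dir / src.filename();
    if (fs::exists(dst)) continue;  // never overwrite a log already in wal_dir
    fs::rename(src, dst);
    ++moved;
  }
  return moved;
}
```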
      
      Test Plan: Added a unit test
      
      Reviewers: dhruba, ljin
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16941
    • Correct the logic of MemTable::ShouldFlushNow(). · a5fafd4f
      Yueh-Hsuan Chiang committed
      Summary:
      Memtable will now be forced to flush if one of the following
      conditions is met:
      1. Already allocated more than write_buffer_size + 60% arena block size.
         (the overflowing condition)
      2. Unable to safely allocate one more arena block without hitting the
         overflowing condition AND the unused allocated memory < 25% arena
         block size.
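      The two conditions can be sketched as a pure function. The parameter names are hypothetical, and the sketch follows the summary literally, reading condition 2 as "allocating one more block would cross the overflow limit"; the actual MemTable::ShouldFlushNow() may compute it differently.

```cpp
#include <cassert>
#include <cstddef>

bool ShouldFlushNow(size_t allocated, size_t allocated_unused,
                    size_t write_buffer_size, size_t arena_block_size) {
  const double kAllowOverAllocationRatio = 0.6;
  const double overflow_limit =
      write_buffer_size + kAllowOverAllocationRatio * arena_block_size;
  // Condition 1: already past write_buffer_size + 60% of a block.
  if (allocated > overflow_limit) return true;
  // One more whole block still fits under the limit: no need to flush yet.
  if (allocated + arena_block_size <= overflow_limit) return false;
  // Condition 2: no room for another block and < 25% of a block left unused.
  return allocated_unused < arena_block_size / 4;
}
```

      With write_buffer_size = 1000 and a 100-byte block, the limit is 1060: 1100 bytes allocated always flushes, 900 never does, and at 1000 the decision depends on whether fewer than 25 bytes remain unused.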
      
      Test Plan: make all check
      
      Reviewers: sdong, haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16893
    • No prefix iterator in db_stress · 9b8a2b52
      Igor Canadi committed
      Summary: We're trying to deprecate prefix iterators, so no need to test them in db_stress
      
      Test Plan: ran it
      
      Reviewers: ljin
      
      Reviewed By: ljin
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16917
    • Fix a bug that Prev() can hang. · c61c9830
      sdong committed
      Summary: Prev() could hang when a key had more than max_skipped internal entries, all of them newer than the sequence number being sought. Add unit tests to confirm the bug, and fix it.
      
      Test Plan: make all check
      
      Reviewers: igor, haobo
      
      Reviewed By: igor
      
      CC: ljin, yhchiang, leveldb
      
      Differential Revision: https://reviews.facebook.net/D16899
    • Don't care about signed/unsigned compare · f9d05302
      Igor Canadi committed
      Summary:
      We need to stop these:
      https://github.com/facebook/rocksdb/pull/99
      https://github.com/facebook/rocksdb/pull/83
      
      Test Plan: no
      
      Reviewers: dhruba, haobo, sdong, ljin, yhchiang
      
      Reviewed By: ljin
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16905
  3. 17 Mar, 2014: 1 commit
  4. 16 Mar, 2014: 1 commit
  5. 15 Mar, 2014: 9 commits
    • journal log_number correctly in MANIFEST · 453ec52c
      Lei Jin committed
      Summary:
      Here is how this can cause a problem:
      There is one memtable flush and one compaction, and both call LogAndApply(). Suppose both edits are applied in the same batch, with the flush edit first and the compaction edit after it. LogAndApplyHelper() will assign the compaction edit the current VersionSet's log number, which is smaller than the log number from the flush edit. As a result, the log_numbers in MANIFEST are not monotonically increasing, which violates the assumption Recover() makes. Moreover, after committing to the MANIFEST file, log_number_ in VersionSet is updated to the log_number from the last edit, which is the compaction one, so log_number_ ends up not being advanced.
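      A hypothetical model of the invariant being restored: within one batch, an edit that carries no log number inherits the running maximum instead of the VersionSet's stale value, so the values written to MANIFEST never decrease. Edit and ApplyBatch are illustrative names, not RocksDB's VersionEdit or LogAndApply code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical edit: flushes set a log number, compactions leave it empty.
struct Edit {
  std::optional<uint64_t> log_number;
};

// Fills in missing log numbers for a batch written together and returns the
// log number the version set should hold after the batch commits.
uint64_t ApplyBatch(std::vector<Edit>& batch, uint64_t current_log_number) {
  uint64_t running = current_log_number;
  for (Edit& e : batch) {
    if (e.log_number) {
      assert(*e.log_number >= running);  // the monotonicity Recover() relies on
      running = *e.log_number;
    } else {
      // The bug: using current_log_number here would write a smaller value
      // after a flush edit earlier in the same batch.
      e.log_number = running;
    }
  }
  return running;
}
```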
      
      Test Plan:
      make whitebox_crash_test
      got another assertion about iter->valid(), not sure if that is related
      to this.
      
      Reviewers: igor, haobo
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16875
    • Breaking line · f234dfd8
      Caio SBA committed
    • Make it compile on Debian/GCC 4.7 · b9c78d2d
      Caio SBA committed
    • Merge pull request #97 from agchou/patch-1 · 5948a663
      Igor Canadi committed
      Fix copyright year
    • Missing includes · 2bad3cb0
      Igor Canadi committed
    • unterminated conditional directive · 56dce9bf
      Igor Canadi committed
    • Fix another Mac OS warning · f74659ac
      Igor Canadi committed
    • Fix HashSkipList and HashLinkedList SIGSEGV · 3c75cc15
      Igor Canadi committed
      Summary:
      Original Summary:
      Yesterday, @ljin and I were debugging various db_stress issues. We suspected one of them happens when we concurrently call NewIterator without prefix_seek on HashSkipList. This test demonstrates it.
      
      Update:
      Arena is not thread-safe!! When creating a new full iterator, we *have* to create a new arena, otherwise we're doomed.
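      A toy illustration of the design choice, assuming a simplified Arena that is not RocksDB's class. Allocate() mutates unguarded state, so sharing one instance across threads would race; giving each full iterator its own arena avoids sharing entirely. FullIterator is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Deliberately non-thread-safe: blocks_ and bytes_ are modified without a lock.
class Arena {
 public:
  char* Allocate(size_t n) {
    blocks_.emplace_back(new char[n]);
    bytes_ += n;
    return blocks_.back().get();
  }
  size_t bytes() const { return bytes_; }

 private:
  std::vector<std::unique_ptr<char[]>> blocks_;
  size_t bytes_ = 0;
};

// Each iterator owns a private arena, so iterators created concurrently on
// different threads never touch the same allocator state.
class FullIterator {
 public:
  FullIterator() : arena_(new Arena) {}
  char* Scratch(size_t n) { return arena_->Allocate(n); }
  const Arena* arena() const { return arena_.get(); }

 private:
  std::unique_ptr<Arena> arena_;
};
```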
      
      Test Plan: SIGSEGV and assertion-throwing test now works!
      
      Reviewers: ljin, haobo, sdong
      
      Reviewed By: sdong
      
      CC: leveldb, ljin
      
      Differential Revision: https://reviews.facebook.net/D16857
    • Fix warning on Mac OS · 6c72079d
      Igor Canadi committed
  6. 14 Mar, 2014: 1 commit
    • Fix extra compaction tasks scheduled after D16767 in some cases · 5aa81f04
      sdong committed
      Summary:
      With D16767, there is a case where compaction tasks are scheduled infinitely:
      (1) no flush thread is configured and there is more than 1 compaction thread
      (2) a flush is being run by one compaction thread
      (3) the SST files are in a state where versions_->current()->NeedsCompaction() gives a false positive (returns true when there is actually no work to be done)
      In that case, an infinite loop is formed.
      
      This patch would fix it.
      
      Test Plan: make all check
      
      Reviewers: haobo, igor, ljin
      
      Reviewed By: igor
      
      CC: dhruba, yhchiang, leveldb
      
      Differential Revision: https://reviews.facebook.net/D16863
  7. 13 Mar, 2014: 9 commits
  8. 12 Mar, 2014: 7 commits
    • Fix bad merge of D16791 and D16767 · 839c8ecf
      sdong committed
      Summary: A bad auto-merge caused the log buffer to be flushed twice. Remove the unintended flush.
      
      Test Plan: Should already be tested (the code looks the same as when I ran unit tests).
      
      Reviewers: haobo, igor
      
      Reviewed By: haobo
      
      CC: ljin, yhchiang, leveldb
      
      Differential Revision: https://reviews.facebook.net/D16821
    • make assert based on FLAGS_prefix_size · 86ba3e24
      Lei Jin committed
      Summary: as title
      
      Test Plan: running python tools/db_crashtest.py
      
      Reviewers: igor
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16803
    • Fix data race against logging data structure because of LogBuffer · bd45633b
      sdong committed
      Summary:
      @igor pointed out that there is a potential data race because of the way we use the newly introduced LogBuffer. After "bg_compaction_scheduled_--" or "bg_flush_scheduled_--", both counters can become 0. As soon as the lock is released after that, DBImpl's destructor can go ahead and destroy all the state inside DB, including the info_log object held in a shared pointer by the options object it keeps. At that point it is no longer safe to keep using the info logger to write the delayed logs.
      
      With the patch, the lock is released temporarily so that the log buffer can be flushed before "bg_compaction_scheduled_--" or "bg_flush_scheduled_--". To make sure we don't miss any pending flush or compaction, a new flag bg_schedule_needed_ is added. It is set to true if there is a pending flush or compaction that was not scheduled because of the max thread limit. If the flag is true, the scheduling function is called before the compaction or flush thread finishes.
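      The new ordering can be sketched with a hypothetical BackgroundWork class (not DBImpl). Buffered log lines are written while the scheduled counter is still nonzero, a pending reschedule is honored, and only then is the counter decremented and waiters signaled. The events vector only records step order for the single-threaded usage below and is not itself lock-protected.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <vector>

class BackgroundWork {
 public:
  BackgroundWork(int scheduled, bool schedule_needed)
      : bg_compaction_scheduled_(scheduled),
        bg_schedule_needed_(schedule_needed) {}

  std::vector<std::string> events;  // step order, for inspection only

  void BackgroundCallCompaction() {
    std::unique_lock<std::mutex> lock(mu_);
    std::vector<std::string> log_buffer{"compaction done"};
    // ... compaction work would happen here ...
    lock.unlock();  // counter is still > 0, so the logger is still alive
    for (const auto& line : log_buffer) events.push_back("log: " + line);
    lock.lock();
    if (bg_schedule_needed_) {
      bg_schedule_needed_ = false;
      events.push_back("reschedule");  // stands in for the scheduling function
    }
    --bg_compaction_scheduled_;
    events.push_back("counter decremented");
    cv_.notify_all();
    // Nothing may touch *this after this point: a waiter may destroy it.
  }

  // What the destructor would do: wait until no background work is scheduled.
  void WaitForIdle() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return bg_compaction_scheduled_ == 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int bg_compaction_scheduled_;
  bool bg_schedule_needed_;
};
```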
      
      Thanks @igor for this finding!
      
      Test Plan: make all check
      
      Reviewers: haobo, igor
      
      Reviewed By: haobo
      
      CC: dhruba, ljin, yhchiang, igor, leveldb
      
      Differential Revision: https://reviews.facebook.net/D16767
    • fix db_stress test · 02dab3be
      Lei Jin committed
      Summary: Fix the db_stress test so that it actually runs with HashSkipList.
      
      Test Plan:
      python tools/db_crashtest.py
      python tools/db_crashtest2.py
      
      Reviewers: igor, haobo
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16773
    • Temp Fix of LogBuffer flushing · 6c66bc08
      sdong committed
      Summary: A temporary fix for log buffer flushing: flush the buffer inside the lock. This keeps trunk clean until we find a permanent fix.
      
      Test Plan: make all check
      
      Reviewers: haobo, igor
      
      Reviewed By: igor
      
      CC: ljin, leveldb, yhchiang
      
      Differential Revision: https://reviews.facebook.net/D16791
    • Add a comment after SignalAll() · cb980216
      Igor Canadi committed
      Summary: Having code after SignalAll has already caused 2 bugs. Let's make sure this doesn't happen again.
      
      Test Plan: no test
      
      Reviewers: sdong, dhruba, haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16785
    • Env to add a function to allow users to query waiting queue length · 01dcef11
      sdong committed
      Summary: Add a function to Env so that users can query the waiting queue length of each thread pool
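      A minimal sketch of what such a query looks like on a simple thread pool. The ThreadPool class and QueueLen() are hypothetical; the commit summary does not show the actual Env method name or signature.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>  // only used by the usage example
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
 public:
  explicit ThreadPool(int n) {
    for (int i = 0; i < n; ++i) workers_.emplace_back([this] { Run(); });
  }
  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();  // workers drain the queue, then exit
  }
  void Schedule(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push_back(std::move(job));
    }
    cv_.notify_one();
  }
  // Jobs scheduled but not yet picked up by a worker thread.
  size_t QueueLen() const {
    std::lock_guard<std::mutex> lock(mu_);
    return queue_.size();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stop_ is set and nothing is left
        job = std::move(queue_.front());
        queue_.pop_front();
      }
      job();
    }
  }
  mutable std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool stop_ = false;
  std::vector<std::thread> workers_;
};
```

      In the usage below, one worker is blocked inside the first job, so the three jobs scheduled afterwards are all still waiting when the queue length is read.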
      
      Test Plan: add a test in env_test
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: dhruba, igor, yhchiang, ljin, nkg-, leveldb
      
      Differential Revision: https://reviews.facebook.net/D16755
  9. 11 Mar, 2014: 2 commits