1. 16 3月, 2014 1 次提交
  2. 15 3月, 2014 1 次提交
    • L
      journal log_number correctly in MANIFEST · 453ec52c
      Lei Jin 提交于
      Summary:
      Here is what it can cause probelm:
      There is one memtable flush and one compaction. Both call LogAndApply(). If both edits are applied in the same batch with flush edit first and the compaction edit followed. LogAndApplyHelper() will assign compaction edit current VersionSet's log number(which should be smaller than the log number from flush edit). It cause log_numbers in MANIFEST to be not monotonic increasing, which violates the assume Recover() makes. What is more is after comitting to MANIFEST file, log_number_ in VersionSet is updated to the log_number from the last edit, which is the compaction one. It ends up not updating the log_number.
      
      Test Plan:
      make whitebox_crash_test
      got another assertion about iter->valid(), not sure if that is related
      to this.
      
      Reviewers: igor, haobo
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16875
      453ec52c
  3. 11 3月, 2014 1 次提交
  4. 06 3月, 2014 1 次提交
    • S
      Buffer info logs when picking compactions and write them out after releasing the mutex · ecb1ffa2
      sdong 提交于
      Summary: Now while the background thread is picking compactions, it writes out multiple info_logs, especially for universal compaction, which introduces a chance of waiting log writing in mutex, which is bad. To remove this risk, write all those info logs to a buffer and flush it after releasing the mutex.
      
      Test Plan:
      make all check
      check the log lines while running some tests that trigger compactions.
      
      Reviewers: haobo, igor, dhruba
      
      Reviewed By: dhruba
      
      CC: i.am.jin.lei, dhruba, yhchiang, leveldb, nkg-
      
      Differential Revision: https://reviews.facebook.net/D16515
      ecb1ffa2
  5. 04 3月, 2014 1 次提交
    • I
      Fix a group commit bug in LogAndApply · 5142b370
      Igor Canadi 提交于
      Summary:
      EncodeTo(&record) does not overwrite, it appends to it.
      
      This means that group commit log and apply will look something like:
      record1
      record1record2
      record1record2record3
      
      I'm surprised this didn't show up in production, but I think the reason is that MANIFEST group commit almost never happens.
      
      This bug turned up in column family work, where opening a database failed with "adding a same column family twice".
      
      Test Plan: Tested the change in column family branch and observed that the problem is gone (with db_stress)
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16461
      5142b370
  6. 01 3月, 2014 2 次提交
    • K
      Remove the terrible hack in for flush_block_policy_factory · bf86af51
      kailiu 提交于
      Summary:
      Previous code is too convoluted and I must be drunk for letting
      such code to be written without a second thought.
      
      Thanks to the discussion with @sdong, I added the `Options` when
      generating the flusher, thus avoiding the tricks.
      
      Just FYI: I resisted to add Options in flush_block_policy.h since I
      wanted to avoid cyclic dependencies: FlushBlockPolicy dpends on Options
      and Options also depends FlushBlockPolicy... While I appreciate my
      effort to prevent it, the old design turns out creating more troubles than
      it tried to avoid.
      
      Test Plan: ran ./table_test
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      CC: sdong, leveldb
      
      Differential Revision: https://reviews.facebook.net/D16503
      bf86af51
    • I
      Make Log::Reader more robust · 58ca641d
      Igor Canadi 提交于
      Summary:
      This diff does two things:
      (1) Log::Reader does not report a corruption when the last record in a log or manifest file is truncated (meaning that log writer died in the middle of the write). Inherited the code from LevelDB: https://code.google.com/p/leveldb/source/detail?r=269fc6ca9416129248db5ca57050cd5d39d177c8#
      (2) Turn off mmap writes for all writes to log and manifest files
      
      (2) is necessary because if we use mmap writes, the last record is not truncated, but is actually filled with zeros, making checksum fail. It is hard to recover from checksum failing.
      
      Test Plan:
      Added unit tests from LevelDB
      Actually recovered a "corrupted" MANIFEST file.
      
      Reviewers: dhruba, haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16119
      58ca641d
  7. 14 2月, 2014 1 次提交
  8. 13 2月, 2014 1 次提交
  9. 04 2月, 2014 1 次提交
  10. 03 2月, 2014 1 次提交
  11. 31 1月, 2014 1 次提交
  12. 29 1月, 2014 1 次提交
    • I
      Only get the manifest file size if there is no error · 5d2c6282
      Igor Canadi 提交于
      Summary:
      I came across this while working on column families. CorruptionTest::RecoverWriteError threw a SIGSEG because the descriptor_log_->file() was nullptr. I'm not sure why it doesn't happen in master, but better safe than sorry.
      
      @kailiu, can we get this in release, too?
      
      Test Plan: make check
      
      Reviewers: kailiu, dhruba, haobo
      
      Reviewed By: haobo
      
      CC: leveldb, kailiu
      
      Differential Revision: https://reviews.facebook.net/D15513
      5d2c6282
  13. 28 1月, 2014 2 次提交
    • I
      Fsync directory after we create a new file · 832158e7
      Igor Canadi 提交于
      Summary:
      @dhruba, I'm not sure where we need to sync the directory. I implemented the function in Env() and added the dir sync just after we close the newly created file in the builder.
      
      Should I also add FsyncDir() to new files that get created by a compaction?
      
      Test Plan: Confirmed that FsyncDir is returning Status::OK()
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D14751
      832158e7
    • I
      Move NeedsCompaction() from VersionSet to Version · 6c2ca1d3
      Igor Canadi 提交于
      Summary: There is no reason to have functions NeedCompaction(), MaxCompactionScore() and MaxCompactionScoreLevel() in VersionSet, since they don't access any data in VersionSet.
      
      Test Plan: make check
      
      Reviewers: kailiu, haobo, sdong
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15333
      6c2ca1d3
  14. 25 1月, 2014 2 次提交
    • I
      Fix reduce levels · 04afa321
      Igor Canadi 提交于
      ReduceNumberOfLevels had segmentation fault in WriteSnapshot() since we
      didn't change the number of levels in VersionSet (we consider them
      immutable from now on). This fixes the problem.
      04afa321
    • I
      Make VersionSet::ReduceNumberOfLevels() static · 677fee27
      Igor Canadi 提交于
      Summary:
      A lot of our code implicitly assumes number_levels to be static. ReduceNumberOfLevels() breaks that assumption. For example, after calling ReduceNumberOfLevels(), DBImpl::NumberLevels() will be different from VersionSet::NumberLevels(). This is dangerous. Thankfully, it's not in public headers and is only used from LDB cmd tool. LDB tool is only using it statically, i.e. it never calls it with running DB instance. With this diff, we make it explicitly static. This way, we can assume number_levels to be immutable and not break assumption that lot of our code is relying upon. LDB tool can still use the method.
      
      Also, I removed the method from a separate file since it breaks filename completition. version_se<TAB> now completes to "version_set." instead of "version_set" (without the dot). I don't see a big reason that the function should be in a different file.
      
      Test Plan: reduce_levels_test
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15303
      677fee27
  15. 23 1月, 2014 1 次提交
    • I
      Unfriending classes · fb01755a
      Igor Canadi 提交于
      Summary:
      In this diff I made some effort to reduce usage of friending. To do that, I had to expose Compaction::inputs_ through a method inputs(). Not sure if this is a good idea, there is a trade-off. I think it's less confusing than having lots of friends.
      
      I also thought about other friendship relationships, but they are too much tangled at this point. Once you friend two classes, it's very hard to unfriend them :)
      
      Test Plan: make check
      
      Reviewers: haobo, kailiu, sdong, dhruba
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15267
      fb01755a
  16. 17 1月, 2014 2 次提交
    • I
      Remove compaction pointers · 6d6fb709
      Igor Canadi 提交于
      Summary: The only thing we do with compaction pointers is set them to some values, we never actually read them. I don't know what we used them for, but it doesn't look like we use them anymore.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15225
      6d6fb709
    • I
      CompactionPicker · c699c84a
      Igor Canadi 提交于
      Summary:
      This is a big one. This diff moves all the code related to picking compactions from VersionSet to new class CompactionPicker. Column families' compactions will be completely separate processes, so we need to have multiple CompactionPickers.
      
      To make this easier to review, most of the code change is just copy/paste. There is also a small change not to use VersionSet::current_, but rather to take `Version* version` as a parameter. Most of the other code is exactly the same.
      
      In future diffs, I will also make some improvements to CompactionPickers. I think the most important part will be encapsulating it better. Currently Version, VersionSet, Compaction and CompactionPicker are all friend classes, which makes it harder to change the implementation.
      
      This diff depends on D15171, D15183, D15189 and D15201
      
      Test Plan: `make check`
      
      Reviewers: kailiu, sdong, dhruba, haobo
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15207
      c699c84a
  17. 16 1月, 2014 5 次提交
    • K
      Remove the unnecessary use of shared_ptr · eae1804f
      kailiu 提交于
      Summary:
      shared_ptr is slower than unique_ptr (which literally comes with no performance cost compare with raw pointers).
      In memtable and memtable rep, we use shared_ptr when we'd actually should use unique_ptr.
      
      According to igor's previous work, we are likely to make quite some performance gain from this diff.
      
      Test Plan: make check
      
      Reviewers: dhruba, igor, sdong, haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15213
      eae1804f
    • I
      Move more functions from VersionSet to Version · 787f11bb
      Igor Canadi 提交于
      Summary:
      This moves functions:
      * VersionSet::Finalize() -> Version::UpdateCompactionStats()
      * VersionSet::UpdateFilesBySize() -> Version::UpdateFilesBySize()
      
      The diff depends on D15189, D15183 and D15171
      
      Test Plan: make check
      
      Reviewers: kailiu, sdong, haobo, dhruba
      
      Reviewed By: sdong
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15201
      787f11bb
    • I
      Moving Compaction class to separate header file · 615d1ea2
      Igor Canadi 提交于
      Summary:
      I'm sure we'll all agree that version_set.cc needs simplifying. This diff moves Compaction class to a separate file.
      
      The diff depends on D15171 and D15183
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15189
      615d1ea2
    • I
      Move functions from VersionSet to Version · 2f4eda78
      Igor Canadi 提交于
      Summary:
      There were some functions in VersionSet that had no reason to be there instead of Version. Moving them to Version will make column families implementation easier.
      
      The functions moved are:
      * NumLevelBytes
      * LevelSummary
      * LevelFileSummary
      * MaxNextLevelOverlappingBytes
      * AddLiveFiles (previously AddLiveFilesCurrentVersion())
      * NeedSlowdownForNumLevel0Files
      
      The diff continues on (and depends on) D15171
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong, emayanke
      
      Reviewed By: sdong
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15183
      2f4eda78
    • I
      Decrease reliance on VersionSet::NumberLevels() · 65a8a52b
      Igor Canadi 提交于
      Summary:
      With column families VersionSet will not have a constant number of levels (each CF can have different options), so we'll need to eliminate call to VersionSet::NumberLevels()
      
      This diff decreases number of callsites, but we're not there yet. It associates number of levels with Version (each version is associated with single CF) instead of VersionSet.
      
      I have also slightly changed how VersionSet keeps track of manifest size.
      
      This diff also modifies constructor of Compaction such that it takes input_version and automatically Ref()s it. Before this was done outside of constructor.
      
      In next diffs I will continue to decrease number of callsites of VersionSet::NumberLevels() and also references to current_
      
      Test Plan: make check
      
      Reviewers: haobo, dhruba, kailiu, sdong
      
      Reviewed By: sdong
      
      Differential Revision: https://reviews.facebook.net/D15171
      65a8a52b
  18. 15 1月, 2014 3 次提交
    • I
      Fix CompactRange to apply filter to every key · d9cd7a06
      Igor Canadi 提交于
      Summary:
      When doing CompactRange(), we should first flush the memtable and then calculate max_level_with_files. Also, we want to compact all the levels that have files, including level `max_level_with_files`.
      
      This patch fixed the unit test.
      
      Test Plan: Added a failing unit test and a fix, so it's not failing anymore.
      
      Reviewers: dhruba, haobo, sdong
      
      Reviewed By: haobo
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D14421
      d9cd7a06
    • I
      VersionEdit not to take NumLevels() · 055e6df4
      Igor Canadi 提交于
      Summary:
      I will submit a sequence of diffs that are preparing master branch for column families. There are a lot of implicit assumptions in the code that are making column family implementation hard. If I make the change only in column family branch, it will make merging back to master impossible.
      
      Most of the diffs will be simple code refactorings, so I hope we can have fast turnaround time. Feel free to grab me in person to discuss any of them.
      
      This diff removes number of level check from VersionEdit. It is used only when VersionEdit is read, not written, but has to be set when it is written. I believe it is a right thing to make VersionEdit dumb and check consistency on the caller side. This will also make it much easier to implement Column Families, since different column families can have different number of levels.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, sdong, kailiu
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15159
      055e6df4
    • S
      Pre-calculate whether to slow down for too many level 0 files · fbbf0d14
      Siying Dong 提交于
      Summary: Currently in DBImpl::MakeRoomForWrite(), we do  "versions_->NumLevelFiles(0) >= options_.level0_slowdown_writes_trigger" to check whether the writer thread needs to slow down. However, versions_->NumLevelFiles(0) is slightly more expensive than we expected. By caching the result of the comparison when installing a new version, we can avoid this function call every time.
      
      Test Plan:
      make all check
      Manually trigger this behavior by applying universal compaction style and make sure inserts are made slow after there are certain number of files.
      
      Reviewers: haobo, kailiu, igor
      
      Reviewed By: kailiu
      
      CC: nkg-, leveldb
      
      Differential Revision: https://reviews.facebook.net/D15141
      fbbf0d14
  19. 11 1月, 2014 1 次提交
    • S
      [Performance Branch] If options.max_open_files set to be -1, cache table... · aa0ef660
      Siying Dong 提交于
      [Performance Branch] If options.max_open_files set to be -1, cache table readers in FileMetadata for Get() and NewIterator()
      
      Summary:
      In some use cases, table readers for all live files should always be cached. In that case, there will be an opportunity to avoid the table cache look-up while Get() and NewIterator().
      
      We define options.max_open_files = -1 to be the mode that table readers for live files will always be kept. In that mode, table readers are cached in FileMetaData (with a reference count hold in table cache). So that when executing table_cache.Get() and table_cache.newInterator(), LRU cache checking can be by-passed, to reduce latency.
      
      Test Plan: add a test case in db_test
      
      Reviewers: haobo, kailiu
      
      Reviewed By: haobo
      
      CC: dhruba, igor, leveldb
      
      Differential Revision: https://reviews.facebook.net/D15039
      aa0ef660
  20. 03 1月, 2014 1 次提交
  21. 27 12月, 2013 1 次提交
    • S
      Avoid malloc in NotFound key status if no message is given. · 18df47b7
      Siying Dong 提交于
      Summary:
      In some places we have NotFound status created with empty message, but it doesn't avoid a malloc. With this patch, the malloc is avoided for that case.
      
      The motivation of it is that I found in db_bench readrandom test when all keys are not existing, about 4% of the total running time is spent on malloc of Status, plus a similar amount of CPU spent on free of them, which is not necessary.
      
      Test Plan: make all check
      
      Reviewers: dhruba, haobo, igor
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14691
      18df47b7
  22. 18 12月, 2013 1 次提交
  23. 12 12月, 2013 4 次提交
  24. 07 12月, 2013 1 次提交
  25. 26 11月, 2013 2 次提交
  26. 22 11月, 2013 1 次提交