  1. 12 Mar 2014, 2 commits
  2. 11 Mar 2014, 5 commits
  3. 08 Mar 2014, 1 commit
  4. 07 Mar 2014, 1 commit
  5. 06 Mar 2014, 1 commit
    • Buffer info logs when picking compactions and write them out after releasing the mutex · ecb1ffa2
      Committed by sdong
      Summary: Currently, while the background thread is picking compactions, it writes out multiple info log lines (especially for universal compaction), so it may end up waiting on log writes while holding the mutex, which is bad. To remove this risk, write all those info log lines to a buffer and flush the buffer after releasing the mutex.
      
      Test Plan:
      make all check
      check the log lines while running some tests that trigger compactions.
      
      Reviewers: haobo, igor, dhruba
      
      Reviewed By: dhruba
      
      CC: i.am.jin.lei, dhruba, yhchiang, leveldb, nkg-
      
      Differential Revision: https://reviews.facebook.net/D16515
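The buffering idea in ecb1ffa2 can be sketched in C++ as follows. This is a minimal illustration, not the actual RocksDB change. The class name `BufferedLog`, the function `PickCompactionAndLog()`, the global `db_mutex` and the log messages are all hypothetical. Log lines are collected while the mutex is held and are written out only after the `lock_guard` scope ends.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not RocksDB's actual class): collects log lines
// while the DB mutex is held so they can be written after it is released.
class BufferedLog {
 public:
  void Add(std::string line) { lines_.push_back(std::move(line)); }

  // Writes all buffered lines to `out`, then empties the buffer.
  void FlushTo(std::ostream& out) {
    for (const std::string& line : lines_) out << line << '\n';
    lines_.clear();
  }

 private:
  std::vector<std::string> lines_;
};

std::mutex db_mutex;  // stands in for the DB-wide mutex

void PickCompactionAndLog(std::ostream& info_log) {
  BufferedLog buffer;
  {
    std::lock_guard<std::mutex> lock(db_mutex);
    // ... choose the compaction while holding the mutex ...
    buffer.Add("Universal: candidate files(3)");
    buffer.Add("Universal: picking compaction");
  }  // mutex released here
  // Potentially slow log I/O now happens without blocking other threads.
  buffer.FlushTo(info_log);
}
```

The design choice is to move I/O out of the critical section. The cost is a short delay before the lines appear in the log.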
  6. 05 Mar 2014, 1 commit
  7. 01 Mar 2014, 2 commits
    • Make Log::Reader more robust · 58ca641d
      Committed by Igor Canadi
      Summary:
      This diff does two things:
      (1) Log::Reader no longer reports a corruption when the last record in a log or manifest file is truncated (meaning that the log writer died in the middle of the write). The code is inherited from LevelDB: https://code.google.com/p/leveldb/source/detail?r=269fc6ca9416129248db5ca57050cd5d39d177c8#
      (2) Turn off mmap writes for all writes to log and manifest files
      
      (2) is necessary because with mmap writes, the last record is not truncated but is actually filled with zeros, which makes the checksum fail. It is hard to recover from a checksum failure.
      
      Test Plan:
      Added unit tests from LevelDB
      Actually recovered a "corrupted" MANIFEST file.
      
      Reviewers: dhruba, haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16119
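The following is a minimal sketch of behavior (1). It assumes a deliberately simplified, hypothetical record format rather than the real LevelDB/RocksDB log format. Each record consists of a 2-byte little-endian length, a 1-byte additive checksum, and then the payload. A record cut short at the end of the file is reported as EOF, while a complete record with a bad checksum is still reported as corruption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

enum class ReadResult { kRecord, kEof, kCorruption };

// Hypothetical, simplified log format (not the real one):
// [len_lo][len_hi][checksum][payload...], checksum = sum of payload bytes mod 256.
class SimpleLogReader {
 public:
  explicit SimpleLogReader(std::string contents) : data_(std::move(contents)) {}

  ReadResult ReadRecord(std::string* record) {
    const size_t kHeaderSize = 3;
    const size_t remaining = data_.size() - pos_;
    if (remaining == 0) return ReadResult::kEof;
    if (remaining < kHeaderSize) {
      // Partial header at the end of the file: the writer died mid-write.
      pos_ = data_.size();
      return ReadResult::kEof;
    }
    const size_t length =
        static_cast<uint8_t>(data_[pos_]) |
        (static_cast<size_t>(static_cast<uint8_t>(data_[pos_ + 1])) << 8);
    if (remaining - kHeaderSize < length) {
      // Truncated payload in the last record: report EOF, not corruption.
      pos_ = data_.size();
      return ReadResult::kEof;
    }
    const uint8_t expected = static_cast<uint8_t>(data_[pos_ + 2]);
    uint8_t actual = 0;
    for (size_t i = 0; i < length; ++i) {
      actual = static_cast<uint8_t>(
          actual + static_cast<uint8_t>(data_[pos_ + kHeaderSize + i]));
    }
    if (actual != expected) {
      // A complete record with a bad checksum really is corruption.
      pos_ = data_.size();
      return ReadResult::kCorruption;
    }
    record->assign(data_, pos_ + kHeaderSize, length);
    pos_ += kHeaderSize + length;
    return ReadResult::kRecord;
  }

 private:
  std::string data_;
  size_t pos_ = 0;
};
```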
    • Add ReadOptions to TransactionLogIterator. · a77527f2
      Committed by Yueh-Hsuan Chiang
      Summary:
      Add an optional ReadOptions input parameter to DB::GetUpdatesSince(),
      which allows checksum verification to be disabled by setting
      ReadOptions::verify_checksums to false.
      
      Test Plan: Tests were done offline and will not be included in the regular unit tests.
      
      Reviewers: igor
      
      Reviewed By: igor
      
      CC: leveldb, xjin, dhruba
      
      Differential Revision: https://reviews.facebook.net/D16305
  8. 28 Feb 2014, 1 commit
  9. 26 Feb 2014, 1 commit
    • Schedule flush when waiting on flush · 42095163
      Committed by Igor Canadi
      Summary:
      This will also help avoid the deadlock. If a flush failed and we're waiting for a memtable to be flushed, we should schedule a new flush and hope that it succeeds.
      
      If paranoid_checks = false, Wait() will still hang on ENOSPC, but at least it will automatically continue when the space frees up. The current behavior both hangs and deadlocks.
      
      Also, I renamed some occurrences of 'compaction' to 'flush'; 'compaction' was LevelDB's way of saying things.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, ljin
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16281
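The retry behavior described in 42095163 can be modeled with a condition variable. This is a hypothetical sketch, not RocksDB code. `FlushWaiter`, `failures_before_success` and the thread-per-flush scheduling are assumptions, and neither `paranoid_checks` nor `bg_error_` is modeled. The waiter schedules a new flush whenever none is in flight, so a failed flush no longer leaves it waiting forever.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model: a background flush can fail (e.g. ENOSPC); a thread
// waiting for the flush schedules a new one instead of waiting forever.
class FlushWaiter {
 public:
  explicit FlushWaiter(int failures_before_success)
      : failures_left_(failures_before_success) {}

  ~FlushWaiter() {
    for (std::thread& t : workers_) t.join();
  }

  // Blocks until the pending memtable has been flushed.
  void WaitForFlush() {
    std::unique_lock<std::mutex> lock(mu_);
    while (memtable_pending_) {
      if (!flush_scheduled_) {
        // No flush in flight (the previous one failed): schedule another.
        flush_scheduled_ = true;
        ++attempts_;
        workers_.emplace_back([this] { BackgroundFlush(); });
      }
      cv_.wait(lock);  // spurious wakeups are handled by the loop
    }
  }

  int attempts() {
    std::lock_guard<std::mutex> lock(mu_);
    return attempts_;
  }

 private:
  void BackgroundFlush() {
    std::lock_guard<std::mutex> lock(mu_);
    if (failures_left_ > 0) {
      --failures_left_;  // simulated failure: the memtable stays pending
    } else {
      memtable_pending_ = false;
    }
    flush_scheduled_ = false;
    cv_.notify_all();
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::vector<std::thread> workers_;
  int failures_left_;
  int attempts_ = 0;
  bool memtable_pending_ = true;
  bool flush_scheduled_ = false;
};
```

With two simulated failures, the waiter ends up scheduling three flushes in total.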
  10. 25 Feb 2014, 1 commit
  11. 20 Feb 2014, 1 commit
  12. 14 Feb 2014, 1 commit
  13. 13 Feb 2014, 4 commits
  14. 11 Feb 2014, 1 commit
  15. 04 Feb 2014, 2 commits
  16. 03 Feb 2014, 1 commit
  17. 01 Feb 2014, 2 commits
    • Mark the log_number file number used · dbbffbd7
      Committed by Igor Canadi
      Summary:
      VersionSet::next_file_number_ is always assumed to be strictly greater than VersionSet::log_number_. In our new recovery code, we artificially set log_number_ to (log_number + 1), so that once we flush, we don't recover from the same log file again (this is important because merge operators are not idempotent).
      
      When we set VersionSet::log_number_ to (log_number + 1), we also have to mark that file number as used, so that next_file_number_ is increased to a legal value. Otherwise, VersionSet might assert.
      
      This has not been a problem so far because of the following sequence:
      1. Assume next_file_number is 5 and we're recovering log_number 10.
      2. in DBImpl::Recover() we call MarkFileNumberUsed with 10. This will set VersionSet::next_file_number_ to 11.
      3. If there are some updates, we will call WriteTable0ForRecovery(), which will use file number 11 as a new table file and advance VersionSet::next_file_number_ to 12.
      4. When we LogAndApply() with log_number 11, assertion is true: assert(11 <= 12);
      
      However, this was a lucky coincidence. Even though this doesn't currently cause a bug, I think the issue is important to fix.
      
      Test Plan: In column families I have different recovery logic, and this code path asserted. After adding MarkFileNumberUsed(log_number + 1), the assert is gone.
      
      Reviewers: dhruba, kailiu
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15783
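The file-number bookkeeping in steps 1-4 of dbbffbd7 can be replayed with a small hypothetical model. `FileNumbers` is illustrative and is not the real `VersionSet`. It uses the strict invariant from the first paragraph of that commit (log number < next file number). The model shows two paths. In the lucky path, a recovered table file advances the counter. In the path with no updates, only the extra `MarkFileNumberUsed(log_number + 1)` keeps the invariant.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the bookkeeping described above (not VersionSet).
struct FileNumbers {
  uint64_t next_file_number = 5;  // step 1: starts at 5

  // Guarantees next_file_number > number afterwards.
  void MarkFileNumberUsed(uint64_t number) {
    if (next_file_number <= number) next_file_number = number + 1;
  }

  // Hands out a fresh file number (e.g. for a table built during recovery).
  uint64_t NewFileNumber() { return next_file_number++; }

  // The invariant assumed above: log_number < next_file_number.
  bool IsLegalLogNumber(uint64_t log_number) const {
    return log_number < next_file_number;
  }
};
```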
    • When using Universal Compaction, Zero out seqID in the last file too · 56bea9f8
      Committed by Siying Dong
      Summary: I couldn't figure out why zeroing out earlier sequence IDs is disabled in universal compaction. I do see that bottommost_level is set correctly, so it should simply work if we remove the universal-compaction constraint.
      
      Test Plan: make all check
      
      Reviewers: haobo, dhruba, kailiu, igor
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15423
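The following is a simplified sketch of the rule from 56bea9f8. It rests on an assumption the commit does not state: sequence numbers are zeroed only in the bottommost level, and only for entries older than the earliest live snapshot. The real compaction code may apply further conditions. `Entry` and `ZeroOutSequenceNumbers()` are hypothetical. The point of the change shows in the signature: nothing depends on the compaction style.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Entry {
  std::string key;
  uint64_t seq;
};

// Hypothetical rule: in the bottommost level, an entry older than the
// earliest live snapshot is visible to every snapshot and has no older
// version below it, so its sequence number can be replaced with 0.
// Level vs. universal compaction plays no part in the decision.
void ZeroOutSequenceNumbers(std::vector<Entry>* output, bool bottommost_level,
                            uint64_t earliest_snapshot) {
  if (!bottommost_level) return;
  for (Entry& e : *output) {
    if (e.seq < earliest_snapshot) e.seq = 0;
  }
}
```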
  18. 30 Jan 2014, 2 commits
    • InternalStatistics · 3c0dcf0e
      Committed by Igor Canadi
      Summary:
      In DBImpl we keep track of some statistics internally and expose them via GetProperty(). This diff encapsulates all the internal statistics into a class InternalStatistics. Most of it is copy/paste.
      
      Apart from cleaning up db_impl.cc, this diff is also necessary for Column families, since every column family should have its own CompactionStats, MakeRoomForWrite-stall stats, etc. It's much easier to keep track of it in every column family if it's nicely encapsulated in its own class.
      
      Test Plan: make check
      
      Reviewers: dhruba, kailiu, haobo, sdong, emayanke
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15273
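The following is a hypothetical sketch of the encapsulation argument in 3c0dcf0e. Once the counters live in one class, each column family can simply own an instance. The class name `InternalStats` and its members are illustrative assumptions, not the actual RocksDB interface.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: statistics grouped in one class so that each column
// family can own an independent instance.
class InternalStats {
 public:
  explicit InternalStats(int num_levels) : compaction_micros_(num_levels, 0) {}

  void AddCompactionMicros(int level, uint64_t micros) {
    compaction_micros_[level] += micros;
  }
  void AddStallMicros(uint64_t micros) { stall_micros_ += micros; }

  uint64_t compaction_micros(int level) const { return compaction_micros_[level]; }
  uint64_t stall_micros() const { return stall_micros_; }

 private:
  std::vector<uint64_t> compaction_micros_;  // one counter per level
  uint64_t stall_micros_ = 0;                // e.g. MakeRoomForWrite stalls
};
```

For example, a `std::map<std::string, InternalStats>` keyed by column family name keeps the stats of different column families fully separate.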
    • set bg_error_ when background flush goes wrong · d118707f
      Committed by Lei Jin
      Summary: as title
      
      Test Plan: unit test
      
      Reviewers: haobo, igor, sdong, kailiu, dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15435
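The commit message for d118707f gives no details, so this sketch assumes the LevelDB-style convention. Under that convention, the first background error is recorded in `bg_error_` and returned to later writes. `DbModel` and its methods are hypothetical. Errors are plain strings instead of `Status` objects to keep the example self-contained.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical model: the first background error is remembered and
// returned by later writes (assumed LevelDB-style "sticky" error).
class DbModel {
 public:
  // Called when a background flush finishes; an empty string means success.
  void OnBackgroundFlushDone(const std::string& error) {
    std::lock_guard<std::mutex> lock(mu_);
    if (!error.empty() && bg_error_.empty()) bg_error_ = error;
  }

  // Returns an empty string on success, otherwise the background error.
  std::string Write() {
    std::lock_guard<std::mutex> lock(mu_);
    if (!bg_error_.empty()) return bg_error_;
    // ... a real write would be applied here ...
    return std::string();
  }

 private:
  std::mutex mu_;
  std::string bg_error_;  // empty means "no error"
};
```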
  19. 28 Jan 2014, 3 commits
    • Update monitoring to include average time per compaction and stall · 90f29ccb
      Committed by Mark Callaghan
      Summary:
      The new columns are msComp and msStall that provide average time per compaction and stall for that level in milliseconds.
      Level  Files Size(MB) Score Time(sec)  Read(MB) Write(MB)    Rn(MB)  Rnp1(MB)  Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s)      Rn     Rnp1     Wnp1     NewW    Count   msComp   msStall  Ln-stall Stall-cnt
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        0        8       15   1.5         2         0        30         0         0        30        0.0       0.0        15.5        0        0        0        0       16      112       0.2       1.3      7568
        1        8       16   1.6         1        26        26        15        11        16        3.5      17.6        18.1        8        6       13        7        3      362       0.0       0.0         0
        2        1        2   0.0         0         0         2         0         0         2        0.0       0.0        18.4        0        0        0        0        1       50       0.0       0.0         0
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15345
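The table in 90f29ccb is consistent with msComp and msStall being simple averages converted to milliseconds. This is an inference, not something the commit states. For level 0, an Ln-stall of 1.3 (assumed to be seconds) over a Stall-cnt of 7568 gives 1300 / 7568 ≈ 0.17 ms, which is displayed as 0.2. Likewise, an msComp of 112 over 16 compactions implies about 1.8 s of compaction time, which the Time(sec) column rounds to 2. A hypothetical helper for this calculation:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper, assuming msComp = compaction time / Count and
// msStall = Ln-stall / Stall-cnt, with times given in seconds.
double AverageMillis(double total_seconds, uint64_t count) {
  if (count == 0) return 0.0;  // idle levels: avoid dividing by zero
  return total_seconds * 1000.0 / static_cast<double>(count);
}
```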
    • Fsync directory after we create a new file · 832158e7
      Committed by Igor Canadi
      Summary:
      @dhruba, I'm not sure where we need to sync the directory. I implemented the function in Env() and added the dir sync just after we close the newly created file in the builder.
      
      Should I also add FsyncDir() to new files that get created by a compaction?
      
      Test Plan: Confirmed that FsyncDir is returning Status::OK()
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D14751
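The following is a minimal POSIX sketch of a directory fsync, not RocksDB's `Env` implementation. It opens the directory read-only, calls `fsync()` on the descriptor and closes it. Syncing the parent directory makes the new directory entry durable, which an fsync of the file alone does not guarantee on every file system. The name `FsyncDir` follows the commit; the signature is an assumption.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <cstdlib>
#include <string>

// Minimal POSIX sketch: after creating a file, fsync its parent directory
// so that the new directory entry itself reaches stable storage.
bool FsyncDir(const std::string& dir) {
  int fd = open(dir.c_str(), O_RDONLY);
  if (fd < 0) return false;
  bool ok = (fsync(fd) == 0);
  ok = (close(fd) == 0) && ok;  // always close, keep the first failure
  return ok;
}
```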
    • Move NeedsCompaction() from VersionSet to Version · 6c2ca1d3
      Committed by Igor Canadi
      Summary: There is no reason to have the functions NeedsCompaction(), MaxCompactionScore() and MaxCompactionScoreLevel() in VersionSet, since they don't access any data in VersionSet.
      
      Test Plan: make check
      
      Reviewers: kailiu, haobo, sdong
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15333
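The following hypothetical sketch shows why the move in 6c2ca1d3 is natural: the three helpers only read per-level compaction scores that belong to a single `Version`. The `>= 1` threshold follows the LevelDB convention and is an assumption here. The real code may also consider other compaction triggers that are not shown.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sketch: helpers that only read this Version's scores.
class Version {
 public:
  // Assumes at least one level.
  explicit Version(std::vector<double> scores) : scores_(std::move(scores)) {}

  int MaxCompactionScoreLevel() const {
    int best = 0;
    for (int i = 1; i < static_cast<int>(scores_.size()); ++i) {
      if (scores_[i] > scores_[best]) best = i;
    }
    return best;
  }
  double MaxCompactionScore() const { return scores_[MaxCompactionScoreLevel()]; }
  // Assumed convention: a score of at least 1 means "needs compaction".
  bool NeedsCompaction() const { return MaxCompactionScore() >= 1.0; }

 private:
  std::vector<double> scores_;  // one compaction score per level
};
```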
  20. 25 Jan 2014, 2 commits
    • Fixing iterator cleanup for Tailing iterator · f653fdcf
      Committed by Igor Canadi
      Immutable tailing iterator doesn't set CleanupState::mem, so we don't
      have to unref it.
    • MemTableListVersion · c583157d
      Committed by Igor Canadi
      Summary:
      MemTableListVersion is to MemTableList what Version is to VersionSet. I applied almost the same ideas to develop MemTableListVersion. The goal is to have the std::list copy done in the background, while flushing, rather than in the foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we also copied some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
      
      This diff avoids the std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and the time to create 100,000 iterators in a single thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterators is very important to the LogDevice team.
      
      I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
      
      Test Plan: `make check` works. I will also do `make valgrind_check` before commit
      
      Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15255
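The following is a minimal sketch of the MemTableListVersion idea from c583157d. It uses `std::shared_ptr` as a stand-in for manual `Ref()`/`Unref()` reference counting; that substitution is an assumption, and the class shapes are illustrative. Readers such as `NewIterator()` only copy a pointer under the mutex. The `std::list` copy happens only when a new version is installed. In this sketch that copy still runs under the mutex, but on the write/flush path rather than on every read.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <mutex>
#include <utility>

struct MemTable {
  int id;
};

// Hypothetical: an immutable snapshot of the list of immutable memtables,
// analogous to Version for VersionSet.
struct MemTableListVersion {
  std::list<std::shared_ptr<MemTable>> memlist;
};

class MemTableList {
 public:
  MemTableList() : current_(std::make_shared<MemTableListVersion>()) {}

  // Read path (e.g. creating an iterator): only a pointer copy and a
  // reference-count bump happen under the mutex, never a list copy.
  std::shared_ptr<const MemTableListVersion> current() const {
    std::lock_guard<std::mutex> lock(mu_);
    return current_;
  }

  // Install path (e.g. a memtable becoming immutable): copy the list into a
  // new version and publish it. Readers holding the old version are unaffected.
  void Add(std::shared_ptr<MemTable> m) {
    std::lock_guard<std::mutex> lock(mu_);
    auto next = std::make_shared<MemTableListVersion>(*current_);
    next->memlist.push_front(std::move(m));
    current_ = std::move(next);
  }

 private:
  mutable std::mutex mu_;
  std::shared_ptr<const MemTableListVersion> current_;
};
```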
  21. 24 Jan 2014, 2 commits
    • CompactRange() to return status · aba2acb5
      Committed by Lei Jin
      Summary: as title
      
      Test Plan:
      make all check
      What other tests should I cover?
      
      Reviewers: igor, haobo
      
      CC:
      
      Differential Revision: https://reviews.facebook.net/D15339
    • Tailing iterator · 81c9cc9b
      Committed by Tomislav Novak
      Summary:
      This diff implements a special type of iterator that doesn't create a snapshot
      (can be used to read newly inserted data) and is optimized for doing sequential
      reads.
      
      TailingIterator uses current superversion number to determine whether to
      invalidate its internal iterators. If the version hasn't changed, it can often
      avoid doing expensive seeks over immutable structures (sst files and immutable
      memtables).
      
      Test Plan:
      * new unit tests
      * running LD with this patch
      
      Reviewers: igor, dhruba, haobo, sdong, kailiu
      
      Reviewed By: sdong
      
      CC: leveldb, lovro, march
      
      Differential Revision: https://reviews.facebook.net/D15285
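The following is a hypothetical sketch of the invalidation check in 81c9cc9b. Internal iterators over immutable data are rebuilt only when the superversion number differs from the one seen at the last rebuild. `TailingIteratorModel` only counts rebuilds; the actual iterator work is elided.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: rebuild internal iterators over immutable data only
// when the superversion number has changed since the last rebuild.
class TailingIteratorModel {
 public:
  void Seek(uint64_t current_superversion_number) {
    if (!initialized_ || current_superversion_number != seen_number_) {
      // sst files / immutable memtables may have changed: recreate the
      // internal iterators (the expensive path).
      seen_number_ = current_superversion_number;
      initialized_ = true;
      ++rebuilds_;
    }
    // Otherwise the existing internal iterators are reused.
  }

  int rebuilds() const { return rebuilds_; }

 private:
  bool initialized_ = false;
  uint64_t seen_number_ = 0;
  int rebuilds_ = 0;
};
```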
  22. 23 Jan 2014, 1 commit
    • Refactor Recover() code · 6fe9b577
      Committed by Igor Canadi
      Summary:
      This diff does two things:
      * Rethinks how we call Recover() with the read_only option. Before, we called it with a pointer to the memtable to which we wanted to apply those changes. This memtable is set in db_impl_readonly.cc and it's actually DBImpl::mem_. Why don't we just apply updates to mem_ right away? It seems more intuitive.
      * Changes when we apply updates to the manifest. Before, the process was to recover all the logs, flush them to sst files and then do one giant commit that atomically adds all recovered sst files and sets the next log number. This works well enough, but causes some small trouble for my column family approach, since I can't have one VersionEdit apply to more than a single column family [1]. The change here is to commit the files recovered from logs right away. Here is the state of the world before the change:
      1. Recover log 5, add new sst files to edit
      2. Recover log 7, add new sst files to edit
      3. Recover log 8, add new sst files to edit
      4. Commit all added sst files to manifest and mark log files 5, 7 and 8 as recovered (via SetLogNumber(9) function)
      After the change, we'll do:
      1. Recover log 5, commit the new sst files and set log 5 as recovered
      2. Recover log 7, commit the new sst files and set log 7 as recovered
      3. Recover log 8, commit the new sst files and set log 8 as recovered
      
      The added (small) benefit is that if we fail after (2), the new recovery will only have to recover log 8. Previously, we would have had to restart the recovery from the beginning. The bigger benefit is easier integration of multiple column families in the Recovery code path.
      
      [1] I'm happy to discuss this decision, but I believe this is the cleanest way to go. It also makes backward compatibility much easier. We don't have a requirement to add multiple column families atomically.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15237
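The new recovery order in 6fe9b577 can be modeled as follows. This is a hypothetical sketch: `Manifest`, `RecoverLogs()` and the `crash_after` parameter are illustrative. Marking log N as recovered by setting the log number to N + 1 is an assumption, modeled on the `log_number + 1` convention from dbbffbd7. A simulated failure after log 7 leaves only log 8 for the next run.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model: each log is committed as soon as it is recovered.
struct Manifest {
  uint64_t log_number = 0;  // logs below this number are already recovered
  std::vector<std::string> files;
};

// Recovers `logs` in order. `crash_after` simulates a failure right after
// that log has been committed (0 means no simulated failure).
std::vector<uint64_t> RecoverLogs(const std::vector<uint64_t>& logs,
                                  Manifest* manifest, uint64_t crash_after) {
  std::vector<uint64_t> recovered;
  for (uint64_t log : logs) {
    if (log < manifest->log_number) continue;  // committed in an earlier run
    recovered.push_back(log);
    // Commit: add the new sst file and mark this log as recovered.
    manifest->files.push_back("sst-from-log-" + std::to_string(log));
    manifest->log_number = log + 1;
    if (log == crash_after) break;
  }
  return recovered;
}
```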
  23. 18 Jan 2014, 2 commits