1. 06 Aug, 2012: 1 commit
    • ext4: make sure the journal sb is written in ext4_clear_journal_err() · d796c52e
      Committed by Theodore Ts'o
      After we set the EXT4_ERROR_FS bit in the file system
      superblock, it's not enough to call jbd2_journal_clear_err() to clear
      the error indication from journal superblock --- we need to call
      jbd2_journal_update_sb_errno() as well.  Otherwise, when the root file
      system is mounted read-only, the journal is replayed, and the error
      indicator is transferred to the superblock --- but the s_errno field
      in the jbd2 superblock is left set (since although we cleared it in
      memory, we never flushed it out to disk).
      
      This can end up confusing e2fsck.  We should make e2fsck more robust
      in this case, but the kernel shouldn't be leaving things in this
      confused state, either.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  2. 27 May, 2012: 4 commits
  3. 29 Mar, 2012: 1 commit
  4. 20 Mar, 2012: 1 commit
  5. 14 Mar, 2012: 5 commits
    • jbd2: cleanup journal tail after transaction commit · 3339578f
      Committed by Jan Kara
      Normally, we have to issue a cache flush before we can update journal tail in
      journal superblock, effectively wiping out old transactions from the journal.
      So use the fact that during transaction commit we issue cache flush anyway and
      opportunistically push journal tail as far as we can. Since update of journal
      superblock is still costly (we have to use WRITE_FUA), we update log tail only
      if we can free significant amount of space.
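
The "only if we can free a significant amount of space" rule can be sketched in user-space C. This is a simplified model, not the kernel code: `struct log_geom`, `blocks_freed()` and `worth_updating_tail()` are hypothetical names, and the one-quarter threshold is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the circular log (not the kernel's journal_t). */
struct log_geom {
	uint32_t first;   /* first usable log block */
	uint32_t last;    /* one past the last usable log block */
	uint32_t tail;    /* current on-disk log tail */
	uint32_t maxlen;  /* total journal length in blocks */
};

/* Blocks that would be released by moving the tail forward to new_tail. */
static uint32_t blocks_freed(const struct log_geom *g, uint32_t new_tail)
{
	uint32_t freed;

	assert(new_tail >= g->first && new_tail < g->last);
	freed = new_tail - g->tail;          /* wraps modulo 2^32 if new_tail < tail */
	if (new_tail < g->tail)
		freed += g->last - g->first; /* the tail wrapped around the log */
	return freed;
}

/* Assumed policy: pay for the superblock write only if >= 1/4 of the log is freed. */
static int worth_updating_tail(const struct log_geom *g, uint32_t new_tail)
{
	return blocks_freed(g, new_tail) >= g->maxlen / 4;
}
```

With a 1024-block log whose tail sits at block 1000, moving the tail to block 10 frees only 34 blocks, so the model skips the update; moving a tail from block 100 to block 500 frees 400 blocks and passes the threshold.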
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: issue cache flush after checkpointing even with internal journal · 79feb521
      Committed by Jan Kara
      When we reach jbd2_cleanup_journal_tail(), there is no guarantee that
      checkpointed buffers are on a stable storage - especially if buffers were
      written out by jbd2_log_do_checkpoint(), they are likely to be only in disk's
      caches. Thus when we update journal superblock effectively removing old
      transaction from journal, this write of superblock can get to stable storage
      before those checkpointed buffers which can result in filesystem corruption
      after a crash. Thus we must unconditionally issue a cache flush before we
      update journal superblock in these cases.
      
      A similar problem can also occur if journal superblock is written only in
      disk's caches, other transaction starts reusing space of the transaction
      cleaned from the log and power failure happens. Subsequent journal replay would
      still try to replay the old transaction but some of its blocks may be already
      overwritten by the new transaction. For this reason we must use WRITE_FUA when
      updating log tail and we must first write new log tail to disk and update
      in-memory information only after that.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • PM / Sleep: JBD and JBD2 missing set_freezable() · 35c80422
      Committed by Nigel Cunningham
      With the latest and greatest changes to the freezer, I started seeing
      panics that were caused by jbd2 running post-process freezing and
      hitting the canary BUG_ON for non-TuxOnIce I/O submission. I've traced
      this back to a lack of set_freezable calls in both jbd and jbd2. Since
      they're clearly meant to be frozen (there are tests for freezing()), I
      submit the following patch to add the missing calls.
      Signed-off-by: Nigel Cunningham <nigel@tuxonice.net>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • jbd2: protect all log tail updates with j_checkpoint_mutex · a78bb11d
      Committed by Jan Kara
      There are some log tail updates that are not protected by j_checkpoint_mutex.
      Some of these are harmless because they happen during startup or shutdown but
      updates in jbd2_journal_commit_transaction() and jbd2_journal_flush() can
      really race with other log tail updates (e.g. someone doing
      jbd2_journal_flush() with someone running jbd2_cleanup_journal_tail()). So
      protect all log tail updates with j_checkpoint_mutex.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: split updating of journal superblock and marking journal empty · 24bcc89c
      Committed by Jan Kara
      There are three cases of updating journal superblock. In the first case, we want
      to mark journal as empty (setting s_sequence to 0), in the second case we want
      to update log tail, in the third case we want to update s_errno. Split these
      cases into separate functions. It makes the code slightly more straightforward
      and later patches will make the distinction even more important.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  6. 21 Feb, 2012: 4 commits
  7. 22 Nov, 2011: 1 commit
    • freezer: unexport refrigerator() and update try_to_freeze() slightly · a0acae0e
      Committed by Tejun Heo
      There is no reason to export two functions for entering the
      refrigerator.  Calling refrigerator() instead of try_to_freeze()
      neither saves anything noticeable nor removes any race condition.
      
      * Rename refrigerator() to __refrigerator() and make it return bool
        indicating whether it scheduled out for freezing.
      
      * Update try_to_freeze() to return bool and relay the return value of
        __refrigerator() if freezing().
      
      * Convert all refrigerator() users to try_to_freeze().
      
      * Update documentation accordingly.
      
      * While at it, add might_sleep() to try_to_freeze().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
  8. 02 Nov, 2011: 2 commits
    • jbd2: Unify log messages in jbd2 code · f2a44523
      Committed by Eryu Guan
      Some jbd2 code prints out kernel messages with "JBD2: " prefix, at the
      same time other jbd2 code prints with "JBD: " prefix. Unify the prefix
      to "JBD2: ".
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd/jbd2: validate sb->s_first in journal_get_superblock() · 8762202d
      Committed by Eryu Guan
      I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when
      mounting a fsfuzzed ext3 image. It turns out that the corrupted ext3
      image has s_first = 0 in journal superblock, and the 0 is passed to
      journal->j_head in journal_reset(), then to blocknr in
      cleanup_journal_tail(), in the end the J_ASSERT failed.
      
      So validate s_first after reading journal superblock from disk in
      journal_get_superblock() to ensure s_first is valid.
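
A minimal user-space sketch of this kind of check follows. It is not the patch itself: `validate_s_first()` and its `last` parameter are hypothetical, the upper-bound test is an assumption, and the offset of 20 bytes for the big-endian `s_first` field is my reading of the on-disk layout (it matches the reproducer further down, which zeroes the byte at offset 23, the low-order byte of that field).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed byte offset of the big-endian s_first field in the journal superblock. */
#define S_FIRST_OFFSET 20u

/* Decode a 32-bit big-endian value from raw bytes. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Returns 0 if s_first looks usable, -1 if it is corrupted.
 * 'last' is one past the final log block (hypothetical parameter).
 */
static int validate_s_first(const unsigned char *sb, size_t sb_len, uint32_t last)
{
	uint32_t s_first;

	assert(sb != NULL);
	if (sb_len < (size_t)S_FIRST_OFFSET + 4)
		return -1;                      /* buffer too short to hold the field */
	s_first = read_be32(sb + S_FIRST_OFFSET);
	if (s_first == 0 || s_first >= last)
		return -1;                      /* block 0 or out of range: reject */
	return 0;
}
```

Rejecting the superblock at load time means a zero start block never reaches the later assertion on the block number.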
      
      The following script could reproduce it:
      
      fstype=ext3
      blocksize=1024
      img=$fstype.img
      offset=0
      found=0
      magic="c0 3b 39 98"
      
      dd if=/dev/zero of=$img bs=1M count=8
      mkfs -t $fstype -b $blocksize -F $img
      filesize=`stat -c %s $img`
      while [ $offset -lt $filesize ]
      do
              if od -j $offset -N 4 -t x1 $img | grep -i "$magic";then
                      echo "Found journal: $offset"
                      found=1
                      break
              fi
              offset=`echo "$offset+$blocksize" | bc`
      done
      
      if [ $found -ne 1 ];then
              echo "Magic \"$magic\" not found"
              exit 1
      fi
      
      dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1
      
      mkdir -p ./mnt
      mount -o loop $img ./mnt
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  9. 11 Jul, 2011: 1 commit
  10. 14 Jun, 2011: 1 commit
    • jbd2: Fix oops in jbd2_journal_remove_journal_head() · de1b7941
      Committed by Jan Kara
      jbd2_journal_remove_journal_head() can oops when trying to access
      journal_head returned by bh2jh(). This is caused for example by the
      following race:
      
      	TASK1					TASK2
        jbd2_journal_commit_transaction()
          ...
          processing t_forget list
            __jbd2_journal_refile_buffer(jh);
            if (!jh->b_transaction) {
              jbd_unlock_bh_state(bh);
      					jbd2_journal_try_to_free_buffers()
      					  jbd2_journal_grab_journal_head(bh)
      					  jbd_lock_bh_state(bh)
      					  __journal_try_to_free_buffer()
      					  jbd2_journal_put_journal_head(jh)
              jbd2_journal_remove_journal_head(bh);
      
      jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
      buffer is not part of any transaction and thus frees journal_head
      before TASK1 gets to doing so. Note that even buffer_head can be
      released by try_to_free_buffers() after
      jbd2_journal_put_journal_head() which adds even larger opportunity for
      oops (but I didn't see this happen in reality).
      
      Fix the problem by making transactions hold their own journal_head
      reference (in b_jcount). That way we don't have to remove journal_head
      explicitly via jbd2_journal_remove_journal_head() and instead just
      remove journal_head when b_jcount drops to zero. The result of this is
      that [__]jbd2_journal_refile_buffer(),
      [__]jbd2_journal_unfile_buffer(), and
      __jbd2_journal_remove_checkpoint() can free journal_head which needs
      modification of a few callers. Also we have to be careful because once
      journal_head is removed, buffer_head might be freed as well. So we
      have to get our own buffer_head reference where it matters.
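
The reference-counting idea can be modeled in a few lines of single-threaded C. This is only a sketch under simplifying assumptions: there is no locking, and `struct buffer`, `jh_grab()`, `jh_put()`, `file_buffer()` and `unfile_buffer()` are hypothetical stand-ins rather than the kernel functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified journal_head: a reference count plus an owner. */
struct journal_head {
	int b_jcount;               /* number of references held */
	const void *b_transaction;  /* owning transaction, or NULL */
};

/* Hypothetical stand-in for a buffer_head with an attached journal_head. */
struct buffer {
	struct journal_head *jh;
};

/* Take a reference, attaching a journal_head on first use. */
static struct journal_head *jh_grab(struct buffer *b)
{
	if (!b->jh) {
		b->jh = calloc(1, sizeof(*b->jh));
		if (!b->jh)
			return NULL;
	}
	b->jh->b_jcount++;
	return b->jh;
}

/* Drop a reference; the journal_head is freed when the count reaches zero. */
static void jh_put(struct buffer *b)
{
	assert(b->jh && b->jh->b_jcount > 0);
	if (--b->jh->b_jcount == 0) {
		free(b->jh);
		b->jh = NULL;
	}
}

/* Filing a buffer on a transaction takes the transaction's own reference. */
static int file_buffer(struct buffer *b, const void *txn)
{
	struct journal_head *jh = jh_grab(b);

	if (!jh)
		return -1;
	jh->b_transaction = txn;
	return 0;
}

/* Unfiling drops the transaction's reference, which may free the journal_head. */
static void unfile_buffer(struct buffer *b)
{
	b->jh->b_transaction = NULL;
	jh_put(b);
}
```

Because the transaction holds its own reference, a concurrent holder dropping its reference cannot free the journal_head out from under the transaction; whoever drops the last reference frees it, and no separate removal step is needed.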
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  11. 24 May, 2011: 1 commit
  12. 09 May, 2011: 1 commit
  13. 02 May, 2011: 1 commit
    • jbd2: fix fsync() tid wraparound bug · deeeaf13
      Committed by Theodore Ts'o
      If an application program does not make any changes to the indirect
      blocks or extent tree, i_datasync_tid will not get updated.  If there
      are enough commits (i.e., 2**31) such that tid_geq()'s calculations
      wrap, and there isn't a currently active transaction at the time of
      the fdatasync() call, this can end up triggering a BUG_ON in
      fs/jbd2/commit.c:
      
      	J_ASSERT(journal->j_running_transaction != NULL);
      
      It's pretty rare that this can happen, since it requires the use of
      fdatasync() plus *very* frequent and excessive use of fsync().  But
      with the right workload, it can.
      
      We fix this by replacing the use of tid_geq() with an equality test,
      since there's only one transaction id that is valid for us to
      wait on until it is committed: namely, the currently running transaction
      (if it exists).
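
The wraparound is easy to reproduce in user space. The sketch below models `tid_geq()` as a signed 32-bit difference, which is how such wraparound-aware comparisons are usually written; `needs_commit_old()` and `needs_commit_new()` are hypothetical names for the check before and after the fix, not kernel functions. The cast from an out-of-range unsigned value to `int32_t` is implementation-defined in C, and the sketch assumes the usual two's-complement behavior.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tid_t;

/* Wraparound-aware "x >= y", modeled as a signed 32-bit difference. */
static int tid_geq(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);

	return difference >= 0;
}

/* Old-style check (hypothetical name): commit if the request is "behind" target. */
static int needs_commit_old(tid_t commit_request, tid_t target)
{
	return !tid_geq(commit_request, target);
}

/* New-style check (hypothetical name): only the running transaction qualifies. */
static int needs_commit_new(int has_running, tid_t running_tid, tid_t target)
{
	return has_running && running_tid == target;
}
```

With a stale target of 100 and a commit request that has advanced by more than 2**31 (say `100 + 0x80000005`), the signed difference turns negative, so the old check wrongly concludes that transaction 100 still needs committing. The equality test cannot be fooled this way, because a stale tid never equals the running transaction's tid.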
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  14. 05 Apr, 2011: 1 commit
  15. 31 Mar, 2011: 1 commit
  16. 01 Mar, 2011: 1 commit
  17. 12 Feb, 2011: 1 commit
    • jbd2: call __jbd2_log_start_commit with j_state_lock write locked · e4471831
      Committed by Theodore Ts'o
      On an SMP ARM system running ext4, I've received a report that the
      first J_ASSERT in jbd2_journal_commit_transaction has been triggering:
      
      	J_ASSERT(journal->j_running_transaction != NULL);
      
      While investigating possible causes for this problem, I noticed that
      __jbd2_log_start_commit() is getting called with j_state_lock only
      read-locked, in spite of the fact that it might modify
      j_commit_request.  Fix this by grabbing the necessary information so
      we can test to see if we need to start a new transaction before
      dropping the read lock, and then calling jbd2_log_start_commit() which
      will grab the write lock.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  18. 11 Jan, 2011: 1 commit
  19. 19 Dec, 2010: 1 commit
  20. 17 Dec, 2010: 1 commit
  21. 18 Nov, 2010: 1 commit
  22. 30 Oct, 2010: 1 commit
  23. 28 Oct, 2010: 2 commits
    • jbd2: Fix I/O hang in jbd2_journal_release_jbd_inode · 39e3ac25
      Committed by Brian King
      This fixes a hang seen in jbd2_journal_release_jbd_inode
      on a lot of Power 6 systems running with ext4. When we get
      in the hung state, all I/O to the disk in question gets blocked
      where we stay indefinitely. Looking at the task list, I can see
      we are stuck in jbd2_journal_release_jbd_inode waiting on a
      wake up. I added some debug code to detect this scenario and
      dump additional data if we were stuck in jbd2_journal_release_jbd_inode
      for longer than 30 minutes. When it hit, I was able to see that
      i_flags was 0, suggesting we missed the wake up.
      
      This patch changes i_flags to be an unsigned long, uses bit operators
      to access it, and adds barriers around the accesses. Prior to applying
      this patch, we were regularly hitting this hang on numerous systems
      in our test environment. After applying the patch, the hangs no longer
      occur.
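
As a rough user-space analogy (not the kernel patch, which uses the kernel's own bit operations and barriers), the same pattern can be expressed with C11 atomics: the flag word is an `unsigned long`, and every access goes through an atomic read-modify-write or load with sequentially consistent ordering. `struct inode_model`, the `flag_*()` helpers and `COMMIT_RUNNING_BIT` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

#define COMMIT_RUNNING_BIT 0u   /* hypothetical bit number */

/* Hypothetical stand-in for the per-inode journal state. */
struct inode_model {
	atomic_ulong i_flags;
};

/* Atomically set one bit; seq_cst ordering also acts as a full barrier. */
static void flag_set(struct inode_model *ji, unsigned bit)
{
	atomic_fetch_or(&ji->i_flags, 1UL << bit);
}

/* Atomically clear one bit without disturbing the others. */
static void flag_clear(struct inode_model *ji, unsigned bit)
{
	atomic_fetch_and(&ji->i_flags, ~(1UL << bit));
}

/* Atomically read one bit. */
static int flag_test(struct inode_model *ji, unsigned bit)
{
	return (int)((atomic_load(&ji->i_flags) >> bit) & 1UL);
}
```

The point of the pattern is that a waiter testing the flag and a committer clearing it can no longer observe a torn or reordered update, which is the kind of missed wake-up the commit describes.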
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd/2: fixed typos · bcf3d0bc
      Committed by Andrea Gelmini
      "wakup"
      Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
      Signed-off-by: Jan Kara <jack@suse.cz>
  24. 10 Sep, 2010: 1 commit
    • JBD2: Allow feature checks before journal recovery · 1113e1b5
      Committed by Patrick J. LoPresti
      Before we start accessing a huge (> 16 TiB) OCFS2 volume, we need to
      confirm that its journal supports 64-bit offsets.  In particular, we
      need to check the journal's feature bits before recovering the journal.
      
      This is not possible with JBD2 at present, because the journal
      superblock (where the feature bits reside) is not loaded from disk until
      the journal is recovered.
      
      This patch loads the journal superblock in
      jbd2_journal_check_used_features() if it has not already been loaded,
      allowing us to check the feature bits before journal recovery.
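
The load-on-demand behavior can be sketched as follows. This is a toy model: `struct journal_model`, `load_superblock()` and `check_used_features()` are hypothetical, only the incompat feature word is modeled, and `FEATURE_64BIT` is an illustrative constant rather than a claim about the real flag value.

```c
#include <assert.h>
#include <stdint.h>

#define FEATURE_64BIT 0x2u   /* illustrative flag value (assumption) */

/* Hypothetical journal state: an "on-disk" copy and a lazily loaded one. */
struct journal_model {
	uint32_t on_disk_incompat;  /* what the superblock on disk says */
	uint32_t incompat;          /* in-memory copy, valid once loaded */
	int sb_loaded;              /* has the superblock been read yet? */
	int loads;                  /* how many times we read it */
};

/* Stand-in for reading the journal superblock from disk. */
static int load_superblock(struct journal_model *j)
{
	j->incompat = j->on_disk_incompat;
	j->sb_loaded = 1;
	j->loads++;
	return 0;
}

/* Returns 1 if every requested incompat feature is in use, 0 otherwise. */
static int check_used_features(struct journal_model *j, uint32_t incompat)
{
	/* Load the superblock on first use instead of waiting for recovery. */
	if (!j->sb_loaded && load_superblock(j) != 0)
		return 0;
	return (j->incompat & incompat) == incompat;
}
```

A caller can then query the feature bits before any recovery step, and repeated queries reuse the already loaded copy.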
      Signed-off-by: Patrick LoPresti <lopresti@gmail.com>
      Cc: linux-ext4@vger.kernel.org
      Acked-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
  25. 18 Aug, 2010: 1 commit
    • remove SWRITE* I/O types · 9cb569d6
      Committed by Christoph Hellwig
      These flags aren't real I/O types, but tell ll_rw_block to always
      lock the buffer instead of giving up on a failed trylock.
      
      Instead add a new write_dirty_buffer helper that implements this semantic
      and use it from the existing SWRITE* callers.  Note that the ll_rw_block
      code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
      this patch fixes.
      
      In the ufs code clean up the helper that used to call ll_rw_block
      to mirror sync_dirty_buffer, which is the function it implements for
      compound buffers.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  26. 04 Aug, 2010: 1 commit
  27. 27 Jul, 2010: 1 commit
    • jbd2: Remove __GFP_NOFAIL from jbd2 layer · 47def826
      Committed by Theodore Ts'o
      __GFP_NOFAIL is going away, so add our own retry loop.  Also add
      jbd2__journal_start() and jbd2__journal_restart() which take a gfp
      mask, so that file systems can optionally (re)start transaction
      handles using GFP_KERNEL.  If they do this, then they need to be
      prepared to handle receiving an ERR_PTR(-ENOMEM) error, and be ready
      to reflect that error up to userspace.
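
An open-coded retry loop of this kind might look like the user-space sketch below. It is not the jbd2 code: `alloc_with_retry()`, the `may_fail` flag (standing in for a gfp mask that permits failure) and the `flaky_alloc()` test allocator are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator callback so the sketch can be driven by a failing test allocator. */
typedef void *(*alloc_fn)(size_t size, void *ctx);

/*
 * Retry until the allocation succeeds, unless the caller opted into failure.
 * With may_fail set, the first failure is returned as NULL and the caller
 * must propagate an out-of-memory error itself.
 */
static void *alloc_with_retry(alloc_fn alloc, void *ctx, size_t size, int may_fail)
{
	for (;;) {
		void *p = alloc(size, ctx);

		if (p || may_fail)
			return p;
		/* A real implementation would wait or yield here before retrying. */
	}
}

/* Test allocator (hypothetical): fails a fixed number of times, then succeeds. */
struct flaky {
	int failures_left;
	int calls;
};

static void *flaky_alloc(size_t size, void *ctx)
{
	struct flaky *f = ctx;

	f->calls++;
	if (f->failures_left > 0) {
		f->failures_left--;
		return NULL;
	}
	return malloc(size);
}
```

Callers that cannot tolerate failure pass `may_fail = 0` and get the old never-fail behavior from the loop; callers that pass `may_fail = 1` take on the duty of handling the error.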
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  28. 16 Jul, 2010: 1 commit
    • jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions · 13ceef09
      Committed by Jan Kara
      OCFS2 uses t_commit trigger to compute and store checksum of the just
      committed blocks. When a buffer has b_frozen_data, checksum is computed
      for it instead of b_data but this can result in an old checksum being
      written to the filesystem in the following scenario:
      
      1) transaction1 is opened
      2) handle1 is opened
      3) journal_access(handle1, bh)
          - This sets jh->b_transaction to transaction1
      4) modify(bh)
      5) journal_dirty(handle1, bh)
      6) handle1 is closed
      7) start committing transaction1, opening transaction2
      8) handle2 is opened
      9) journal_access(handle2, bh)
          - This copies off b_frozen_data to make it safe for transaction1 to commit.
            jh->b_next_transaction is set to transaction2.
      10) jbd2_journal_write_metadata() checksums b_frozen_data
      11) the journal correctly writes b_frozen_data to the disk journal
      12) handle2 is closed
          - There was no dirty call for the bh on handle2, so it is never queued for
            any more journal operation
      13) Checkpointing finally happens, and it just spools the bh via normal buffer
      writeback.  This will write b_data, which was never triggered on and thus
      contains a wrong (old) checksum.
      
      This patch fixes the problem by calling the trigger at the moment data is
      frozen for journal commit - i.e., either when b_frozen_data is created by
      do_get_write_access or just before we write a buffer to the log if
      b_frozen_data does not exist. We also rename the trigger to t_frozen as
      that better describes when it is called.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>