1. 31 Jul, 2012 2 commits
  2. 13 Jul, 2012 1 commit
    • J
      block: fix infinite loop in __getblk_slow · 91f68c89
      Jeff Moyer authored
      Commit 080399aa ("block: don't mark buffers beyond end of disk as
      mapped") exposed a bug in __getblk_slow that causes mount to hang as it
      loops infinitely waiting for a buffer that lies beyond the end of the
      disk to become uptodate.
      
      The problem was initially reported by Torsten Hilbrich here:
      
          https://lkml.org/lkml/2012/6/18/54
      
      and also reported independently here:
      
          http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511
      
      and then Richard W.M.  Jones and Marcos Mello noted a few separate
      bugzillas also associated with the same issue.  This patch has been
      confirmed to fix:
      
          https://bugzilla.redhat.com/show_bug.cgi?id=835019
      
      The main problem is here, in __getblk_slow:
      
              for (;;) {
                      struct buffer_head * bh;
                      int ret;
      
                      bh = __find_get_block(bdev, block, size);
                      if (bh)
                              return bh;
      
                      ret = grow_buffers(bdev, block, size);
                      if (ret < 0)
                              return NULL;
                      if (ret == 0)
                              free_more_memory();
              }
      
      __find_get_block does not find the block, since it will not be marked as
      mapped, and so grow_buffers is called to fill in the buffers for the
      associated page.  I believe the for (;;) loop is there primarily to
      retry in the case of memory pressure keeping grow_buffers from
      succeeding.  However, we also continue to loop for other cases, like the
      block lying beyond the end of the disk.  So, the fix I came up with is to
      only loop when grow_buffers fails due to memory allocation issues
      (return value of 0).
      
      The attached patch was tested by myself, Torsten, and Rich, and was
      found to resolve the problem in all cases.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
      Tested-by: Richard W.M. Jones <rjones@redhat.com>
      Reviewed-by: Josh Boyer <jwboyer@redhat.com>
      Cc: Stable <stable@vger.kernel.org>  # 3.0+
      [ Jens is on vacation, taking this directly  - Linus ]
      --
      Stable Notes: this patch requires backport to 3.0, 3.2 and 3.3.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      91f68c89
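The retry rule described in this commit can be sketched as a small user-space model. This is only an illustration: `toy_dev`, `toy_grow_buffers` and `toy_getblk_slow` are hypothetical names, not the real fs/buffer.c code. The return convention (1 for success, 0 for a transient allocation failure, negative for a hard error) follows the description in the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not the real fs/buffer.c API. */
struct toy_dev {
    long nr_blocks;       /* number of blocks that exist on the device */
    int  alloc_failures;  /* transient allocation failures still to be injected */
    int  grow_calls;      /* how often toy_grow_buffers() ran, for inspection */
    long cached_block;    /* block currently "in the cache", or -1 for none */
};

/* Returns 1 on success, 0 on a transient allocation failure, -1 on a hard error. */
static int toy_grow_buffers(struct toy_dev *dev, long block)
{
    dev->grow_calls++;
    if (block < 0 || block >= dev->nr_blocks)
        return -1;              /* beyond end of device: retrying cannot help */
    if (dev->alloc_failures > 0) {
        dev->alloc_failures--;  /* simulated memory pressure */
        return 0;
    }
    dev->cached_block = block;
    return 1;
}

/* Returns 1 if a buffer was obtained, 0 where the kernel would return NULL. */
static int toy_getblk_slow(struct toy_dev *dev, long block)
{
    assert(dev != NULL);
    for (;;) {
        int ret;

        if (dev->cached_block == block)
            return 1;           /* lookup succeeded */

        ret = toy_grow_buffers(dev, block);
        if (ret < 0)
            return 0;           /* hard error: give up instead of spinning */
        /* ret == 0: the kernel calls free_more_memory() here, then retries */
    }
}
```

In this model a request for a block past `nr_blocks` returns after a single grow attempt, while a request that only hits simulated memory pressure keeps retrying until the allocation succeeds.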
  3. 31 May, 2012 1 commit
  4. 11 May, 2012 1 commit
    • J
      block: don't mark buffers beyond end of disk as mapped · 080399aa
      Jeff Moyer authored
      Hi,
      
      We have a bug report open where a squashfs image mounted on ppc64 would
      exhibit errors due to trying to read beyond the end of the disk.  It can
      easily be reproduced by doing the following:
      
      [root@ibm-p750e-02-lp3 ~]# ls -l install.img
      -rw-r--r-- 1 root root 142032896 Apr 30 16:46 install.img
      [root@ibm-p750e-02-lp3 ~]# mount -o loop ./install.img /mnt/test
      [root@ibm-p750e-02-lp3 ~]# dd if=/dev/loop0 of=/dev/null
      dd: reading `/dev/loop0': Input/output error
      277376+0 records in
      277376+0 records out
      142016512 bytes (142 MB) copied, 0.9465 s, 150 MB/s
      
      In dmesg, you'll find the following:
      
      squashfs: version 4.0 (2009/01/31) Phillip Lougher
      [   43.106012] attempt to access beyond end of device
      [   43.106029] loop0: rw=0, want=277410, limit=277408
      [   43.106039] Buffer I/O error on device loop0, logical block 138704
      [   43.106053] attempt to access beyond end of device
      [   43.106057] loop0: rw=0, want=277412, limit=277408
      [   43.106061] Buffer I/O error on device loop0, logical block 138705
      [   43.106066] attempt to access beyond end of device
      [   43.106070] loop0: rw=0, want=277414, limit=277408
      [   43.106073] Buffer I/O error on device loop0, logical block 138706
      [   43.106078] attempt to access beyond end of device
      [   43.106081] loop0: rw=0, want=277416, limit=277408
      [   43.106085] Buffer I/O error on device loop0, logical block 138707
      [   43.106089] attempt to access beyond end of device
      [   43.106093] loop0: rw=0, want=277418, limit=277408
      [   43.106096] Buffer I/O error on device loop0, logical block 138708
      [   43.106101] attempt to access beyond end of device
      [   43.106104] loop0: rw=0, want=277420, limit=277408
      [   43.106108] Buffer I/O error on device loop0, logical block 138709
      [   43.106112] attempt to access beyond end of device
      [   43.106116] loop0: rw=0, want=277422, limit=277408
      [   43.106120] Buffer I/O error on device loop0, logical block 138710
      [   43.106124] attempt to access beyond end of device
      [   43.106128] loop0: rw=0, want=277424, limit=277408
      [   43.106131] Buffer I/O error on device loop0, logical block 138711
      [   43.106135] attempt to access beyond end of device
      [   43.106139] loop0: rw=0, want=277426, limit=277408
      [   43.106143] Buffer I/O error on device loop0, logical block 138712
      [   43.106147] attempt to access beyond end of device
      [   43.106151] loop0: rw=0, want=277428, limit=277408
      [   43.106154] Buffer I/O error on device loop0, logical block 138713
      [   43.106158] attempt to access beyond end of device
      [   43.106162] loop0: rw=0, want=277430, limit=277408
      [   43.106166] attempt to access beyond end of device
      [   43.106169] loop0: rw=0, want=277432, limit=277408
      ...
      [   43.106307] attempt to access beyond end of device
      [   43.106311] loop0: rw=0, want=277470, limit=2774
      
      Squashfs manages to read in the end block(s) of the disk during the
      mount operation.  Then, when dd reads the block device, it leads to
      block_read_full_page being called with buffers that are beyond end of
      disk, but are marked as mapped.  Thus, it would end up submitting read
      I/O against them, resulting in the errors mentioned above.  I fixed the
      problem by modifying init_page_buffers to only set the buffer mapped if
      it fell inside of i_size.
      
      Cheers,
      Jeff
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Nick Piggin <npiggin@kernel.dk>
      
      --
      
      Changes from v1->v2: re-used max_block, as suggested by Nick Piggin.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      080399aa
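The idea of marking only in-range buffers as mapped can be sketched as follows. This is a simplified model with hypothetical names (`toy_bh`, `toy_init_page_buffers`), not the kernel's init_page_buffers. For a worked figure, assume 1 KiB blocks (consistent with logical block 138704 starting at sector 277408 in the log above) and 64 KiB pages (an assumption about the ppc64 configuration): the 142032896-byte image then holds 138704 blocks, and the last page starts at block 138688, so only its first 16 buffers lie on the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy buffer descriptor; the names are hypothetical, not struct buffer_head. */
struct toy_bh {
    unsigned long blocknr;
    bool mapped;
};

/*
 * Initialise the n buffers that back one page, starting at first_block.
 * Only blocks below end_block (the device size in blocks) are marked mapped.
 * Returns the number of buffers that were marked mapped.
 */
static int toy_init_page_buffers(struct toy_bh *bufs, int n,
                                 unsigned long first_block,
                                 unsigned long end_block)
{
    int mapped = 0;

    assert(bufs != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        bufs[i].blocknr = first_block + (unsigned long)i;
        bufs[i].mapped = bufs[i].blocknr < end_block;
        if (bufs[i].mapped)
            mapped++;
    }
    return mapped;
}
```

A reader that skips unmapped buffers would then never submit I/O for blocks 138704 and above, which are the ones reported in the dmesg output.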
  5. 26 Apr, 2012 1 commit
    • G
      fs/buffer.c: remove BUG() in possible but rare condition · 61065a30
      Glauber Costa authored
      While stressing the kernel with failing allocations today, I hit the
      following chain of events:
      
      alloc_page_buffers():
      
      	bh = alloc_buffer_head(GFP_NOFS);
      	if (!bh)
      		goto no_grow; <= path taken
      
      grow_dev_page():
              bh = alloc_page_buffers(page, size, 0);
              if (!bh)
                      goto failed;  <= taken, consequence of the above
      
      and then the failed path BUG()s the kernel.
      
      The failure is inserted a little bit artificially, but even then, I see no
      reason why it should be deemed impossible in a real box.
      
      Even though this is not a condition that we expect to see around every
      time, failed allocations are expected to be handled, and BUG() sounds just
      too much.  As a matter of fact, grow_dev_page() can return NULL just fine
      in other circumstances, so I propose we just remove it, then.
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      61065a30
  6. 29 Mar, 2012 1 commit
  7. 29 Feb, 2012 1 commit
  8. 04 Jan, 2012 1 commit
  9. 01 Nov, 2011 1 commit
  10. 31 Oct, 2011 1 commit
    • C
      writeback: Add a 'reason' to wb_writeback_work · 0e175a18
      Curt Wohlgemuth authored
      This creates a new 'reason' field in a wb_writeback_work
      structure, which unambiguously identifies who initiates
      writeback activity.  A 'wb_reason' enumeration has been
      added to writeback.h, to enumerate the possible reasons.
      
      The 'writeback_work_class' tracepoint event class and the
      'writeback_queue_io' tracepoint are updated to include the
      symbolic 'reason' in all trace events.
      
      And the 'writeback_inodes_sbXXX' family of routines has had
      a wb_stats parameter added to them, so callers can specify
      why writeback is being started.
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      0e175a18
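A minimal sketch of carrying a reason code inside a work item might look like the following. The enumerators and helper names here are hypothetical and chosen only for illustration; the kernel's actual 'wb_reason' values and structures are not reproduced.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reason codes for illustration; the kernel's enum may differ. */
enum toy_wb_reason {
    TOY_REASON_BACKGROUND,
    TOY_REASON_SYNC,
    TOY_REASON_PERIODIC,
    TOY_REASON_MAX
};

struct toy_wb_work {
    long nr_pages;               /* how much to write back */
    enum toy_wb_reason reason;   /* who asked for it */
};

/* Symbolic name, as a trace event might print it. */
static const char *toy_reason_name(enum toy_wb_reason reason)
{
    static const char *const names[TOY_REASON_MAX] = {
        [TOY_REASON_BACKGROUND] = "background",
        [TOY_REASON_SYNC]       = "sync",
        [TOY_REASON_PERIODIC]   = "periodic",
    };

    if ((int)reason < 0 || reason >= TOY_REASON_MAX)
        return "unknown";
    return names[reason];
}

/* Callers pass the reason explicitly when they create work. */
static struct toy_wb_work toy_make_work(long nr_pages, enum toy_wb_reason reason)
{
    struct toy_wb_work work = { .nr_pages = nr_pages, .reason = reason };

    assert(strlen(toy_reason_name(reason)) > 0);
    return work;
}
```

Keeping the reason as an enum rather than a free-form string makes it cheap to store and lets tracing code map it to a symbolic name in one place.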
  11. 28 Oct, 2011 1 commit
  12. 16 Jun, 2011 1 commit
    • J
      vfs: Fix data corruption after failed write in __block_write_begin() · f9f07b6c
      Jan Kara authored
      I've got a report of a file corruption from fsxlinux on ext3. The important
      operations to the page were:
      mapwrite to a hole
      partial write to the page
      read - found the page zeroed from the end of the normal write
      
      The culprit seems to be that if get_block() fails in __block_write_begin()
      (e.g. transient ENOSPC in ext3), the function does ClearPageUptodate(page).
      Thus when we retry the write, the logic in __block_write_begin() thinks zeroing
      of the page is needed and overwrites old data.  In fact, I don't see why we
      should ever need to zero the uptodate bit here - either the page was uptodate
      when we entered __block_write_begin() and it should stay so when we leave it,
      or it was not uptodate and no one had the right to set it uptodate during
      __block_write_begin() so it remains !uptodate when we leave as well. So just
      remove clearing of the bit.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      f9f07b6c
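The failure mode described in this commit can be illustrated with a heavily simplified model. `toy_page` and `toy_write_begin` are hypothetical; the real __block_write_begin() works on buffers and byte ranges, which this sketch collapses into a single flag and a small data array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_page {
    bool uptodate;   /* page contents are valid */
    char data[8];
};

/*
 * One write_begin attempt in a much-simplified model.
 * get_block_fails simulates a transient error such as ENOSPC.
 * clear_on_error selects the old behaviour (clear the uptodate bit on failure).
 */
static int toy_write_begin(struct toy_page *page, bool get_block_fails,
                           bool clear_on_error)
{
    assert(page != NULL);
    if (get_block_fails) {
        if (clear_on_error)
            page->uptodate = false;
        return -1;
    }
    if (!page->uptodate) {
        /* contents are treated as invalid, so they are zeroed */
        memset(page->data, 0, sizeof(page->data));
        page->uptodate = true;
    }
    return 0;
}
```

With `clear_on_error` set, a failed attempt followed by a successful retry wipes data that was valid before the first call; with it unset, the retry leaves the existing contents alone.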
  13. 28 May, 2011 1 commit
  14. 27 May, 2011 1 commit
    • D
      mm/fs: add hooks to support cleancache · c515e1fd
      Dan Magenheimer authored
      This fourth patch of eight in this cleancache series provides the
      core hooks in VFS for: initializing cleancache per filesystem;
      capturing clean pages reclaimed by page cache; attempting to get
      pages from cleancache before filesystem read; and ensuring coherency
      between pagecache, disk, and cleancache.  Note that the placement
      of these hooks was stable from 2.6.18 to 2.6.38; a minor semantic
      change was required due to a patchset in 2.6.39.
      
      All hooks become no-ops if CONFIG_CLEANCACHE is unset, or become
      a check of a boolean global if CONFIG_CLEANCACHE is set but no
      cleancache "backend" has claimed cleancache_ops.
      
      Details and a FAQ can be found in Documentation/vm/cleancache.txt
      
      [v8: minchan.kim@gmail.com: adapt to new remove_from_page_cache function]
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik Van Riel <riel@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andreas Dilger <adilger@sun.com>
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      c515e1fd
  15. 26 May, 2011 2 commits
    • J
      vfs: Block mmapped writes while the fs is frozen · ea13a864
      Jan Kara authored
      We should not allow file modification via mmap while the filesystem is
      frozen. So block in block_page_mkwrite() while the filesystem is frozen.
      We cannot do the blocking wait in __block_page_mkwrite() since e.g. ext4
      will want to call that function with transaction started in some cases
      and that would deadlock. But we can at least do the non-blocking reliable
      check in __block_page_mkwrite() which is the hardest part anyway.
      
      We have to check for frozen filesystem with the page marked dirty and under
      page lock with which we then return from ->page_mkwrite(). Only that way we
      cannot race with writeback done by freezing code - either we mark the page
      dirty after the writeback has started, see freezing in progress and block, or
      writeback will wait for our page lock which is released only when the fault is
      done and then writeback will writeout and writeprotect the page again.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      ea13a864
    • J
      vfs: Create __block_page_mkwrite() helper passing error values back · 24da4fab
      Jan Kara authored
      Create __block_page_mkwrite() helper which does everything that block_page_mkwrite()
      does except that it passes back errors from __block_write_begin /
      block_commit_write calls.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      24da4fab
  16. 25 Mar, 2011 1 commit
    • D
      fs: protect inode->i_state with inode->i_lock · 250df6ed
      Dave Chinner authored
      Protect inode state transitions and validity checks with the
      inode->i_lock. This enables us to make inode state transitions
      independently of the inode_lock and is the first step to peeling
      away the inode_lock from the code.
      
      This requires that __iget() is done atomically with i_state checks
      during list traversals so that we don't race with another thread
      marking the inode I_FREEING between the state check and grabbing the
      reference.
      
      Also remove the unlock_new_inode() memory barrier optimisation
      required to avoid taking the inode_lock when clearing I_NEW.
      Simplify the code by simply taking the inode->i_lock around the
      state change and wakeup. Because the wakeup is no longer tricky,
      remove the wake_up_inode() function and open code the wakeup where
      necessary.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      250df6ed
  17. 17 Mar, 2011 1 commit
  18. 10 Mar, 2011 2 commits
    • J
      block: kill off REQ_UNPLUG · 721a9602
      Jens Axboe authored
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      721a9602
    • J
      block: remove per-queue plugging · 7eaceacc
      Jens Axboe authored
      Code has been converted over to the new explicit on-stack plugging,
      and delay users have been converted to use the new API for that.
      So let's kill off the old plugging along with aops->sync_page().
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      7eaceacc
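The submitter-controlled, on-stack plugging that these two commits refer to can be pictured with a toy model. Everything here (`toy_plug`, `toy_queue`, `toy_submit` and friends) is hypothetical; in the kernel the corresponding calls are blk_start_plug() and blk_finish_plug(), whose internals are not reproduced in this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PLUG_MAX 16

/* Lives on the submitter's stack; hypothetical, not the kernel's struct blk_plug. */
struct toy_plug {
    int pending[TOY_PLUG_MAX];
    int count;
};

struct toy_queue {
    int dispatched;   /* requests handed to the "device" */
    int batches;      /* number of dispatch rounds */
};

static void toy_start_plug(struct toy_plug *plug)
{
    plug->count = 0;
}

/* Hand everything held by the plug to the queue in one batch. */
static void toy_finish_plug(struct toy_plug *plug, struct toy_queue *q)
{
    if (plug->count == 0)
        return;
    q->dispatched += plug->count;
    q->batches++;
    plug->count = 0;
}

/* With a plug, requests are held back; without one they go out immediately. */
static void toy_submit(struct toy_plug *plug, struct toy_queue *q, int req)
{
    assert(q != NULL);
    if (plug == NULL) {
        q->dispatched++;
        q->batches++;
        return;
    }
    if (plug->count == TOY_PLUG_MAX)
        toy_finish_plug(plug, q);   /* plug is full: flush early */
    plug->pending[plug->count++] = req;
}
```

Because the plug belongs to the submitter, nothing needs to pass an "unplug" hint down with each request: the submitter simply finishes its own plug when it wants the batch to go out.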
  19. 17 Dec, 2010 2 commits
  20. 27 Oct, 2010 2 commits
  21. 26 Oct, 2010 3 commits
  22. 10 Sep, 2010 1 commit
  23. 18 Aug, 2010 2 commits
    • C
      remove SWRITE* I/O types · 9cb569d6
      Christoph Hellwig authored
      These flags aren't real I/O types, but tell ll_rw_block to always
      lock the buffer instead of giving up on a failed trylock.
      
      Instead add a new write_dirty_buffer helper that implements this semantic
      and use it from the existing SWRITE* callers.  Note that the ll_rw_block
      code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
      this patch fixes.
      
      In the ufs code clean up the helper that used to call ll_rw_block
      to mirror sync_dirty_buffer, which is the function it implements for
      compound buffers.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      9cb569d6
    • C
      kill BH_Ordered flag · 87e99511
      Christoph Hellwig authored
      Instead of abusing a buffer_head flag just add a variant of
      sync_dirty_buffer which allows passing the exact type of write
      flag required.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      87e99511
  24. 10 Aug, 2010 4 commits
  25. 28 May, 2010 1 commit
    • N
      fs: introduce new truncate sequence · 7bb46a67
      npiggin@suse.de authored
      Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
      setattr > vmtruncate > truncate, have filesystems call their truncate sequence
      from ->setattr if filesystem specific operations are required. vmtruncate is
      deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
      previously should be used.
      
      simple_setattr is introduced for simple in-ram filesystems to implement
      the new truncate sequence. Eventually all filesystems should be converted
      to implement a setattr, and the default code in notify_change should go
      away.
      
      simple_setsize is also introduced to perform just the ATTR_SIZE portion
      of simple_setattr (ie. changing i_size and trimming pagecache).
      
      To implement the new truncate sequence:
      - filesystem specific manipulations (eg freeing blocks) must be done in
        the setattr method rather than ->truncate.
      - vmtruncate can not be used by core code to trim blocks past i_size in
        the event of write failure after allocation, so this must be performed
        in the fs code.
      - convert usage of helpers block_write_begin, nobh_write_begin,
        cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
        variants. These avoid calling vmtruncate to trim blocks (see previous).
      - inode_setattr should not be used. generic_setattr is a new function
        to be used to copy simple attributes into the generic inode.
      - make use of the better opportunity to handle errors with the new sequence.
      
      Big problem with the previous calling sequence: the filesystem is not called
      until i_size has already changed.  This means it is not allowed to fail the
      call, and also it does not know what the previous i_size was. Also, generic
      code calling vmtruncate to truncate allocated blocks in case of error had
      no good way to return a meaningful error (or, for example, atomically handle
      block deallocation).
      
      Cc: Christoph Hellwig <hch@lst.de>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      7bb46a67
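The main benefit named in this commit, that the filesystem is called before i_size changes and may therefore fail the operation, can be sketched with a toy model. `toy_inode`, `toy_fs_prepare_truncate` and `toy_setattr_size` are hypothetical names; the sketch does not reproduce the real helpers (inode_newsize_ok, truncate_pagecache) or the exact order in which a real filesystem frees blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_inode {
    long i_size;
    long max_size;        /* largest size the toy filesystem accepts */
    bool prepare_fails;   /* inject a failure in the fs-specific step */
};

/*
 * Hypothetical fs-specific step that may fail (for example, reserving
 * resources for the truncate). It still sees the OLD i_size.
 */
static int toy_fs_prepare_truncate(struct toy_inode *inode, long new_size)
{
    (void)new_size;
    return inode->prepare_fails ? -1 : 0;
}

/* Size change driven from the filesystem's own setattr. */
static int toy_setattr_size(struct toy_inode *inode, long new_size)
{
    int err;

    assert(inode != NULL);
    if (new_size < 0 || new_size > inode->max_size)
        return -1;                       /* validate the new size first */
    if (new_size < inode->i_size) {
        err = toy_fs_prepare_truncate(inode, new_size);
        if (err)
            return err;                  /* i_size untouched, error reported */
    }
    inode->i_size = new_size;            /* commit only after the checks passed */
    return 0;
}
```

In the older ordering described above, the equivalent of `toy_fs_prepare_truncate` would run only after `i_size` had already been updated, leaving it no way to report failure or to learn the previous size.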
  26. 22 May, 2010 4 commits