1. 23 Mar, 2013 2 commits
  2. 15 Mar, 2013 3 commits
  3. 08 Mar, 2013 6 commits
  4. 28 Feb, 2013 1 commit
    • hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin authored
      I'm not sure why, but the hlist for-each-entry iterators were conceived
      differently from the list ones. The list iterators look like this:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      do they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; these
       were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch, which is mostly the work of Peter Senna Tschudin, is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
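The change in iterator shape described in the hlist commit can be illustrated in userspace. The following is a minimal sketch, not the kernel's code: the list is simplified to a singly linked one (no `pprev`), and the macro names `entry_of`, `entry_of_safe`, `old_for_each_entry` and `new_for_each_entry`, as well as `struct item`, are hypothetical stand-ins. It assumes a compiler that supports `__typeof__` (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's hlist types. */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

/* Recover the enclosing object from a pointer to its embedded node. */
#define entry_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* NULL-safe variant: a NULL node maps to a NULL entry. */
#define entry_of_safe(ptr, type, member) \
    ((ptr) ? entry_of(ptr, type, member) : (type *)NULL)

/* Old shape: the caller must supply a separate node cursor `pos`. */
#define old_for_each_entry(tpos, pos, head, member)                          \
    for ((pos) = (head)->first;                                              \
         (pos) && ((tpos) = entry_of(pos, __typeof__(*(tpos)), member), 1);  \
         (pos) = (pos)->next)

/* New shape: the entry pointer is the only cursor, as with list iterators. */
#define new_for_each_entry(pos, head, member)                                \
    for ((pos) = entry_of_safe((head)->first, __typeof__(*(pos)), member);   \
         (pos);                                                              \
         (pos) = entry_of_safe((pos)->member.next, __typeof__(*(pos)), member))

/* Hypothetical payload type with an embedded list node. */
struct item {
    int value;
    struct hlist_node link;
};

static void push_front(struct hlist_head *head, struct item *it)
{
    assert(head != NULL && it != NULL);
    it->link.next = head->first;
    head->first = &it->link;
}

/* Caller of the old iterator: needs an extra `node` variable. */
static int sum_old(struct hlist_head *head)
{
    struct item *it;
    struct hlist_node *node;
    int sum = 0;

    old_for_each_entry(it, node, head, link)
        sum += it->value;
    return sum;
}

/* Caller of the new iterator: the extra variable is gone. */
static int sum_new(struct hlist_head *head)
{
    struct item *it;
    int sum = 0;

    new_for_each_entry(it, head, link)
        sum += it->value;
    return sum;
}
```

Both callers visit the same entries; the only difference is that `sum_new()` no longer declares or passes a node cursor, which is the per-call-site edit the semantic patch automates.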
  5. 26 Feb, 2013 1 commit
  6. 23 Feb, 2013 1 commit
  7. 15 Feb, 2013 4 commits
    • xfs: xfs_bmap_add_attrfork_local is too generic · 1e82379b
      Dave Chinner authored
      When we are converting local data to an extent format as a result of
      adding an attribute, the type of data contained in the local fork
      determines the behaviour that needs to occur.
      
      xfs_bmap_add_attrfork_local() already handles the directory data
      case specially by using S_ISDIR() and calling out to
      xfs_dir2_sf_to_block(), but with verifiers we now need to handle
      each different type of metadata specially and different metadata
      formats require different verifiers (and eventually block header
      initialisation).
      
      There is only a single place where we add an attribute fork to
      the inode, but that is in the attribute code and it knows nothing
      about the specific contents of the data fork. It is only the case of
      local data that is the issue here, so adding code to handle this
      case in the attribute specific code is wrong. Hence we are really
      stuck trying to detect the data fork contents in
      xfs_bmap_add_attrfork_local() and performing the correct callout
      there.
      
      Luckily the current cases can be determined by S_IS* macros, and we
      can push the work off to data specific callouts, but each of those
      callouts does a lot of work in common with
      xfs_bmap_local_to_extents(). The only reason that this fails for
      symlinks right now is that xfs_bmap_local_to_extents() assumes
      the data fork contains extent data, and so attaches a bmap extent
      data verifier to the buffer and simply copies the data fork
      information straight into it.
      
      To fix this, allow us to pass a "formatting" callback into
      xfs_bmap_local_to_extents() which is responsible for setting the
      buffer type, initialising it and copying the data fork contents over
      to the new buffer. This allows callers to specify how they want to
      format the new buffer (which is necessary for the upcoming CRC
      enabled metadata blocks) and hence make xfs_bmap_local_to_extents()
      useful for any type of data fork content.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
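The "formatting callback" idea from the commit above can be sketched as a function pointer passed into a generic conversion routine. This is an illustration only: `struct sketch_buf`, `format_fn`, `format_extent_data()`, `format_symlink()` and `local_to_extents()` are hypothetical names and do not reflect the real XFS signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a metadata buffer; not the kernel's xfs_buf. */
enum buf_type { BUF_UNSET, BUF_EXTENT_DATA, BUF_SYMLINK };

struct sketch_buf {
    enum buf_type type;
    char data[64];
};

/* A formatting callback sets the buffer type and copies the local data in. */
typedef void (*format_fn)(struct sketch_buf *bp, const char *local, size_t len);

static void format_extent_data(struct sketch_buf *bp, const char *local,
                               size_t len)
{
    bp->type = BUF_EXTENT_DATA;
    memcpy(bp->data, local, len);
}

static void format_symlink(struct sketch_buf *bp, const char *local,
                           size_t len)
{
    bp->type = BUF_SYMLINK;
    memcpy(bp->data, local, len);
    bp->data[len] = '\0';   /* symlink targets are kept NUL-terminated here */
}

/*
 * Generic conversion: the common work (size check, clearing the buffer)
 * lives here, while the format-specific part is delegated to init_fn.
 * Returns 0 on success, -1 if the local data does not fit.
 */
static int local_to_extents(struct sketch_buf *bp, const char *local,
                            size_t len, format_fn init_fn)
{
    assert(bp != NULL && local != NULL && init_fn != NULL);

    if (len >= sizeof(bp->data))
        return -1;
    memset(bp, 0, sizeof(*bp));
    init_fn(bp, local, len);
    return 0;
}
```

With this shape the generic routine no longer has to assume that the data fork holds extent data; each caller picks the formatter that matches its content.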
    • xfs: remove log force from xfs_buf_trylock() · fa5566e4
      Brian Foster authored
      The trylock log force invoked via xfs_buf_item_push() can attempt
      to acquire xa_lock, thus leading to a recursion bug when called
      with xa_lock held.
      
      This log force was originally added to xfs_buf_trylock() to address
      xfsaild stalls due to pinned and stale buffers. Since the addition
      of this behavior, the log item pushing code has been reworked to
      detect and track pinned items to inform xfsaild to issue a log
      force itself when necessary. As such, the log force on trylock
      failure is redundant and safe to remove.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: recheck buffer pinned status after push trylock failure · 5337fe9b
      Brian Foster authored
      The buffer pinned check and trylock sequence in xfs_buf_item_push()
      can race with an active transaction on marking the buffer pinned.
      This can result in the buffer becoming pinned and stale after the
      initial check and the trylock failure, but before the check in
      xfs_buf_trylock() that issues a log force. If the log force is
      issued from this context, a spinlock recursion occurs on xa_lock.
      
      Prepare xfs_buf_item_push() to handle the race by detecting a
      pinned buffer after the trylock failure so xfsaild issues a log
      force from a safe context. This, along with various previous fixes,
      renders the log force in xfs_buf_trylock() redundant.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: limit speculative prealloc size on sparse files · a1e16c26
      Dave Chinner authored
      Speculative preallocation based on the current file size works well
      for contiguous files, but is sub-optimal for sparse files where the
      EOF preallocation can fill holes and result in large amounts of
      zeros being written when it is not necessary.
      
      The algorithm is modified to prevent EOF speculative preallocation
      from triggering larger allocations on IO patterns of
      truncate-to-zero-seek-write-seek-write-..., which results in
      non-sparse files for large files. This, unfortunately, is the way cp
      now behaves when copying sparse files and so needs to be fixed.
      
      The code looks at the existing extent adjacent to the current EOF and,
      if it determines that it is a hole, disables speculative preallocation
      altogether. To prevent the next write from doing a large prealloc, it
      takes the size of subsequent preallocations from the current size of
      the existing EOF extent. IOWs, if you leave a hole in the file, it
      resets preallocation behaviour to the same as if it was a zero size
      file.
      
      Example new behaviour:
      
      $ xfs_io -f -c "pwrite 0 31m" \
                  -c "pwrite 33m 1m" \
                  -c "pwrite 128m 1m" \
                  -c "fiemap -v" /mnt/scratch/blah
      wrote 32505856/32505856 bytes at offset 0
      31 MiB, 7936 ops; 0.0000 sec (1.608 GiB/sec and 421432.7439 ops/sec)
      wrote 1048576/1048576 bytes at offset 34603008
      1 MiB, 256 ops; 0.0000 sec (1.462 GiB/sec and 383233.5329 ops/sec)
      wrote 1048576/1048576 bytes at offset 134217728
      1 MiB, 256 ops; 0.0000 sec (1.719 GiB/sec and 450704.2254 ops/sec)
      /mnt/scratch/blah:
       EXT: FILE-OFFSET        BLOCK-RANGE       TOTAL FLAGS
         0: [0..65535]:        96..65631         65536   0x0
         1: [65536..67583]:    hole               2048
         2: [67584..69631]:    67680..69727       2048   0x0
         3: [69632..262143]:   hole             192512
         4: [262144..264191]:  262240..264287     2048   0x1
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
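The sizing rule described in the sparse-file commit can be sketched as a small decision function. This is a simplified model under stated assumptions, not the XFS implementation: `struct sketch_extent` and `sketch_prealloc_blocks()` are hypothetical, sizes are in filesystem blocks, the preallocation is taken to equal the size of the last extent (the commit message does not state the real scaling policy), and `max_blocks` is an assumed cap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the last extent before EOF, in blocks. */
struct sketch_extent {
    unsigned long start_block;
    unsigned long block_count;
};

/*
 * Decide how many blocks to preallocate speculatively for a write that
 * starts at write_block.  A return value of 0 means "no speculative
 * preallocation".
 */
static unsigned long sketch_prealloc_blocks(const struct sketch_extent *last,
                                            unsigned long write_block,
                                            unsigned long max_blocks)
{
    unsigned long want;

    assert(max_blocks > 0);

    /* Empty file: behave as for a zero size file. */
    if (last == NULL)
        return 0;

    /* The write does not abut the last extent, so a hole precedes it. */
    if (last->start_block + last->block_count != write_block)
        return 0;

    /* Contiguous write: size the preallocation from the EOF extent. */
    want = last->block_count;
    return want < max_blocks ? want : max_blocks;
}
```

In this model, a write that leaves a hole gets no preallocation, and the writes that follow it grow their preallocation from the small extent just created rather than from the overall file size.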
  8. 07 Feb, 2013 1 commit
  9. 02 Feb, 2013 13 commits
  10. 29 Jan, 2013 8 commits
    • xfs: Fix xfs_swap_extents() after removal of xfs_flushinval_pages() · 65e3aa77
      Torsten Kaiser authored
      Commit fb595814 removed
      xfs_flushinval_pages() and changed its callers to use
      filemap_write_and_wait() and truncate_pagecache_range() directly.
      
      But in xfs_swap_extents() this change accidentally switched the argument
      for 'tip' to 'ip'. This patch switches it back to 'tip'.
      Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: Fix xfs_swap_extents() after removal of xfs_flushinval_pages() · 2729423c
      Torsten Kaiser authored
      Commit fb595814 removed
      xfs_flushinval_pages() and changed its callers to use
      filemap_write_and_wait() and truncate_pagecache_range() directly.
      
      But in xfs_swap_extents() this change accidentally switched the argument
      for 'tip' to 'ip'. This patch switches it back to 'tip'.
      Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: Fix possible use-after-free with AIO · 4b05d09c
      Jan Kara authored
      A running AIO pins the inode in memory through a file reference. Once
      the AIO is completed using aio_complete(), the file reference is put and
      the inode can be freed from memory. So we have to be sure that calling
      aio_complete() is the last thing we do with the inode.
      
      CC: xfs@oss.sgi.com
      CC: Ben Myers <bpm@sgi.com>
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: fix shutdown hang on invalid inode during create · 9f87832a
      Dave Chinner authored
      When the new inode verify in xfs_iread() fails, the create
      transaction is aborted and a shutdown occurs. The subsequent unmount
      then hangs in xfs_wait_buftarg() on a buffer that has an elevated
      hold count. Debug showed that it was an AGI buffer getting stuck:
      
      [   22.576147] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      [   22.976213] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      [   23.376206] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      [   23.776325] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      
      The trace of this buffer leading up to the shutdown (trimmed for
      brevity) looks like:
      
      xfs_buf_init:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_get_map
      xfs_buf_get:         bno 0x2 len 0x200 hold 1 caller xfs_buf_read_map
      xfs_buf_read:        bno 0x2 len 0x200 hold 1 caller xfs_trans_read_buf_map
      xfs_buf_iorequest:   bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
      xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_iorequest
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_iorequest
      xfs_buf_iowait:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
      xfs_buf_ioerror:     bno 0x2 len 0x200 hold 1 caller xfs_buf_bio_end_io
      xfs_buf_iodone:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_ioend
      xfs_buf_iowait_done: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
      xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_item_init
      xfs_trans_read_buf:  bno 0x2 len 0x200 hold 2 recur 0 refcount 1
      xfs_trans_brelse:    bno 0x2 len 0x200 hold 2 recur 0 refcount 1
      xfs_buf_item_relse:  bno 0x2 nblks 0x1 hold 2 caller xfs_trans_brelse
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_relse
      xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
      xfs_buf_trylock:     bno 0x2 nblks 0x1 hold 2 caller _xfs_buf_find
      xfs_buf_find:        bno 0x2 len 0x200 hold 2 caller xfs_buf_get_map
      xfs_buf_get:         bno 0x2 len 0x200 hold 2 caller xfs_buf_read_map
      xfs_buf_read:        bno 0x2 len 0x200 hold 2 caller xfs_trans_read_buf_map
      xfs_buf_hold:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_init
      xfs_trans_read_buf:  bno 0x2 len 0x200 hold 3 recur 0 refcount 1
      xfs_trans_log_buf:   bno 0x2 len 0x200 hold 3 recur 0 refcount 1
      xfs_buf_item_unlock: bno 0x2 len 0x200 hold 3 flags DIRTY liflags ABORTED
      xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
      
      And that is the AGI buffer from cold cache read into memory to
      transaction abort. You can see at transaction abort the bli is dirty
      and only has a single reference. The item is not pinned, and it's
      not in the AIL. Hence the only reference to it is this transaction.
      
      The problem is that the xfs_buf_item_unlock() call is dropping the
      last reference to the xfs_buf_log_item attached to the buffer (which
      holds a reference to the buffer), but it is not freeing the
      xfs_buf_log_item. Hence nothing will ever release the buffer, and
      the unmount hangs waiting for this reference to go away.
      
      The fix is simple - xfs_buf_item_unlock needs to detect the last
      reference going away in this case and free the xfs_buf_log_item to
      release the reference it holds on the buffer.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
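The reference leak described in the shutdown-hang commit follows a common pattern: a helper object holds a reference on a buffer, and dropping the helper's last reference without freeing it strands the buffer's hold count. The sketch below models only that pattern. All names (`struct sketch_buf`, `struct sketch_item`, `item_attach()`, `item_unlock_leaky()`, `item_unlock_fixed()`) are hypothetical, and the real fix applies to the aborted-transaction path of xfs_buf_item_unlock() rather than to every unlock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_item;

/* Hypothetical buffer with a hold count and an attached log item. */
struct sketch_buf {
    int hold;
    struct sketch_item *item;
};

/* Hypothetical log item; while it exists it keeps one hold on the buffer. */
struct sketch_item {
    int refcount;
    struct sketch_buf *buf;
};

static struct sketch_item *item_attach(struct sketch_buf *bp)
{
    struct sketch_item *it = malloc(sizeof(*it));

    if (it == NULL)
        return NULL;
    it->refcount = 1;
    it->buf = bp;
    bp->item = it;
    bp->hold++;             /* the item pins the buffer */
    return it;
}

static void item_free(struct sketch_item *it)
{
    it->buf->item = NULL;
    it->buf->hold--;        /* release the item's hold on the buffer */
    free(it);
}

/* Buggy shape: the last reference is dropped but the item is never freed. */
static void item_unlock_leaky(struct sketch_item *it)
{
    it->refcount--;
}

/* Fixed shape: detect the last reference going away and free the item. */
static void item_unlock_fixed(struct sketch_item *it)
{
    assert(it->refcount > 0);
    if (--it->refcount == 0)
        item_free(it);
}
```

After `item_unlock_leaky()` the buffer's hold count stays elevated, which corresponds to the "hold 0x2 stuck" messages at unmount; after `item_unlock_fixed()` it returns to its previous value.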
    • xfs: limit speculative prealloc near ENOSPC thresholds · f2a45956
      Dave Chinner authored
      There is a window on small filesystems where speculative
      preallocation can be larger than the ENOSPC throttling thresholds,
      resulting in speculative preallocation trying to reserve more space
      than there is space available. This causes immediate ENOSPC to be
      triggered, prealloc to be turned off and flushing to occur. On the
      next write (i.e. next 4k page), we do exactly the same thing, and so
      effectively drive into synchronous 4k writes by triggering ENOSPC
      flushing on every page while in the window between the prealloc size
      and the ENOSPC prealloc throttle threshold.
      
      Fix this by checking to see if the prealloc size would consume all
      free space, and throttle it appropriately to avoid premature
      ENOSPC...
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
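The throttling idea in the ENOSPC commit, shrinking a preallocation that would consume all free space, can be sketched as follows. The halving policy and the name `sketch_throttle_prealloc()` are assumptions made for illustration; the commit message does not say how the real code scales the size down.

```c
#include <assert.h>

/*
 * Shrink a wanted preallocation (in blocks) until it no longer consumes
 * all of the free space.  Halving is an assumed policy for this sketch.
 * Returns 0 when no preallocation fits.
 */
static unsigned long sketch_throttle_prealloc(unsigned long want_blocks,
                                              unsigned long free_blocks)
{
    while (want_blocks > 0 && want_blocks >= free_blocks)
        want_blocks >>= 1;

    /* Either nothing is preallocated or the result leaves space free. */
    assert(want_blocks == 0 || want_blocks < free_blocks);
    return want_blocks;
}
```

Because the returned size is always smaller than the free space, a write in this model never requests a reservation that fails outright, which avoids the repeated ENOSPC flush on every 4k page described above.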
    • xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end · eb178619
      Dave Chinner authored
      When _xfs_buf_find is passed an out of range address, it will fail
      to find a relevant struct xfs_perag and oops with a null
      dereference. This can happen when trying to walk a filesystem with a
      metadata inode that has a partially corrupted extent map (i.e. the
      block number returned is corrupt, but is otherwise intact) and we
      try to read from the corrupted block address.
      
      In this case, just fail the lookup. If it is readahead being issued,
      it will simply not be done, but if it is a real read that fails we
      will get an error being reported.  Ideally this case should result
      in an EFSCORRUPTED error being reported, but we cannot return an
      error through xfs_buf_read() or xfs_buf_get() so this lookup failure
      may result in ENOMEM or EIO errors being reported instead.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
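The "just fail the lookup" approach from the commit above amounts to a range check before the per-AG structure is looked up. A minimal sketch, assuming a hypothetical flat layout in which every allocation group has the same size; `struct sketch_mount`, `struct sketch_perag` and `sketch_find_perag()` are illustrative names, not the XFS API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-allocation-group structure. */
struct sketch_perag {
    unsigned int agno;
};

/* Hypothetical mount: total size, AG size, and one perag per AG. */
struct sketch_mount {
    unsigned long long total_blocks;
    unsigned long long ag_blocks;
    struct sketch_perag *perags;
};

/*
 * Map a block number to its perag.  An out-of-range block number fails
 * the lookup by returning NULL instead of indexing past the array.
 */
static struct sketch_perag *sketch_find_perag(const struct sketch_mount *mp,
                                              unsigned long long blkno)
{
    assert(mp != NULL && mp->ag_blocks > 0);

    if (blkno >= mp->total_blocks)
        return NULL;
    return &mp->perags[blkno / mp->ag_blocks];
}
```

Callers then have to treat NULL as a failed lookup, which matches the commit's note that readahead is skipped while a real read reports an error.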
    • xfs: pull up stack_switch check into xfs_bmapi_write · d26978dd
      Brian Foster authored
      The stack_switch check currently occurs in __xfs_bmapi_allocate,
      which means the stack switch only occurs when xfs_bmapi_allocate()
      is called in a loop. Pull the check up before the loop in
      xfs_bmapi_write() such that the first iteration of the loop has
      consistent behavior.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: Do not return EFSCORRUPTED when filesystem probe finds no XFS magic · 1bee12b8
      Eric Sandeen authored
      98021821 changed the return value from EWRONGFS (aka EINVAL)
      to EFSCORRUPTED which doesn't seem to be handled properly by
      the root filesystem probe.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
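Why the return value matters to a probe can be sketched with a loop that tries several filesystem types in turn. This is a model under assumptions, not the kernel's root-mount code: it assumes the probe treats EINVAL as "not this filesystem, try the next one" and any other error as fatal. `sketch_probe_root()`, the `mount_*` stubs and the `SKETCH_*` constants are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_EWRONGFS     EINVAL  /* "aka EINVAL", per the commit message */
#define SKETCH_EFSCORRUPTED 117     /* stand-in value chosen for this sketch */

typedef int (*mount_fn)(void);

/* XFS stub before the fix: no XFS magic found, reports corruption. */
static int mount_xfs_old(void) { return -SKETCH_EFSCORRUPTED; }

/* XFS stub after the fix: no XFS magic found, reports "wrong filesystem". */
static int mount_xfs_new(void) { return -SKETCH_EWRONGFS; }

/* Stub for another filesystem type that recognises the device. */
static int mount_other(void) { return 0; }

/*
 * Try each filesystem in order.  Returns the index of the one that
 * mounted, or a negative errno if the probe had to stop.
 */
static int sketch_probe_root(const mount_fn *fns, size_t count)
{
    assert(fns != NULL);

    for (size_t i = 0; i < count; i++) {
        int err = fns[i]();

        if (err == 0)
            return (int)i;
        if (err == -EINVAL)
            continue;       /* not this filesystem, keep probing */
        return err;         /* anything else ends the probe */
    }
    return -EINVAL;
}
```

In this model the pre-fix return value stops the probe before the filesystem that would have succeeded is ever tried, while the post-fix value lets the loop move on.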