1. 15 2月, 2013 3 次提交
    • B
      xfs: remove log force from xfs_buf_trylock() · fa5566e4
      Brian Foster 提交于
      The trylock log force invoked via xfs_buf_item_push() can attempt
      to acquire xa_lock, thus leading to a recursion bug when called
      with xa_lock held.
      
      This log force was originally added to xfs_buf_trylock() to address
      xfsaild stalls due to pinned and stale buffers. Since the addition
      of this behavior, the log item pushing code had been reworked to
      detect and track pinned items to inform xfsaild to issue a log
      force itself when necessary. As such, the log force on trylock
      failure is redundant and safe to remove.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      fa5566e4
    • B
      xfs: recheck buffer pinned status after push trylock failure · 5337fe9b
      Brian Foster 提交于
      The buffer pinned check and trylock sequence in xfs_buf_item_push()
      can race with an active transaction on marking the buffer pinned.
      This can result in the buffer becoming pinned and stale after the
      initial check and the trylock failure, but before the check in
      xfs_buf_trylock() that issues a log force. If the log force is
      issued from this context, a spinlock recursion occurs on xa_lock.
      
      Prepare xfs_buf_item_push() to handle the race by detecting a
      pinned buffer after the trylock failure so xfsaild issues a log
      force from a safe context. This, along with various previous fixes,
      renders the log force in xfs_buf_trylock() redundant.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      5337fe9b
    • D
      xfs: limit speculative prealloc size on sparse files · a1e16c26
      Dave Chinner 提交于
      Speculative preallocation based on the current file size works well
      for contiguous files, but is sub-optimal for sparse files where the
      EOF preallocation can fill holes and result in large amounts of
      zeros being written when it is not necessary.
      
      The algorithm is modified to prevent EOF speculative preallocation
      from triggering larger allocations on IO patterns of
      truncate--to-zero-seek-write-seek-write-....  which results in
      non-sparse files for large files. This, unfortunately, is the way cp
      now behaves when copying sparse files and so needs to be fixed.
      
      What this code does is that it looks at the existing extent adjacent
      to the current EOF and if it determines that it is a hole we disable
      speculative preallocation altogether. To avoid the next write from
      doing a large prealloc, it takes the size of subsequent
      preallocations from the current size of the existing EOF extent.
      IOWs, if you leave a hole in the file, it resets preallocation
      behaviour to the same as if it was a zero size file.
      
      Example new behaviour:
      
      $ xfs_io -f -c "pwrite 0 31m" \
                  -c "pwrite 33m 1m" \
                  -c "pwrite 128m 1m" \
                  -c "fiemap -v" /mnt/scratch/blah
      wrote 32505856/32505856 bytes at offset 0
      31 MiB, 7936 ops; 0.0000 sec (1.608 GiB/sec and 421432.7439 ops/sec)
      wrote 1048576/1048576 bytes at offset 34603008
      1 MiB, 256 ops; 0.0000 sec (1.462 GiB/sec and 383233.5329 ops/sec)
      wrote 1048576/1048576 bytes at offset 134217728
      1 MiB, 256 ops; 0.0000 sec (1.719 GiB/sec and 450704.2254 ops/sec)
      /mnt/scratch/blah:
       EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
         0: [0..65535]:      96..65631        65536   0x0
         1: [65536..67583]:  hole              2048
         2: [67584..69631]:  67680..69727      2048   0x0
         3: [69632..262143]: hole             192512
         4: [262144..264191]: 262240..264287    2048   0x1
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a1e16c26
  2. 07 2月, 2013 1 次提交
  3. 02 2月, 2013 13 次提交
  4. 29 1月, 2013 1 次提交
  5. 26 1月, 2013 2 次提交
    • J
      xfs: Fix possible use-after-free with AIO · ced55f38
      Jan Kara 提交于
      Running AIO is pinning inode in memory using file reference. Once AIO
      is completed using aio_complete(), file reference is put and inode can
      be freed from memory. So we have to be sure that calling aio_complete()
      is the last thing we do with the inode.
      
      CC: xfs@oss.sgi.com
      CC: Ben Myers <bpm@sgi.com>
      CC: stable@vger.kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ced55f38
    • D
      xfs: fix shutdown hang on invalid inode during create · 3b19034d
      Dave Chinner 提交于
      When the new inode verify in xfs_iread() fails, the create
      transaction is aborted and a shutdown occurs. The subsequent unmount
      then hangs in xfs_wait_buftarg() on a buffer that has an elevated
      hold count. Debug showed that it was an AGI buffer getting stuck:
      
      [   22.576147] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      [   22.976213] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      [   23.376206] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      [   23.776325] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
      
      The trace of this buffer leading up to the shutdown (trimmed for
      brevity) looks like:
      
      xfs_buf_init:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_get_map
      xfs_buf_get:         bno 0x2 len 0x200 hold 1 caller xfs_buf_read_map
      xfs_buf_read:        bno 0x2 len 0x200 hold 1 caller xfs_trans_read_buf_map
      xfs_buf_iorequest:   bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
      xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_iorequest
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_iorequest
      xfs_buf_iowait:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
      xfs_buf_ioerror:     bno 0x2 len 0x200 hold 1 caller xfs_buf_bio_end_io
      xfs_buf_iodone:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_ioend
      xfs_buf_iowait_done: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
      xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_item_init
      xfs_trans_read_buf:  bno 0x2 len 0x200 hold 2 recur 0 refcount 1
      xfs_trans_brelse:    bno 0x2 len 0x200 hold 2 recur 0 refcount 1
      xfs_buf_item_relse:  bno 0x2 nblks 0x1 hold 2 caller xfs_trans_brelse
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_relse
      xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
      xfs_buf_trylock:     bno 0x2 nblks 0x1 hold 2 caller _xfs_buf_find
      xfs_buf_find:        bno 0x2 len 0x200 hold 2 caller xfs_buf_get_map
      xfs_buf_get:         bno 0x2 len 0x200 hold 2 caller xfs_buf_read_map
      xfs_buf_read:        bno 0x2 len 0x200 hold 2 caller xfs_trans_read_buf_map
      xfs_buf_hold:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_init
      xfs_trans_read_buf:  bno 0x2 len 0x200 hold 3 recur 0 refcount 1
      xfs_trans_log_buf:   bno 0x2 len 0x200 hold 3 recur 0 refcount 1
      xfs_buf_item_unlock: bno 0x2 len 0x200 hold 3 flags DIRTY liflags ABORTED
      xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
      xfs_buf_rele:        bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
      
      And that is the AGI buffer from cold cache read into memory to
      transaction abort. You can see at transaction abort the bli is dirty
      and only has a single reference. The item is not pinned, and it's
      not in the AIL. Hence the only reference to it is this transaction.
      
      The problem is that the xfs_buf_item_unlock() call is dropping the
      last reference to the xfs_buf_log_item attached to the buffer (which
      holds a reference to the buffer), but it is not freeing the
      xfs_buf_log_item. Hence nothing will ever release the buffer, and
      the unmount hangs waiting for this reference to go away.
      
      The fix is simple - xfs_buf_item_unlock needs to detect the last
      reference going away in this case and free the xfs_buf_log_item to
      release the reference it holds on the buffer.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3b19034d
  6. 25 1月, 2013 2 次提交
    • D
      xfs: limit speculative prealloc near ENOSPC thresholds · 4d559a3b
      Dave Chinner 提交于
      There is a window on small filesytsems where specualtive
      preallocation can be larger than that ENOSPC throttling thresholds,
      resulting in specualtive preallocation trying to reserve more space
      than there is space available. This causes immediate ENOSPC to be
      triggered, prealloc to be turned off and flushing to occur. One the
      next write (i.e. next 4k page), we do exactly the same thing, and so
      effective drive into synchronous 4k writes by triggering ENOSPC
      flushing on every page while in the window between the prealloc size
      and the ENOSPC prealloc throttle threshold.
      
      Fix this by checking to see if the prealloc size would consume all
      free space, and throttle it appropriately to avoid premature
      ENOSPC...
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4d559a3b
    • D
      xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end · 10616b80
      Dave Chinner 提交于
      When _xfs_buf_find is passed an out of range address, it will fail
      to find a relevant struct xfs_perag and oops with a null
      dereference. This can happen when trying to walk a filesystem with a
      metadata inode that has a partially corrupted extent map (i.e. the
      block number returned is corrupt, but is otherwise intact) and we
      try to read from the corrupted block address.
      
      In this case, just fail the lookup. If it is readahead being issued,
      it will simply not be done, but if it is real read that fails we
      will get an error being reported.  Ideally this case should result
      in an EFSCORRUPTED error being reported, but we cannot return an
      error through xfs_buf_read() or xfs_buf_get() so this lookup failure
      may result in ENOMEM or EIO errors being reported instead.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      10616b80
  7. 19 1月, 2013 1 次提交
  8. 18 1月, 2013 2 次提交
  9. 17 1月, 2013 1 次提交
  10. 14 1月, 2013 2 次提交
    • A
      fs/xfs remove obsolete simple_strto<foo> · a17164e5
      Abhijit Pawar 提交于
      This patch replaces usages of obsolete simple_strtoul with kstrtoint in
      xfs_args and suffix_strtoul.
      Signed-off-by: NAbhijit Pawar <abhi.c.pawar@gmail.com>
      Reviewed-by: NJie Liu <jeff.liu@oracle.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a17164e5
    • E
      xfs: recalculate leaf entry pointer after compacting a dir2 block · d4608632
      Eric Sandeen 提交于
      Dave Jones hit this assert when doing a compile on recent git, with
      CONFIG_XFS_DEBUG enabled:
      
      XFS: Assertion failed: (char *)dup - (char *)hdr == be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)), file: fs/xfs/xfs_dir2_data.c, line: 828
      
      Upon further digging, the tag found by xfs_dir2_data_unused_tag_p(dup)
      contained "2" and not the proper offset, and I found that this value was
      changed after the memmoves under "Use a stale leaf for our new entry."
      in xfs_dir2_block_addname(), i.e.
      
                              memmove(&blp[mid + 1], &blp[mid],
                                      (highstale - mid) * sizeof(*blp));
      
      overwrote it.
      
      What has happened is that the previous call to xfs_dir2_block_compact()
      has rearranged things; it changes btp->count as well as the
      blp array.  So after we make that call, we must recalculate the
      proper pointer to the leaf entries by making another call to
      xfs_dir2_block_leaf_p().
      
      Dave provided a metadump image which led to a simple reproducer
      (create a particular filename in the affected directory) and this
      resolves the testcase as well as the bug on his live system.
      
      Thanks also to dchinner for looking at this one with me.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Tested-by: NDave Jones <davej@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      d4608632
  11. 04 1月, 2013 2 次提交
  12. 01 1月, 2013 1 次提交
  13. 22 12月, 2012 9 次提交
    • L
      Linux 3.8-rc1 · a49f0d1e
      Linus Torvalds 提交于
      a49f0d1e
    • L
      Merge git://www.linux-watchdog.org/linux-watchdog · 4fe19a13
      Linus Torvalds 提交于
      Pull watchdog updates from Wim Van Sebroeck:
       "This includes some fixes and code improvements (like
        clk_prepare_enable and clk_disable_unprepare), conversion from the
        omap_wdt and twl4030_wdt drivers to the watchdog framework, addition
        of the SB8x0 chipset support and the DA9055 Watchdog driver and some
        OF support for the davinci_wdt driver."
      
      * git://www.linux-watchdog.org/linux-watchdog: (22 commits)
        watchdog: mei: avoid oops in watchdog unregister code path
        watchdog: Orion: Fix possible null-deference in orion_wdt_probe
        watchdog: sp5100_tco: Add SB8x0 chipset support
        watchdog: davinci_wdt: add OF support
        watchdog: da9052: Fix invalid free of devm_ allocated data
        watchdog: twl4030_wdt: Change TWL4030_MODULE_PM_RECEIVER to TWL_MODULE_PM_RECEIVER
        watchdog: remove depends on CONFIG_EXPERIMENTAL
        watchdog: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
        watchdog: DA9055 Watchdog driver
        watchdog: omap_wdt: eliminate goto
        watchdog: omap_wdt: delete redundant platform_set_drvdata() calls
        watchdog: omap_wdt: convert to devm_ functions
        watchdog: omap_wdt: convert to new watchdog core
        watchdog: WatchDog Timer Driver Core: fix comment
        watchdog: s3c2410_wdt: use clk_prepare_enable and clk_disable_unprepare
        watchdog: imx2_wdt: Select the driver via ARCH_MXC
        watchdog: cpu5wdt.c: add missing del_timer call
        watchdog: hpwdt.c: Increase version string
        watchdog: Convert twl4030_wdt to watchdog core
        davinci_wdt: preparation for switch to common clock framework
        ...
      4fe19a13
    • L
      Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · 769cb858
      Linus Torvalds 提交于
      Pull CIFS fixes from Steve French:
       "Misc small cifs fixes"
      
      * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: eliminate cifsERROR variable
        cifs: don't compare uniqueids in cifs_prime_dcache unless server inode numbers are in use
        cifs: fix double-free of "string" in cifs_parse_mount_options
      769cb858
    • L
      Merge tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm · b49249d1
      Linus Torvalds 提交于
      Pull dm update from Alasdair G Kergon:
       "Miscellaneous device-mapper fixes, cleanups and performance
        improvements.
      
        Of particular note:
         - Disable broken WRITE SAME support in all targets except linear and
           striped.  Use it when kcopyd is zeroing blocks.
         - Remove several mempools from targets by moving the data into the
           bio's new front_pad area(which dm calls 'per_bio_data').
         - Fix a race in thin provisioning if discards are misused.
         - Prevent userspace from interfering with the ioctl parameters and
           use kmalloc for the data buffer if it's small instead of vmalloc.
         - Throttle some annoying error messages when I/O fails."
      
      * tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (36 commits)
        dm stripe: add WRITE SAME support
        dm: remove map_info
        dm snapshot: do not use map_context
        dm thin: dont use map_context
        dm raid1: dont use map_context
        dm flakey: dont use map_context
        dm raid1: rename read_record to bio_record
        dm: move target request nr to dm_target_io
        dm snapshot: use per_bio_data
        dm verity: use per_bio_data
        dm raid1: use per_bio_data
        dm: introduce per_bio_data
        dm kcopyd: add WRITE SAME support to dm_kcopyd_zero
        dm linear: add WRITE SAME support
        dm: add WRITE SAME support
        dm: prepare to support WRITE SAME
        dm ioctl: use kmalloc if possible
        dm ioctl: remove PF_MEMALLOC
        dm persistent data: improve improve space map block alloc failure message
        dm thin: use DMERR_LIMIT for errors
        ...
      b49249d1
    • J
      Revert "nfsd: warn on odd reply state in nfsd_vfs_read" · 10532b56
      J. Bruce Fields 提交于
      This reverts commit 79f77bf9.
      
      This is obviously wrong, and I have no idea how I missed seeing the
      warning in testing: I must just not have looked at the right logs.  The
      caller bumps rq_resused/rq_next_page, so it will always be hit on a
      large enough read.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      10532b56
    • L
      Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband · 184e2516
      Linus Torvalds 提交于
      Pull more infiniband changes from Roland Dreier:
       "Second batch of InfiniBand/RDMA changes for 3.8:
         - cxgb4 changes to fix lookup engine hash collisions
         - mlx4 changes to make flow steering usable
         - fix to IPoIB to avoid pinning dst reference for too long"
      
      * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
        RDMA/cxgb4: Fix bug for active and passive LE hash collision path
        RDMA/cxgb4: Fix LE hash collision bug for passive open connection
        RDMA/cxgb4: Fix LE hash collision bug for active open connection
        mlx4_core: Allow choosing flow steering mode
        mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV
        mlx4_core: Fix error flow in the flow steering wrapper
        mlx4_core: Add QPN enforcement for flow steering rules set by VFs
        cxgb4: Add LE hash collision bug fix path in LLD driver
        cxgb4: Add T4 filter support
        IPoIB: Call skb_dst_drop() once skb is enqueued for sending
      184e2516
    • L
      Merge tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic · 0264405b
      Linus Torvalds 提交于
      Pull asm-generic cleanup from Arnd Bergmann:
       "These are a few cleanups for asm-generic:
      
         - a set of patches from Lars-Peter Clausen to generalize asm/mmu.h
           and use it in the architectures that don't need any special
           handling.
         - A patch from Will Deacon to remove the {read,write}s{b,w,l} as
           discussed during the arm64 review
         - A patch from James Hogan that helps with the meta architecture
           series."
      
      * tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
        xtensa: Use generic asm/mmu.h for nommu
        h8300: Use generic asm/mmu.h
        c6x: Use generic asm/mmu.h
        asm-generic/mmu.h: Add support for FDPIC
        asm-generic/mmu.h: Remove unused vmlist field from mm_context_t
        asm-generic: io: remove {read,write} string functions
        asm-generic/io.h: remove asm/cacheflush.h include
      0264405b
    • K
      ARM: dts: fix duplicated build target and alphabetical sort out for exynos · 7e65df38
      Kukjin Kim 提交于
      Commit db5b0ae0 ("Merge tag 'dt' of git://git.kernel.org/.../arm-soc")
      causes a duplicated build target.  This patch fixes it and sorts out the
      build target alphabetically so that we can recognize something wrong
      easily.
      
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NKukjin Kim <kgene.kim@samsung.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7e65df38
    • M
      dm stripe: add WRITE SAME support · 45e621d4
      Mike Snitzer 提交于
      Rename stripe_map_discard to stripe_map_range and reuse it for WRITE
      SAME bio processing.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      45e621d4