1. 22 December 2013, 1 commit
  2. 21 December 2013, 1 commit
  3. 17 December 2013, 8 commits
    • xfs: abort metadata writeback on permanent errors · ac8809f9
      Authored by Dave Chinner
      If we are doing async writeback of metadata, we can get write errors
      but have nobody to report them to. At the moment, we simply attempt
      to reissue the write from io completion in the hope that it's a
      transient error.
      
      When it's not a transient error, the buffer is stuck forever in
      this loop, and we cannot break out of it. Eventually, unmount will
      hang because the AIL cannot be emptied and everything goes downhill
      from there.
      
      To solve this problem, only retry the write IO once before aborting
      it. We don't throw the buffer away because some transient errors can
      last minutes (e.g.  FC path failover) or even hours (thin
      provisioned devices that have run out of backing space) before they
      go away. Hence we really want to keep trying until we can't try any
      more.
      
      Because the buffer was not cleaned, however, it does not get removed
      from the AIL and hence the next pass across the AIL will start IO on
      it again. As such, we still get the "retry forever" semantics that
      we currently have, but we allow other access to the buffer in the
      mean time. Meanwhile the filesystem can continue to modify the
      buffer and relog it, so the IO errors won't hang the log or the
      filesystem.
      
      Now when we are pushing the AIL, we can see all these "permanent IO
      error" buffers and we can issue a warning about failures before we
      retry the IO. We can also catch these buffers when unmounting and
      issue a corruption warning, too (a sketch of the retry policy follows
      this entry).
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      ac8809f9
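      The following is a minimal userspace C model of the retry policy described
      above, assuming a hypothetical buffer structure with a single write-failure
      flag; it sketches the idea (retry a failed write once, then stop resubmitting
      from I/O completion) rather than the actual XFS buffer code.

      #include <stdbool.h>
      #include <stdio.h>

      /* Hypothetical metadata buffer: only the fields needed for the sketch. */
      struct buf {
          const char *name;
          bool write_failed;   /* set after the first failed write attempt */
      };

      /* Pretend I/O submission; 'ok' simulates whether the device accepted it. */
      static bool submit_write(struct buf *bp, bool ok)
      {
          return ok;
      }

      /*
       * On error, retry the write once.  If the retry also fails, stop
       * resubmitting from I/O completion and leave the buffer for a later
       * AIL push (or unmount) to warn about, instead of looping forever.
       */
      static void write_complete(struct buf *bp, bool io_ok)
      {
          if (io_ok) {
              bp->write_failed = false;
              return;
          }
          if (!bp->write_failed) {
              bp->write_failed = true;              /* first failure: retry once */
              printf("%s: transient error, retrying\n", bp->name);
              write_complete(bp, submit_write(bp, false));
              return;
          }
          printf("%s: permanent write error, aborting resubmission\n", bp->name);
      }

      int main(void)
      {
          struct buf bp = { "agf-buffer", false };
          write_complete(&bp, submit_write(&bp, false));
          return 0;
      }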
    • xfs: swalloc doesn't align allocations properly · 33177f05
      Authored by Dave Chinner
      When swalloc is specified as a mount option, allocations are
      supposed to be aligned to the stripe width rather than the stripe
      unit of the underlying filesystem. However, it does not do this.
      
      What the implementation does is round up the allocation size to a
      stripe width, hence ensuring that all allocations span a full stripe
      width. It does not, however, ensure that the allocation is aligned
      to a stripe width, and hence the allocations can span multiple
      underlying stripes and so still see RMW cycles for things like
      direct IO on MD RAID.
      
      So, if the swalloc mount option is set, change the allocation
      alignment in xfs_bmap_btalloc() to use the stripe width rather than
      the stripe unit (see the sketch after this entry).
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      33177f05
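      A small sketch of the alignment arithmetic under discussion, using made-up
      stripe geometry: with swalloc in effect the allocation start is rounded up
      to the stripe width rather than the stripe unit. The numbers and helper
      below are illustrative only.

      #include <stdio.h>

      /* Round 'x' up to the next multiple of 'align' (align > 0). */
      static unsigned long long roundup_ull(unsigned long long x,
                                            unsigned long long align)
      {
          return ((x + align - 1) / align) * align;
      }

      int main(void)
      {
          /* Hypothetical geometry, in filesystem blocks. */
          unsigned long long sunit = 16;    /* stripe unit  */
          unsigned long long swidth = 64;   /* stripe width */
          unsigned long long offset = 80;   /* requested allocation start */
          int swalloc = 1;                  /* mount option in effect */

          unsigned long long align = swalloc ? swidth : sunit;

          /* Rounding only the length still lets the extent straddle stripes... */
          printf("unaligned start: %llu (stripe width %llu)\n", offset, swidth);
          /* ...so also align the start of the allocation to the chosen boundary. */
          printf("aligned start:   %llu\n", roundup_ull(offset, align));
          return 0;
      }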
    • xfs: remove xfsbdstrat error · 83a0adc3
      Authored by Christoph Hellwig
      The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that
      handles the case of a shut down filesystem.  Most of the users have private,
      uncached buffers that can just be freed in this case, but the complex error
      handling in xfs_bioerror_relse messes up the case when it's called without
      a locked buffer.
      
      Remove xfsbdstrat and opencode the error handling in the callers.  All but
      one can simply return an error and don't need to deal with buffer state,
      and the one caller that cares about the buffer state could do with a major
      cleanup as well, but we'll defer that to later.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      83a0adc3
    • xfs: align initial file allocations correctly · 6e708bcf
      Authored by Dave Chinner
      The function xfs_bmap_isaeof() is used to indicate that an
      allocation is occurring at or past the end of file, and as such
      should be aligned to the underlying storage geometry if possible.
      
      Commit 27a3f8f2 ("xfs: introduce xfs_bmap_last_extent") changed the
      behaviour of this function for empty files - it turned off
      allocation alignment for this case accidentally. Hence large initial
      allocations from direct IO are not getting correctly aligned to the
      underlying geometry, and that is causing write performance to drop in
      alignment sensitive configurations.
      
      Fix it by considering allocation into empty files as requiring
      aligned allocation again.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      
      (cherry picked from commit f9b395a8)
      6e708bcf
    • xfs: fix infinite loop by detaching the group/project hints from user dquot · 718cc6f8
      Authored by Jie Liu
      xfs_quota(8) will hang up if trying to turn group/project quota off
      before the user quota is off, this could be 100% reproduced by:
        # mount -ouquota,gquota /dev/sda7 /xfs
        # mkdir /xfs/test
        # xfs_quota -xc 'off -g' /xfs <-- hangs up
        # echo w > /proc/sysrq-trigger
        # dmesg
      
        SysRq : Show Blocked State
        task                        PC stack   pid father
        xfs_quota       D 0000000000000000     0 27574   2551 0x00000000
        [snip]
        Call Trace:
        [<ffffffff81aaa21d>] schedule+0xad/0xc0
        [<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0
        [<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0
        [<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0
        [<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs]
        [<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30
        [<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs]
        [<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs]
        [<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs]
        [<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
        [<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
        [<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
        [<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
        [<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
        [<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs]
        [<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0
        [<ffffffff812fba4b>] ? iput+0x5b/0x90
        [<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0
        [<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
        [<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b
      
      It's fine if we turn the user quota off first and then turn off the
      other kinds of quotas if they are enabled, since the group/project
      dquot refcounts drop to zero once the user quota is off. Otherwise,
      those dquots' refcounts stay non-zero because the user dquots may
      still refer to them as hints. Hence, the above operation causes an
      infinite loop in xfs_qm_dquot_walk() while trying to purge the dquot
      cache.
      
      This problem has been around since Linux 3.4, it was introduced by:
        [ b84a3a96 xfs: remove the per-filesystem list of dquots ]
      
      Originally we would release the group dquot pointers that the user
      dquots may be carrying around as hints via xfs_qm_detach_gdquots().
      However, with the above change, there is no such work done before
      purging the group/project dquot cache.
      
      To solve this problem, this patch introduces a special routine,
      xfs_qm_dqpurge_hints(), which releases the group/project dquot
      pointers the user dquots may be carrying around as hints, and then
      proceeds to purge the user dquot cache if requested (a model of the
      refcount problem follows this entry).
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      
      (cherry picked from commit df8052e7)
      718cc6f8
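      Below is a userspace C model of the refcount problem, with hypothetical
      structures: a purge pass skips dquots that are still referenced, so as long
      as the user dquots hold group/project hints the pass can never finish;
      detaching the hints first lets it complete.

      #include <stdio.h>

      struct dquot {
          const char *type;
          int refcount;
          struct dquot *hint;      /* user dquots may point at a group/project dquot */
      };

      /* One purge pass: drop dquots nobody references; report how many remain. */
      static int purge_pass(struct dquot **cache, int n)
      {
          int busy = 0;
          for (int i = 0; i < n; i++) {
              if (!cache[i])
                  continue;
              if (cache[i]->refcount > 0) {   /* still referenced: skip, retry later */
                  busy++;
                  continue;
              }
              cache[i] = NULL;                /* purged */
          }
          return busy;
      }

      /* The fix: detach the hints (dropping their references) before purging. */
      static void purge_hints(struct dquot **users, int n)
      {
          for (int i = 0; i < n; i++) {
              if (users[i] && users[i]->hint) {
                  users[i]->hint->refcount--;
                  users[i]->hint = NULL;
              }
          }
      }

      int main(void)
      {
          struct dquot udq = { "user",  0, NULL };
          struct dquot gdq = { "group", 1, NULL };  /* pinned by the user dquot's hint */
          udq.hint = &gdq;

          struct dquot *cache[] = { &gdq };
          struct dquot *users[] = { &udq };

          /* Without detaching hints, every pass reports the group dquot as busy. */
          printf("busy before detach: %d\n", purge_pass(cache, 1));
          purge_hints(users, 1);
          printf("busy after detach:  %d\n", purge_pass(cache, 1));
          return 0;
      }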
    • xfs: fix assertion failure at xfs_setattr_nonsize · 5c227278
      Authored by Jie Liu
      For a CRC-enabled v5 superblock, changing a file's ownership can simply
      trigger an ASSERT failure at xfs_setattr_nonsize() if both group and
      project quota are enabled, i.e.,
      
      [  305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621
      [  305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable]
      [  305.383939] Call Trace:
      [  305.385536]  [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs]
      [  305.387142]  [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs]
      [  305.388727]  [<ffffffff811ca388>] notify_change+0x1a8/0x350
      [  305.390298]  [<ffffffff811ac39d>] chown_common+0xfd/0x110
      [  305.391868]  [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110
      [  305.393440]  [<ffffffff811ad760>] SyS_lchown+0x20/0x30
      [  305.394995]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
      [  305.399870] RIP  [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs]
      
      This fix adjusts the assertion to check whether the superblock
      supports both quota inodes or not.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      
      (cherry picked from commit 5a01dd54)
      5c227278
    • xfs: fix false assertion at xfs_qm_vop_create_dqattach · 30d161c9
      Authored by Jie Liu
      After the previous fix, there is still another ASSERT failure if any
      type of quota is turned off while fsstress is running at the same time.
      
      Backtrace in this case:
      
      [   50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118
      [   50.867924] ------------[ cut here ]------------
      ... <snip>
      [   50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable]
      [   50.867999] invalid opcode: 0000 [#1] SMP
      [   50.869407] Call Trace:
      [   50.869446]  [<ffffffffa0bc408a>] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs]
      [   50.869512]  [<ffffffffa0b9cc45>] xfs_create+0x5c5/0x6a0 [xfs]
      [   50.869564]  [<ffffffffa0b5307c>] xfs_vn_mknod+0xac/0x1d0 [xfs]
      [   50.869615]  [<ffffffffa0b531d6>] xfs_vn_mkdir+0x16/0x20 [xfs]
      [   50.869655]  [<ffffffff811becd5>] vfs_mkdir+0x95/0x130
      [   50.869689]  [<ffffffff811bf63a>] SyS_mkdirat+0xaa/0xe0
      [   50.869723]  [<ffffffff811bf689>] SyS_mkdir+0x19/0x20
      [   50.869757]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
      [   50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 <snip>
      [   50.870003] RIP  [<ffffffffa0b55a32>] assfail+0x22/0x30 [xfs]
      [   50.870050]  RSP <ffff88002941fd60>
      [   50.879251] ---[ end trace c93a2b342341c65b ]---
      
      We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(),
      however the assertion itself is not right IMHO.  While performing quota off, we
      firstly clear the XFS_*QUOTA_ACTIVE bit(s) from struct xfs_mount without taking
      any special locks, see xfs_qm_scall_quotaoff().  Hence there is no guarantee
      that the desired quota is still active.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      
      (cherry picked from commit 37eb9706)
      30d161c9
    • xfs: fix memory leak in xfs_dir2_node_removename · 3a8c9208
      Authored by Mark Tinguely
      Fix the leak of kernel memory in xfs_dir2_node_removename()
      when xfs_dir2_leafn_remove() returns an error code.
      Signed-off-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      
      (cherry picked from commit ef701600)
      3a8c9208
  4. 14 December 2013, 2 commits
  5. 13 December 2013, 2 commits
    • procfs: also fix proc_reg_get_unmapped_area() for !MMU case · ae5758a1
      Authored by Jan Beulich
      Commit fad1a86e ("procfs: call default get_unmapped_area on
      MMU-present architectures"), as its title says, took care of only the
      MMU case, leaving the !MMU side still in the regressed state (returning
      -EIO in all cases where pde->proc_fops->get_unmapped_area is NULL); a
      sketch of the fallback idea follows this entry.
      
      From the fad1a86e changelog:
      
       "Commit c4fe2448 ("sparc: fix PCI device proc file mmap(2)") added
        proc_reg_get_unmapped_area in proc_reg_file_ops and
        proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
        get_unmapped_area method is not defined for the target procfs file, which
        causes regression of mmap on /proc/vmcore.
      
        To address this issue, like get_unmapped_area(), call default
        current->mm->get_unmapped_area on MMU-present architectures if
        pde->proc_fops->get_unmapped_area, i.e.  the one in actual file operation
        in the procfs file, is not defined"
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org>	[3.12.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ae5758a1
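      A heavily simplified sketch of the fallback idea, with made-up names; it
      only illustrates falling back to a default when the per-file handler is
      NULL instead of returning -EIO, and does not reflect the exact in-tree
      MMU/!MMU code paths.

      #include <stdio.h>
      #include <errno.h>

      typedef long (*get_area_fn)(unsigned long len);

      /* Hypothetical per-file operations table. */
      struct file_ops {
          get_area_fn get_unmapped_area;   /* NULL for most proc files */
      };

      static long default_get_area(unsigned long len)
      {
          return 0x1000;                   /* stand-in for a default placement policy */
      }

      /*
       * Regression being fixed: returning -EIO whenever the proc file does not
       * define its own handler.  The fix falls back to a default instead.
       */
      static long proc_get_unmapped_area(const struct file_ops *ops, unsigned long len)
      {
          if (ops->get_unmapped_area)
              return ops->get_unmapped_area(len);
          return default_get_area(len);    /* previously: return -EIO */
      }

      int main(void)
      {
          struct file_ops plain = { NULL };
          printf("result: %ld\n", proc_get_unmapped_area(&plain, 4096));
          return 0;
      }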
    • dcache: allow word-at-a-time name hashing with big-endian CPUs · a5c21dce
      Authored by Will Deacon
      When explicitly hashing the end of a string with the word-at-a-time
      interface, we have to be careful which end of the word we pick up.
      
      On big-endian CPUs, the upper-bits will contain the data we're after, so
      ensure we generate our masks accordingly (and avoid hashing whatever
      random junk may have been sitting after the string).
      
      This patch adds a new dcache helper, bytemask_from_count, which creates
      a mask appropriate for the CPU endianness (see the sketch after this entry).
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a5c21dce
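      A sketch of what such an endianness-aware byte mask can look like, assuming
      a 64-bit unsigned long; the macro names below are illustrative and may not
      match the in-tree definitions exactly.

      #include <stdio.h>

      /*
       * Keep only the first 'cnt' bytes of a word-at-a-time load
       * (0 <= cnt < sizeof(long)).  On little-endian CPUs the string occupies
       * the low-order bytes, on big-endian CPUs the high-order bytes, so the
       * mask has to be built from opposite ends of the word.
       */
      #define LE_BYTEMASK(cnt) (~(~0ul << ((cnt) * 8)))
      #define BE_BYTEMASK(cnt) (~(~0ul >> ((cnt) * 8)))

      int main(void)
      {
          unsigned long word = 0x8877665544332211ul;  /* pretend tail-of-string load */
          unsigned int cnt = 3;                       /* only 3 bytes are valid */

          printf("little-endian mask: %#018lx -> %#018lx\n",
                 LE_BYTEMASK(cnt), word & LE_BYTEMASK(cnt));
          printf("big-endian mask:    %#018lx -> %#018lx\n",
                 BE_BYTEMASK(cnt), word & BE_BYTEMASK(cnt));
          return 0;
      }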
  6. 12 December 2013, 7 commits
    • Btrfs: fix access_ok() check in btrfs_ioctl_send() · 700ff4f0
      Authored by Dan Carpenter
      The closing parenthesis is in the wrong place.  We want to check
      "sizeof(*arg->clone_sources) * arg->clone_sources_count" instead of
      "sizeof(*arg->clone_sources * arg->clone_sources_count)".
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Jie Liu <jeff.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      cc: stable@vger.kernel.org
      700ff4f0
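      A tiny demonstration of why the parenthesis placement matters: sizeof
      applied to the whole multiplication yields the size of the result's type,
      not the intended byte count. The variables below merely stand in for the
      ioctl argument fields.

      #include <stdio.h>
      #include <stdint.h>

      int main(void)
      {
          uint64_t *clone_sources = NULL;   /* stand-in for arg->clone_sources */
          uint64_t count = 16;              /* stand-in for arg->clone_sources_count */

          /* Wrong: sizeof of the expression's type -> always sizeof(uint64_t). */
          printf("sizeof(*p * n) = %zu\n", sizeof(*clone_sources * count));
          /* Right: element size times element count -> the buffer length to check. */
          printf("sizeof(*p) * n = %zu\n", (size_t)(sizeof(*clone_sources) * count));
          return 0;
      }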
    • Btrfs: make sure we cleanup all reloc roots if error happens · 467bb1d2
      Authored by Wang Shilong
      I hit an oops when merging reloc roots fails. The reason is that
      new reloc roots may be added, so we should make sure we clean up
      all reloc roots.
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      467bb1d2
    • Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation · 66463748
      Authored by Wang Shilong
      The quota tree and UUID tree are only COWed; they cannot be snapshotted.
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      66463748
    • Btrfs: fix an oops when doing balance relocation · c974c464
      Authored by Wang Shilong
      I hit an oops when inserting a reloc root into @reloc_root_tree (it can be
      easily triggered by forcing COW for the relocation root):
      
      [  866.494539]  [<ffffffffa0499579>] btrfs_init_reloc_root+0x79/0xb0 [btrfs]
      [  866.495321]  [<ffffffffa044c240>] record_root_in_trans+0xb0/0x110 [btrfs]
      [  866.496109]  [<ffffffffa044d758>] btrfs_record_root_in_trans+0x48/0x80 [btrfs]
      [  866.496908]  [<ffffffffa0494da8>] select_reloc_root+0xa8/0x210 [btrfs]
      [  866.497703]  [<ffffffffa0495c8a>] do_relocation+0x16a/0x540 [btrfs]
      
      This is because the reloc root inserted into @reloc_root_tree is not kept
      within one transaction; the reloc root may be COWed and its root block
      bytenr reused, and then the oops happens. We should update the reloc root
      in @reloc_root_tree when we COW the reloc root node; fix it.
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      c974c464
    • Btrfs: don't miss skinny extent items on delayed ref head contention · 639eefc8
      Authored by Filipe David Borba Manana
      Currently extent-tree.c:btrfs_lookup_extent_info() can miss the lookup
      of skinny extent items. This can happen when the execution flow is the
      following:
      
      * We do an extent tree lookup and fail to find a skinny extent item;
      
      * As a result, we attempt to see if a non-skinny extent item exists,
        either by looking at previous item in the leaf or by doing another
        full extent tree search;
      
      * We have a transaction and then we check for a matching delayed ref
        head in the transaction's delayed refs rbtree;
      
      * We find such delayed ref head and then we try to lock it with a
        call to mutex_trylock();
      
      * The lock was contended so we jump to the label "again", which repeats
        the extent tree search but for a non-skinny extent item, because we
        previously set the metadata variable to 0 and the search key to look
        for a non-skinny extent item;
      
      * After the jump (and after releasing the transaction's delayed refs
        lock), a skinny extent item might have been added to the extent tree
        but we will miss it because metadata is set to 0 and the search key
        is set for a non-skinny extent-item.
      
      The fix here is to not reset metadata to 0 and to jump to the initial search
      key setup if the delayed ref head is contended, instead of jumping directly
      to the extent tree search label ("again"); a control-flow sketch follows
      this entry.
      
      This issue was found while investigating the issue reported at Bugzilla 64961.
      
      David Sterba suspected this function was missing extent items, and that
      this could be caused by the last change to this function, which was made
      in the following patch:
      
          [PATCH] Btrfs: optimize btrfs_lookup_extent_info()
          (commit 74be9510)
      
      But in fact this issue already existed before, because after failing to find
      a skinny extent item, the code set the search key for a non-skinny extent
      item, and on contention of a matching delayed ref head it would not search
      the extent tree for a skinny extent item anymore.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      639eefc8
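      A compact C model of the control-flow change, with a hypothetical two-pass
      search: retrying after lock contention has to jump back to where the search
      key (and the metadata flag) is set up, otherwise the retry can only ever
      look for the non-skinny form.

      #include <stdbool.h>
      #include <stdio.h>

      /* Pretend extent tree state for the demo. */
      static bool skinny_item_present;
      static bool contended_once = true;   /* first delayed-ref lock attempt fails */

      static bool tree_search(bool metadata)
      {
          return metadata ? skinny_item_present : false;
      }

      static bool lookup_extent_info(void)
      {
          bool metadata;

      search_key_setup:                      /* the fix: retry from here... */
          metadata = true;                   /* prefer the skinny item form */

      search_again:                          /* ...not from here, where metadata may be 0 */
          if (tree_search(metadata))
              return true;

          if (metadata) {                    /* fall back to the non-skinny form */
              metadata = false;
              goto search_again;
          }

          if (contended_once) {              /* delayed ref head was contended */
              contended_once = false;
              /* meanwhile, a skinny item gets inserted into the tree */
              skinny_item_present = true;
              goto search_key_setup;         /* buggy version did: goto search_again; */
          }
          return false;
      }

      int main(void)
      {
          printf("found extent item: %s\n", lookup_extent_info() ? "yes" : "no");
          return 0;
      }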
    • btrfs: call mnt_drop_write after interrupted subvol deletion · e43f998e
      Authored by David Sterba
      If btrfs_ioctl_snap_destroy blocks on the mutex and the process is
      killed, the mnt_write count is unbalanced and leads to an unmountable
      filesystem.
      
      CC: stable@vger.kernel.org
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
      e43f998e
    • Btrfs: don't clear the default compression type · a7e252af
      Authored by Miao Xie
      We hit an oops caused by the wrong compression type:
      [  556.512356] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [  556.512370] IP: [<ffffffff811dbaa0>] __list_del_entry+0x1/0x98
      [SNIP]
      [  556.512490]  [<ffffffff811dbb44>] ? list_del+0xd/0x2b
      [  556.512539]  [<ffffffffa05dd5ce>] find_workspace+0x97/0x175 [btrfs]
      [  556.512546]  [<ffffffff813c14b5>] ? _raw_spin_lock+0xe/0x10
      [  556.512576]  [<ffffffffa05de276>] btrfs_compress_pages+0x2d/0xa2 [btrfs]
      [  556.512601]  [<ffffffffa05af060>] compress_file_range.constprop.54+0x1f2/0x4e8 [btrfs]
      [  556.512627]  [<ffffffffa05af388>] async_cow_start+0x32/0x4d [btrfs]
      [  556.512655]  [<ffffffffa05cc7a1>] worker_loop+0x144/0x4c3 [btrfs]
      [  556.512661]  [<ffffffff81059404>] ? finish_task_switch+0x80/0xb8
      [  556.512689]  [<ffffffffa05cc65d>] ? btrfs_queue_worker+0x244/0x244 [btrfs]
      [  556.512695]  [<ffffffff8104fa4e>] kthread+0x8d/0x95
      [  556.512699]  [<ffffffff81050000>] ? bit_waitqueue+0x34/0x7d
      [  556.512704]  [<ffffffff8104f9c1>] ? __kthread_parkme+0x65/0x65
      [  556.512709]  [<ffffffff813c7eec>] ret_from_fork+0x7c/0xb0
      [  556.512713]  [<ffffffff8104f9c1>] ? __kthread_parkme+0x65/0x65
      
      Steps to reproduce:
       # mkfs.btrfs -f <dev>
       # mount -o nodatacow <dev> <mnt>
       # touch <mnt>/<file>
       # chattr =c <mnt>/<file>
       # dd if=/dev/zero of=<mnt>/<file> bs=1M count=10
      
      This happens because we cleared the default compression type when setting
      nodatacow. In fact, we needn't do that: the COMPRESS flag already
      indicates whether the file data needs to be compressed or not, so we
      needn't use the variable -- compress_type -- in btrfs_info to do the same
      thing, and can just use it to hold the default compression type.
      Otherwise we would get a wrong compress type for a file whose own
      compress flag is set but whose filesystem's compress flag is not set
      (a sketch of this distinction follows this entry).
      Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      a7e252af
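      A sketch of the distinction being made, with invented field names: one flag
      decides whether new data is compressed at all, while the mount-wide default
      compression type is just a parameter and must not be cleared when
      compression is switched off.

      #include <stdio.h>

      enum compress_type { COMPRESS_NONE = 0, COMPRESS_ZLIB = 1, COMPRESS_LZO = 2 };

      struct fs_info {
          unsigned int compress_flag;       /* "should we compress new data?" */
          enum compress_type default_type;  /* which algorithm to use when we do */
      };

      static void mount_nodatacow(struct fs_info *fs)
      {
          fs->compress_flag = 0;
          /* Buggy version also did: fs->default_type = COMPRESS_NONE; */
      }

      static enum compress_type pick_type(const struct fs_info *fs, int file_compress_flag)
      {
          if (!fs->compress_flag && !file_compress_flag)
              return COMPRESS_NONE;
          return fs->default_type;          /* NONE here breaks the workspace lookup */
      }

      int main(void)
      {
          struct fs_info fs = { 1, COMPRESS_ZLIB };
          mount_nodatacow(&fs);
          /* A file with its own compress flag set still gets a valid algorithm. */
          printf("compress type for +c file: %d\n", pick_type(&fs, 1));
          return 0;
      }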
  7. 11 December 2013, 3 commits
    • nfsd: when reusing an existing repcache entry, unhash it first · 781c2a5a
      Authored by Jeff Layton
      The DRC code will attempt to reuse an existing, expired cache entry in
      preference to allocating a new one. It'll then search the cache, and if
      it gets a hit it'll then free the cache entry that it was going to
      reuse.
      
      The cache code doesn't unhash the entry that it's going to reuse
      however, so it's possible for it to end up designating an entry for
      reuse and then subsequently freeing the same entry after it finds it.
      This leads to a later use-after-free and usually some list
      corruption warnings or an oops.
      
      Fix this by simply unhashing the entry that we intend to reuse. That
      will mean that it's not findable via a search and should prevent this
      situation from occurring (a sketch follows this entry).
      
      Cc: stable@vger.kernel.org # v3.10+
      Reported-by: Christoph Hellwig <hch@infradead.org>
      Reported-by: g. artim <gartim@gmail.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      781c2a5a
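      A userspace C sketch of the reuse hazard with a toy hash table: if the
      entry picked for reuse stays hashed, the later duplicate-request lookup can
      find and free that very entry; unhashing it first closes the window.

      #include <stdio.h>
      #include <stdlib.h>

      #define NBUCKETS 4

      struct entry {
          unsigned int key;
          struct entry *next;          /* hash-chain link */
      };

      static struct entry *buckets[NBUCKETS];

      static void unhash(struct entry *victim)
      {
          struct entry **pp = &buckets[victim->key % NBUCKETS];
          while (*pp && *pp != victim)
              pp = &(*pp)->next;
          if (*pp)
              *pp = victim->next;
      }

      static struct entry *lookup(unsigned int key)
      {
          for (struct entry *e = buckets[key % NBUCKETS]; e; e = e->next)
              if (e->key == key)
                  return e;
          return NULL;
      }

      int main(void)
      {
          /* An expired entry that is still hashed under key 7. */
          struct entry *expired = calloc(1, sizeof(*expired));
          expired->key = 7;
          buckets[7 % NBUCKETS] = expired;

          /* Decide to reuse the expired entry for the new request with key 7. */
          struct entry *reused = expired;
          unhash(reused);                       /* the fix: make it unfindable first */

          /* Duplicate-request check: without the unhash, this would return
           * 'reused', which the caller would then free while still using it. */
          struct entry *dup = lookup(7);
          printf("duplicate found: %s\n", dup ? "yes (use-after-free risk)" : "no");

          free(reused);
          return 0;
      }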
    • xfs: growfs overruns AGFL buffer on V4 filesystems · f94c4457
      Authored by Dave Chinner
      This loop in xfs_growfs_data_private() is incorrect for V4
      superblock filesystems:
      
      		for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
      			agfl->agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
      
      For V4 filesystems, we don't have an AGFL header structure, and so
      XFS_AGFL_SIZE() returns an entire sector's worth of entries, which
      we then index from an offset into the sector. Hence: buffer overrun.
      
      This problem was introduced in 3.10 by commit 77c95bba ("xfs: add
      CRC checks to the AGFL") which changed the AGFL structure but failed
      to update the growfs code to handle the different structures.
      
      Fix it by using the correct offset into the buffer for both V4 and
      V5 filesystems (a model of the overrun follows this entry).
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Jie Liu <jeff.liu@oracle.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      
      (cherry picked from commit b7d961b3)
      f94c4457
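      A userspace model of the overrun using simplified sizes (the header size
      below is a stand-in): on a V4-style layout the free-list entries start at
      offset 0 of the sector, so writing a full sector's worth of entries from
      behind a V5-style header runs past the end of the buffer.

      #include <stdio.h>
      #include <stdint.h>
      #include <string.h>

      #define SECTOR_SIZE 512
      #define V5_HDR_SIZE 36              /* stand-in for a V5 AGFL header size */

      /* Number of free-list entries that fit, per format. */
      static size_t agfl_size(int v5)
      {
          return (SECTOR_SIZE - (v5 ? V5_HDR_SIZE : 0)) / sizeof(uint32_t);
      }

      /* Where the entry array starts inside the sector buffer, per format. */
      static uint32_t *agfl_bno(unsigned char *buf, int v5)
      {
          return (uint32_t *)(buf + (v5 ? V5_HDR_SIZE : 0));
      }

      int main(void)
      {
          unsigned char sector[SECTOR_SIZE];
          int v5 = 0;                                    /* V4 filesystem */

          /* Buggy pattern: full-sector entry count indexed from past a V5 header. */
          size_t count = agfl_size(v5);
          printf("buggy writes end at offset %zu of a %d-byte buffer\n",
                 V5_HDR_SIZE + count * sizeof(uint32_t), SECTOR_SIZE);

          /* Fixed pattern: offset chosen by format, so the writes stay in bounds. */
          uint32_t *bno = agfl_bno(sector, v5);
          memset(bno, 0xff, count * sizeof(uint32_t));   /* NULLAGBLOCK-style fill */
          return 0;
      }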
    • xfs: don't perform discard if the given range length is less than block size · 2f42d612
      Authored by Jie Liu
      For the discard operation, we should return EINVAL if the given range
      length is less than a block size; otherwise it will go through the file
      system and discard data blocks, as the end of the range may be evaluated
      to -1 (see the sketch after this entry), e.g.,
      # fstrim -v -o 0 -l 100 /xfs7
      /xfs7: 9811378176 bytes were trimmed
      
      This issue can be triggered via xfstests/generic/288.
      
      Also, getting the request queue pointer via bdev_get_queue() instead
      of the hard-coded pointer dereference is not a bad thing.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      
      (cherry picked from commit f9fd0135)
      2f42d612
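      A sketch of the guard with made-up numbers: a sub-block length rounds down
      to zero blocks, an end = start + blocks - 1 computation then wraps, and the
      request should instead be rejected with EINVAL up front.

      #include <stdio.h>
      #include <errno.h>
      #include <stdint.h>

      #define BLOCK_SIZE 4096u

      /* Convert a byte range to whole filesystem blocks, optionally guarded. */
      static int trim_range(uint64_t start, uint64_t len, int guarded)
      {
          if (guarded && len < BLOCK_SIZE)
              return -EINVAL;                              /* the fix */

          uint64_t start_blk = start / BLOCK_SIZE;
          uint64_t nblocks   = len / BLOCK_SIZE;           /* 100 bytes -> 0 blocks */
          uint64_t end_blk   = start_blk + nblocks - 1;    /* wraps when nblocks == 0 */

          printf("would trim blocks %llu..%llu\n",
                 (unsigned long long)start_blk, (unsigned long long)end_blk);
          return 0;
      }

      int main(void)
      {
          /* fstrim -o 0 -l 100: unguarded, the bogus range covers the whole fs. */
          printf("unguarded: %d\n", trim_range(0, 100, 0));
          printf("guarded:   %d\n", trim_range(0, 100, 1));   /* -EINVAL */
          return 0;
      }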
  8. 10 December 2013, 1 commit
  9. 08 December 2013, 1 commit
    • sysfs: give different locking key to regular and bin files · a8b14744
      Authored by Tejun Heo
      027a485d ("sysfs: use a separate locking class for open files
      depending on mmap") assigned a different lockdep key to
      sysfs_open_file->mutex depending on whether the file implements mmap
      or not, in an attempt to avoid a spurious lockdep warning caused by
      merging of the regular and bin file paths.
      
      While this restored some of the original behavior of using different
      locks (at least as far as lockdep is concerned) for the different
      classes of files, the restoration wasn't full, because now the lockdep
      key assignment depends on whether the file has mmap or not instead of
      whether it's a regular file or not.
      
      This means that bin files which don't implement mmap will get assigned
      the same lockdep class as regular files.  This is problematic because
      file_operations for bin files still implements the mmap file operation
      and checking whether the sysfs file actually implements mmap happens
      in the file operation after grabbing @sysfs_open_file->mutex.  We
      still end up adding a locking dependency from mmap locking through
      sysfs_open_file->mutex to the regular file mutex, which triggers a
      spurious circular locking warning.
      
      Fix it by restoring the original behavior fully by differentiating
      lockdep key by whether the file is regular or bin, instead of the
      existence of mmap.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Dave Jones <davej@redhat.com>
      Link: http://lkml.kernel.org/g/20131203184324.GA11320@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8b14744
  10. 06 December 2013, 1 commit
  11. 05 December 2013, 2 commits
    • nfs: fix do_div() warning by instead using sector_div() · 3873d064
      Authored by Helge Deller
      When compiling a 32-bit kernel with CONFIG_LBDAF=n, the compiler complains
      as shown below.  Fix this warning by instead using sector_div(), which is
      provided by the kernel.h header file.
      
      fs/nfs/blocklayout/extents.c: In function ‘normalize’:
      include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
      fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’
      nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default]
      fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default]
      include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’
       extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      3873d064
    • NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery · f22e5edd
      Authored by Trond Myklebust
      Andy Adamson reports:
      
      The state manager is recovering expired state and recovery OPENs are being
      processed. If kswapd is pruning inodes at the same time, a deadlock can occur
      when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
      resultant layoutreturn gets an error that the state manager is to handle,
      causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.
      
      At the same time an open is waiting for the inode deletion to complete in
      __wait_on_freeing_inode.
      
      If the open is either the open called by the state manager, or an open from
      the same open owner that is holding the NFSv4 sequence id which causes the
      OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
      then the state is deadlocked with kswapd.
      
      The fix is simply to have layoutreturn ignore all errors except NFS4ERR_DELAY.
      We already know that layouts are dropped on all server reboots, and that
      it has to be coded to deal with the "forgetful client model" that doesn't
      send layoutreturns.
      Reported-by: Andy Adamson <andros@netapp.com>
      Link: http://lkml.kernel.org/r/1385402270-14284-1-git-send-email-andros@netapp.com
      Signed-off-by: Trond Myklebust <Trond.Myklebust@primarydata.com>
      f22e5edd
  12. 03 December 2013, 2 commits
    • epoll: drop EPOLLWAKEUP if PM_SLEEP is disabled · 95f19f65
      Authored by Amit Pundir
      Drop EPOLLWAKEUP from the epoll events mask if CONFIG_PM_SLEEP is disabled
      (see the sketch after this entry).
      Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      95f19f65
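      A plain C sketch of the idea with stand-in macros: when the kernel is built
      without PM_SLEEP, the wakeup-source semantics cannot be honoured, so the
      flag is simply masked out of the requested events.

      #include <stdio.h>
      #include <stdint.h>

      #define EPOLLWAKEUP_BIT (1u << 29)     /* stand-in for the real EPOLLWAKEUP value */
      /* #define CONFIG_PM_SLEEP */          /* pretend kernel config option */

      static uint32_t sanitize_epoll_events(uint32_t events)
      {
      #ifndef CONFIG_PM_SLEEP
          events &= ~EPOLLWAKEUP_BIT;        /* no suspend blocking available: drop it */
      #endif
          return events;
      }

      int main(void)
      {
          uint32_t requested = 0x1u /* EPOLLIN */ | EPOLLWAKEUP_BIT;
          printf("requested %#x -> effective %#x\n", requested,
                 sanitize_epoll_events(requested));
          return 0;
      }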
    • vfs: fix subtle use-after-free of pipe_inode_info · b0d8d229
      Authored by Linus Torvalds
      The pipe code was trying (and failing) to be very careful about freeing
      the pipe info only after the last access, with a pattern like:
      
              spin_lock(&inode->i_lock);
              if (!--pipe->files) {
                      inode->i_pipe = NULL;
                      kill = 1;
              }
              spin_unlock(&inode->i_lock);
              __pipe_unlock(pipe);
              if (kill)
                      free_pipe_info(pipe);
      
      where the final freeing is done last.
      
      HOWEVER.  The above is actually broken, because while the freeing is
      done at the end, if we have two racing processes releasing the pipe
      inode info, the one that *doesn't* free it will decrement the ->files
      count, and unlock the inode i_lock, but then still use the
      "pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".
      
      This is *very* hard to trigger in practice, since the race window is
      very small, and adding debug options seems to just hide it by slowing
      things down.
      
      Simon originally reported this way back in July as an Oops in
      kmem_cache_allocate due to a single bit corruption (due to the final
      "spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
      allocation that had re-used the free'd pipe-info), it's taken this long
      to figure out.
      
      Since the 'pipe->files' accesses aren't even protected by the pipe lock
      (we very much use the inode lock for that), the simple solution is to
      just drop the pipe lock early.  And since there were two users of this
      pattern, create a helper function for it (a sketch of the ordering
      follows this entry).
      
      Introduced by commit ba5bb147 ("pipe: take allocation and freeing of
      pipe_inode_info out of ->i_mutex").
      Reported-by: Simon Kirby <sim@hostway.ca>
      Reported-by: Ian Applegate <ia@cloudflare.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org   # v3.10+
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b0d8d229
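      A pthread-based sketch of the safe ordering described above, using
      hypothetical types: the object's own mutex is released before the shared
      reference count is dropped, so the loser of the race never touches memory
      the winner may already have freed. (The demo is single-threaded; it only
      shows the ordering.)

      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>

      struct pipe_like {
          pthread_mutex_t mutex;      /* the pipe's own lock */
          int files;                  /* how many file handles still reference it */
      };

      struct inode_like {
          pthread_mutex_t i_lock;     /* protects 'files', outlives the pipe */
          struct pipe_like *pipe;
      };

      /* Decrement the reference under i_lock; tell the caller whether to free. */
      static int put_pipe_info(struct inode_like *inode, struct pipe_like *pipe)
      {
          int kill = 0;

          pthread_mutex_lock(&inode->i_lock);
          if (--pipe->files == 0) {
              inode->pipe = NULL;
              kill = 1;
          }
          pthread_mutex_unlock(&inode->i_lock);
          return kill;
      }

      static void pipe_release(struct inode_like *inode, struct pipe_like *pipe)
      {
          pthread_mutex_lock(&pipe->mutex);
          /* ... wake readers/writers ... */
          pthread_mutex_unlock(&pipe->mutex);    /* drop the pipe lock FIRST */

          if (put_pipe_info(inode, pipe))        /* only then drop our reference */
              free(pipe);
      }

      int main(void)
      {
          struct pipe_like *pipe = malloc(sizeof(*pipe));
          pthread_mutex_init(&pipe->mutex, NULL);
          pipe->files = 2;

          struct inode_like inode;
          pthread_mutex_init(&inode.i_lock, NULL);
          inode.pipe = pipe;

          pipe_release(&inode, pipe);            /* reader goes away */
          pipe_release(&inode, pipe);            /* writer goes away, frees the pipe */
          return 0;
      }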
  13. 29 November 2013, 1 commit
  14. 28 November 2013, 2 commits
  15. 25 November 2013, 2 commits
    • [CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offload · f19e84df
      Authored by Steve French
      Change cifs.ko to use CIFS_IOCTL_COPYCHUNK instead
      of BTRFS_IOC_CLONE to avoid confusion about whether
      copy-on-write is required or optional for this operation.
      
      SMB2/SMB3 copyoffload had used the BTRFS_IOC_CLONE ioctl since
      they both speed up copy by offloading the copy rather than
      passing many read and write requests back and forth and both have
      identical syntax (passing file handles), but for SMB2/SMB3
      CopyChunk the server is not required to use copy-on-write
      to make a copy of the file (although some do), and Christoph
      has commented that since CopyChunk does not require
      copy-on-write we should not reuse BTRFS_IOC_CLONE.
      
      This patch renames the ioctl to use a cifs-specific IOCTL,
      CIFS_IOCTL_COPYCHUNK.  This ioctl is particularly important
      for SMB2/SMB3 since large file copies over the network are
      otherwise very slow; with this they are often more than 100 times
      faster, putting less load on the server and client.
      
      Note that if a copy syscall is ever introduced, depending on
      its requirements/format it could end up using one of the other
      three methods that CIFS/SMB2/SMB3 can do for copy offload,
      but this method is particularly useful for file copy
      and broadly supported (not just by Samba server).
      Signed-off-by: Steve French <smfrench@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: David Disseldorp <ddiss@samba.org>
      f19e84df
    • block: submit_bio_wait() conversions · c170bbb4
      Authored by Kent Overstreet
      It was being open coded in a few places.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Acked-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c170bbb4
  16. 24 November 2013, 4 commits