1. 12 7月, 2018 7 次提交
  2. 22 6月, 2018 1 次提交
    • D
      xfs: xfs_iflush_abort() can be called twice on cluster writeback failure · e53946db
      Dave Chinner 提交于
      When a corrupt inode is detected during xfs_iflush_cluster, we can
      get a shutdown ASSERT failure like this:
      
      XFS (pmem1): Metadata corruption detected at xfs_symlink_shortform_verify+0x5c/0xa0, inode 0x86627 data fork
      XFS (pmem1): Unmount and run xfs_repair
      XFS (pmem1): xfs_do_force_shutdown(0x8) called from line 3372 of file fs/xfs/xfs_inode.c.  Return address = ffffffff814f4116
      XFS (pmem1): Corruption of in-memory data detected.  Shutting down filesystem
      XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c.  Return address = ffffffff814a8a88
      XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c.  Return address = ffffffff814a8ef9
      XFS (pmem1): Please umount the filesystem and rectify the problem(s)
      XFS: Assertion failed: xfs_isiflocked(ip), file: fs/xfs/xfs_inode.h, line: 258
      .....
      Call Trace:
       xfs_iflush_abort+0x10a/0x110
       xfs_iflush+0xf3/0x390
       xfs_inode_item_push+0x126/0x1e0
       xfsaild+0x2c5/0x890
       kthread+0x11c/0x140
       ret_from_fork+0x24/0x30
      
      Essentially, xfs_iflush_abort() has been called twice on the
      original inode that that was flushed. This happens because the
      inode has been flushed to teh buffer successfully via
      xfs_iflush_int(), and so when another inode is detected as corrupt
      in xfs_iflush_cluster, the buffer is marked stale and EIO, and
      iodone callbacks are run on it.
      
      Running the iodone callbacks walks across the original inode and
      calls xfs_iflush_abort() on it. When xfs_iflush_cluster() returns
      to xfs_iflush(), it runs the error path for that function, and that
      calls xfs_iflush_abort() on the inode a second time, leading to the
      above assert failure as the inode is not flush locked anymore.
      
      This bug has been there a long time.
      
      The simple fix would be to just avoid calling xfs_iflush_abort() in
      xfs_iflush() if we've got a failure from xfs_iflush_cluster().
      However, xfs_iflush_cluster() has magic delwri buffer handling that
      means it may or may not have run IO completion on the buffer, and
      hence sometimes we have to call xfs_iflush_abort() from
      xfs_iflush(), and sometimes we shouldn't.
      
      After reading through all the error paths and the delwri buffer
      code, it's clear that the error handling in xfs_iflush_cluster() is
      unnecessary. If the buffer is delwri, it leaves it on the delwri
      list so that when the delwri list is submitted it sees a shutdown
      fliesystem in xfs_buf_submit() and that marks the buffer stale, EIO
      and runs IO completion. i.e. exactly what xfs+iflush_cluster() does
      when it's not a delwri buffer. Further, marking a buffer stale
      clears the _XBF_DELWRI_Q flag on the buffer, which means when
      submission of the buffer occurs, it just skips over it and releases
      it.
      
      IOWs, the error handling in xfs_iflush_cluster doesn't need to care
      if the buffer is already on a the delwri queue or not - it just
      needs to mark the buffer stale, EIO and run completions. That means
      we can just use the easy fix for xfs_iflush() to avoid the double
      abort.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      e53946db
  3. 09 6月, 2018 1 次提交
  4. 07 6月, 2018 1 次提交
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner 提交于
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  5. 06 6月, 2018 1 次提交
    • D
      vfs: change inode times to use struct timespec64 · 95582b00
      Deepa Dinamani 提交于
      struct timespec is not y2038 safe. Transition vfs to use
      y2038 safe struct timespec64 instead.
      
      The change was made with the help of the following cocinelle
      script. This catches about 80% of the changes.
      All the header file and logic changes are included in the
      first 5 rules. The rest are trivial substitutions.
      I avoid changing any of the function signatures or any other
      filesystem specific data structures to keep the patch simple
      for review.
      
      The script can be a little shorter by combining different cases.
      But, this version was sufficient for my usecase.
      
      virtual patch
      
      @ depends on patch @
      identifier now;
      @@
      - struct timespec
      + struct timespec64
        current_time ( ... )
        {
      - struct timespec now = current_kernel_time();
      + struct timespec64 now = current_kernel_time64();
        ...
      - return timespec_trunc(
      + return timespec64_trunc(
        ... );
        }
      
      @ depends on patch @
      identifier xtime;
      @@
       struct \( iattr \| inode \| kstat \) {
       ...
      -       struct timespec xtime;
      +       struct timespec64 xtime;
       ...
       }
      
      @ depends on patch @
      identifier t;
      @@
       struct inode_operations {
       ...
      int (*update_time) (...,
      -       struct timespec t,
      +       struct timespec64 t,
      ...);
       ...
       }
      
      @ depends on patch @
      identifier t;
      identifier fn_update_time =~ "update_time$";
      @@
       fn_update_time (...,
      - struct timespec *t,
      + struct timespec64 *t,
       ...) { ... }
      
      @ depends on patch @
      identifier t;
      @@
      lease_get_mtime( ... ,
      - struct timespec *t
      + struct timespec64 *t
        ) { ... }
      
      @te depends on patch forall@
      identifier ts;
      local idexpression struct inode *inode_node;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn_update_time =~ "update_time$";
      identifier fn;
      expression e, E3;
      local idexpression struct inode *node1;
      local idexpression struct inode *node2;
      local idexpression struct iattr *attr1;
      local idexpression struct iattr *attr2;
      local idexpression struct iattr attr;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      @@
      (
      (
      - struct timespec ts;
      + struct timespec64 ts;
      |
      - struct timespec ts = current_time(inode_node);
      + struct timespec64 ts = current_time(inode_node);
      )
      
      <+... when != ts
      (
      - timespec_equal(&inode_node->i_xtime, &ts)
      + timespec64_equal(&inode_node->i_xtime, &ts)
      |
      - timespec_equal(&ts, &inode_node->i_xtime)
      + timespec64_equal(&ts, &inode_node->i_xtime)
      |
      - timespec_compare(&inode_node->i_xtime, &ts)
      + timespec64_compare(&inode_node->i_xtime, &ts)
      |
      - timespec_compare(&ts, &inode_node->i_xtime)
      + timespec64_compare(&ts, &inode_node->i_xtime)
      |
      ts = current_time(e)
      |
      fn_update_time(..., &ts,...)
      |
      inode_node->i_xtime = ts
      |
      node1->i_xtime = ts
      |
      ts = inode_node->i_xtime
      |
      <+... attr1->ia_xtime ...+> = ts
      |
      ts = attr1->ia_xtime
      |
      ts.tv_sec
      |
      ts.tv_nsec
      |
      btrfs_set_stack_timespec_sec(..., ts.tv_sec)
      |
      btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
      |
      - ts = timespec64_to_timespec(
      + ts =
      ...
      -)
      |
      - ts = ktime_to_timespec(
      + ts = ktime_to_timespec64(
      ...)
      |
      - ts = E3
      + ts = timespec_to_timespec64(E3)
      |
      - ktime_get_real_ts(&ts)
      + ktime_get_real_ts64(&ts)
      |
      fn(...,
      - ts
      + timespec64_to_timespec(ts)
      ,...)
      )
      ...+>
      (
      <... when != ts
      - return ts;
      + return timespec64_to_timespec(ts);
      ...>
      )
      |
      - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
      |
      - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
      |
      - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
      |
      node1->i_xtime1 =
      - timespec_trunc(attr1->ia_xtime1,
      + timespec64_trunc(attr1->ia_xtime1,
      ...)
      |
      - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
      + attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
      ...)
      |
      - ktime_get_real_ts(&attr1->ia_xtime1)
      + ktime_get_real_ts64(&attr1->ia_xtime1)
      |
      - ktime_get_real_ts(&attr.ia_xtime1)
      + ktime_get_real_ts64(&attr.ia_xtime1)
      )
      
      @ depends on patch @
      struct inode *node;
      struct iattr *attr;
      identifier fn;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      expression e;
      @@
      (
      - fn(node->i_xtime);
      + fn(timespec64_to_timespec(node->i_xtime));
      |
       fn(...,
      - node->i_xtime);
      + timespec64_to_timespec(node->i_xtime));
      |
      - e = fn(attr->ia_xtime);
      + e = fn(timespec64_to_timespec(attr->ia_xtime));
      )
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      )
      ...+>
      }
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      struct kstat *stat;
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier i_xtime =~ "^i_[acm]time$";
      identifier xtime =~ "^[acm]time$";
      identifier fn, ret;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(stat->xtime);
      ret = fn (...,
      - &stat->xtime);
      + &ts);
      )
      ...+>
      }
      
      @ depends on patch @
      struct inode *node;
      struct inode *node2;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier i_xtime3 =~ "^i_[acm]time$";
      struct iattr *attrp;
      struct iattr *attrp2;
      struct iattr attr ;
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      struct kstat *stat;
      struct kstat stat1;
      struct timespec64 ts;
      identifier xtime =~ "^[acmb]time$";
      expression e;
      @@
      (
      ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
      |
       node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       stat->xtime = node2->i_xtime1;
      |
       stat1.xtime = node2->i_xtime1;
      |
      ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
      |
      ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
      |
      - e = node->i_xtime1;
      + e = timespec64_to_timespec( node->i_xtime1 );
      |
      - e = attrp->ia_xtime1;
      + e = timespec64_to_timespec( attrp->ia_xtime1 );
      |
      node->i_xtime1 = current_time(...);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
       node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
      - node->i_xtime1 = e;
      + node->i_xtime1 = timespec_to_timespec64(e);
      )
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: <anton@tuxera.com>
      Cc: <balbi@kernel.org>
      Cc: <bfields@fieldses.org>
      Cc: <darrick.wong@oracle.com>
      Cc: <dhowells@redhat.com>
      Cc: <dsterba@suse.com>
      Cc: <dwmw2@infradead.org>
      Cc: <hch@lst.de>
      Cc: <hirofumi@mail.parknet.co.jp>
      Cc: <hubcap@omnibond.com>
      Cc: <jack@suse.com>
      Cc: <jaegeuk@kernel.org>
      Cc: <jaharkes@cs.cmu.edu>
      Cc: <jslaby@suse.com>
      Cc: <keescook@chromium.org>
      Cc: <mark@fasheh.com>
      Cc: <miklos@szeredi.hu>
      Cc: <nico@linaro.org>
      Cc: <reiserfs-devel@vger.kernel.org>
      Cc: <richard@nod.at>
      Cc: <sage@redhat.com>
      Cc: <sfrench@samba.org>
      Cc: <swhiteho@redhat.com>
      Cc: <tj@kernel.org>
      Cc: <trond.myklebust@primarydata.com>
      Cc: <tytso@mit.edu>
      Cc: <viro@zeniv.linux.org.uk>
      95582b00
  6. 05 6月, 2018 1 次提交
  7. 16 5月, 2018 1 次提交
    • B
      xfs: factor out nodiscard helpers · 4e529339
      Brian Foster 提交于
      The changes to skip discards of speculative preallocation and
      unwritten extents introduced several new wrapper functions through
      the bunmapi -> extent free codepath to reduce churn in all of the
      associated callers. In several cases, these wrappers simply toggle a
      single flag to skip or not skip discards for the resulting blocks.
      
      The explicit _nodiscard() wrappers for such an isolated set of
      callers is a bit overkill. Kill off these wrappers and replace with
      the calls to the underlying functions in the contexts that need to
      control discard behavior. Retain the wrappers that preserve the
      original calling conventions to serve the original purpose of
      reducing code churn.
      
      This is a refactoring patch and does not change behavior.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      4e529339
  8. 10 5月, 2018 7 次提交
  9. 10 4月, 2018 2 次提交
  10. 03 4月, 2018 1 次提交
  11. 30 3月, 2018 1 次提交
  12. 15 3月, 2018 1 次提交
  13. 12 3月, 2018 1 次提交
  14. 29 1月, 2018 6 次提交
  15. 13 1月, 2018 3 次提交
  16. 09 1月, 2018 1 次提交
  17. 22 12月, 2017 1 次提交
  18. 09 12月, 2017 1 次提交
    • C
      xfs: remove "no-allocation" reservations for file creations · f59cf5c2
      Christoph Hellwig 提交于
      If we create a new file we will need an inode, and usually some metadata
      in the parent direction.  Aiming for everything to go well despite the
      lack of a reservation leads to dirty transactions cancelled under a heavy
      create/delete load.  This patch removes those nospace transactions, which
      will lead to slightly earlier ENOSPC on some workloads, but instead
      prevent file system shutdowns due to cancelling dirty transactions for
      others.
      
      A customer could observe assertations failures and shutdowns due to
      cancelation of dirty transactions during heavy NFS workloads as shown
      below:
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace:
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246]  [<ffffffff81795daf>] dump_stack+0x63/0x81
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262]  [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264]  [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285]  [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308]  [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329]  [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348]  [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366]  [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380]  [<ffffffff81231de5>] vfs_create+0xd5/0x140
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390]  [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396]  [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401]  [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416]  [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423]  [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427]  [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432]  [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438]  [<ffffffff810c0d58>] kthread+0xd8/0xf0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451]  [<ffffffff8179d962>] ret_from_fork+0x42/0x70
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]---
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x4ee/0x7d0 [xfs]
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      f59cf5c2
  19. 28 11月, 2017 1 次提交
    • D
      xfs: always free inline data before resetting inode fork during ifree · 98c4f78d
      Darrick J. Wong 提交于
      In xfs_ifree, we reset the data/attr forks to extents format without
      bothering to free any inline data buffer that might still be around
      after all the blocks have been truncated off the file.  Prior to commit
      43518812 ("xfs: remove support for inlining data/extents into the
      inode fork") nobody noticed because the leftover inline data after
      truncation was small enough to fit inside the inline buffer inside the
      fork itself.
      
      However, now that we've removed the inline buffer, we /always/ have to
      free the inline data buffer or else we leak them like crazy.  This test
      was found by turning on kmemleak for generic/001 or generic/388.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      98c4f78d
  20. 17 11月, 2017 1 次提交
    • D
      xfs: fix forgotten rcu read unlock when skipping inode reclaim · 962cc1ad
      Darrick J. Wong 提交于
      In commit f2e9ad21 ("xfs: check for race with xfs_reclaim_inode"), we
      skip an inode if we're racing with freeing the inode via
      xfs_reclaim_inode, but we forgot to release the rcu read lock when
      dumping the inode, with the result that we exit to userspace with a lock
      held.  Don't do that; generic/320 with a 1k block size fails this
      very occasionally.
      
      ================================================
      WARNING: lock held when returning to user space!
      4.14.0-rc6-djwong #4 Tainted: G        W
      ------------------------------------------------
      rm/30466 is leaving the kernel with locks still held!
      1 lock held by rm/30466:
       #0:  (rcu_read_lock){....}, at: [<ffffffffa01364d3>] xfs_ifree_cluster.isra.17+0x2c3/0x6f0 [xfs]
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 30466 at kernel/rcu/tree_plugin.h:329 rcu_note_context_switch+0x71/0x700
      Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug]
      CPU: 1 PID: 30466 Comm: rm Tainted: G        W       4.14.0-rc6-djwong #4
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
      task: ffff880037680000 task.stack: ffffc90001064000
      RIP: 0010:rcu_note_context_switch+0x71/0x700
      RSP: 0000:ffffc90001067e50 EFLAGS: 00010002
      RAX: 0000000000000001 RBX: ffff880037680000 RCX: ffff88003e73d200
      RDX: 0000000000000002 RSI: ffffffff819e53e9 RDI: ffffffff819f4375
      RBP: 0000000000000000 R08: 0000000000000000 R09: ffff880062c900d0
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff880037680000
      R13: 0000000000000000 R14: ffffc90001067eb8 R15: ffff880037680690
      FS:  00007fa3b8ce8700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f69bf77c000 CR3: 000000002450a000 CR4: 00000000000006e0
      Call Trace:
       __schedule+0xb8/0xb10
       schedule+0x40/0x90
       exit_to_usermode_loop+0x6b/0xa0
       prepare_exit_to_usermode+0x7a/0x90
       retint_user+0x8/0x20
      RIP: 0033:0x7fa3b87fda87
      RSP: 002b:00007ffe41206568 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02
      RAX: 0000000000000000 RBX: 00000000010e88c0 RCX: 00007fa3b87fda87
      RDX: 0000000000000000 RSI: 00000000010e89c8 RDI: 0000000000000005
      RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
      R10: 000000000000015e R11: 0000000000000246 R12: 00000000010c8060
      R13: 00007ffe41206690 R14: 0000000000000000 R15: 0000000000000000
      ---[ end trace e88f83bf0cfbd07d ]---
      
      Fixes: f2e9ad21
      Cc: Omar Sandoval <osandov@fb.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NOmar Sandoval <osandov@fb.com>
      962cc1ad