  1. 29 Jul 2020, 5 commits
    • xfs: validate ondisk/incore dquot flags · afeda600
      Darrick J. Wong authored
      While loading dquot records off disk, make sure that the quota type
      flags are the same between the incore dquot and the ondisk dquot.
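
      The check itself is small. A minimal sketch of the idea follows; the
      function and field names (d_flags, dq_flags, XFS_DQ_ALLTYPES) are written
      from memory and may differ from the exact code, so treat it as
      illustrative only:

        /* Reject a dquot whose ondisk type flags disagree with the incore dquot. */
        static int
        xfs_dquot_check_type(struct xfs_dquot *dqp, struct xfs_disk_dquot *ddqp)
        {
                /* Both sides must agree on the USER/PROJ/GROUP quota type bits. */
                if ((ddqp->d_flags & XFS_DQ_ALLTYPES) !=
                    (dqp->dq_flags & XFS_DQ_ALLTYPES))
                        return -EFSCORRUPTED;   /* mismatch: treat as corruption */
                return 0;
        }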
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
    • xfs: fix inode quota reservation checks · f959b5d0
      Darrick J. Wong authored
      xfs_trans_dqresv is the function that we use to make reservations
      against resource quotas.  Each resource contains two counters: the
      q_core counter, which tracks resources allocated on disk; and the dquot
      reservation counter, which tracks how much of that resource has either
      been allocated or reserved by threads that are working on metadata
      updates.
      
      For disk blocks, we compare the proposed reservation counter against the
      hard and soft limits to decide if we're going to fail the operation.
      However, for inodes we inexplicably compare against the q_core counter,
      not the incore reservation count.
      
      Since the q_core counter is always lower than the reservation count and
      we unlock the dquot between reservation and transaction commit, this
      means that multiple threads can reserve the last inode count before we
      hit the hard limit, and when they commit, we'll be well over the hard
      limit.
      
      Fix this by checking against the incore inode reservation counter, since
      we would appear to maintain that correctly (and that's what we report in
      GETQUOTA).
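
      In outline, the fix changes which counter feeds the limit check. A hedged
      sketch of the corrected inode check in xfs_trans_dqresv() follows; the
      variable names are approximate and 'timer_expired' stands in for the
      grace-period test:

        /*
         * Base the comparison on the incore reservation counter, which already
         * includes inodes reserved by uncommitted transactions, rather than on
         * the ondisk q_core.d_icount.
         */
        total_count = dqp->q_res_icount + ninos;        /* reserved so far + this request */
        if (hardlimit && total_count > hardlimit)
                goto error_return;                      /* would exceed the hard limit */
        if (softlimit && total_count > softlimit && timer_expired)
                goto error_return;                      /* grace period already expired */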
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Allison Collins <allison.henderson@oracle.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: clear XFS_DQ_FREEING if we can't lock the dquot buffer to flush · c97738a9
      Darrick J. Wong authored
      In commit 8d3d7e2b, we changed xfs_qm_dqpurge to bail out if we
      can't lock the dquot buf to flush the dquot.  This prevents the AIL from
      blocking on the dquot, but it also forgets to clear the FREEING flag on
      its way out.  A subsequent purge attempt will see the FREEING flag is
      set and bail out, which leads to dqpurge_all failing to purge all the
      dquots.
      
      (copy-pasting from Dave Chinner's identical patch)
      
      This was found by inspection after xfs/305 hung in roughly 1 in 50
      iterations during a quotaoff operation:
      
      [ 8872.301115] xfs_quota       D13888 92262  91813 0x00004002
      [ 8872.302538] Call Trace:
      [ 8872.303193]  __schedule+0x2d2/0x780
      [ 8872.304108]  ? do_raw_spin_unlock+0x57/0xd0
      [ 8872.305198]  schedule+0x6e/0xe0
      [ 8872.306021]  schedule_timeout+0x14d/0x300
      [ 8872.307060]  ? __next_timer_interrupt+0xe0/0xe0
      [ 8872.308231]  ? xfs_qm_dqusage_adjust+0x200/0x200
      [ 8872.309422]  schedule_timeout_uninterruptible+0x2a/0x30
      [ 8872.310759]  xfs_qm_dquot_walk.isra.0+0x15a/0x1b0
      [ 8872.311971]  xfs_qm_dqpurge_all+0x7f/0x90
      [ 8872.313022]  xfs_qm_scall_quotaoff+0x18d/0x2b0
      [ 8872.314163]  xfs_quota_disable+0x3a/0x60
      [ 8872.315179]  kernel_quotactl+0x7e2/0x8d0
      [ 8872.316196]  ? __do_sys_newstat+0x51/0x80
      [ 8872.317238]  __x64_sys_quotactl+0x1e/0x30
      [ 8872.318266]  do_syscall_64+0x46/0x90
      [ 8872.319193]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 8872.320490] RIP: 0033:0x7f46b5490f2a
      [ 8872.321414] Code: Bad RIP value.
      
      Returning -EAGAIN from xfs_qm_dqpurge() without clearing the
      XFS_DQ_FREEING flag means the xfs_qm_dqpurge_all() code can never
      free the dquot, and we loop forever waiting for the XFS_DQ_FREEING
      flag to go away on the dquot that leaked it via -EAGAIN.
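
      The fix amounts to giving back the FREEING claim before backing off. A
      hedged sketch of the bail-out path in xfs_qm_dqpurge(), with locking
      simplified and names approximate:

        error = xfs_qm_dqflush(dqp, &bp);
        if (error == -EAGAIN) {
                /*
                 * We could not lock the underlying buffer; drop our claim on
                 * the dquot so a later purge attempt can try again instead of
                 * spinning forever on a dquot stuck in FREEING state.
                 */
                dqp->dq_flags &= ~XFS_DQ_FREEING;
                xfs_dqunlock(dqp);
                return error;
        }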
      
      Fixes: 8d3d7e2b ("xfs: trylock underlying buffer on dquot flush")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Allison Collins <allison.henderson@oracle.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: fix inode allocation block res calculation precedence · b2a88647
      Brian Foster authored
      The block reservation calculation for inode allocation is supposed
      to consist of the blocks required for the inode chunk plus
      (maxlevels-1) of the inode btree multiplied by the number of inode
      btrees in the fs (2 when finobt is enabled, 1 otherwise).
      
      Instead, the macro returns (ialloc_blocks + 2) due to a precedence
      error in the calculation logic. This leads to block reservation
      overruns via generic/531 on small block filesystems with finobt
      enabled. Add braces to fix the calculation and reserve the
      appropriate number of blocks.
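
      The difference is a single pair of parentheses. A sketch of the before and
      after shape of the reservation expression, reconstructed from the
      description above (exact identifiers may differ):

        /* Buggy: '*' binds into the ternary's false branch, so with finobt
         * enabled the whole expression reduces to ialloc_blks + 2. */
        M_IGEO(mp)->ialloc_blks +
                (xfs_sb_version_hasfinobt(&mp->m_sb) ? 2 : 1 *
                 (M_IGEO(mp)->inobt_maxlevels - 1))

        /* Fixed: parenthesize the ternary so (2 or 1) multiplies (maxlevels - 1). */
        M_IGEO(mp)->ialloc_blks +
                ((xfs_sb_version_hasfinobt(&mp->m_sb) ? 2 : 1) *
                 (M_IGEO(mp)->inobt_maxlevels - 1))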
      
      Fixes: 9d43b180 ("xfs: update inode allocation/free transaction reservations for finobt")
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: drain the buf delwri queue before xfsaild idles · f376b45e
      Brian Foster authored
      xfsaild is racy with respect to transaction abort and shutdown in
      that the task can idle or exit with an empty AIL but buffers still
      on the delwri queue. This was partly addressed by cancelling the
      delwri queue before the task exits to prevent memory leaks, but it's
      also possible for xfsaild to empty and idle with buffers on the
      delwri queue. For example, a transaction that pins a buffer that
      also happens to sit on the AIL delwri queue will explicitly remove
      the associated log item from the AIL if the transaction aborts. The
      side effect of this is an unmount hang in xfs_wait_buftarg() as the
      associated buffers remain held by the delwri queue indefinitely.
      This is reproduced on repeated runs of generic/531 with an fs format
      (-mrmapbt=1 -bsize=1k) that happens to also reproduce transaction
      aborts.
      
      Update xfsaild to not idle until both the AIL and associated delwri
      queue are empty and update the push code to continue delwri queue
      submission attempts even when the AIL is empty. This allows the AIL
      to eventually release aborted buffers stranded on the delwri queue
      when they are unlocked by the associated transaction. This should
      have no significant effect on normal runtime behavior because the
      xfsaild currently idles only when the AIL is empty and in practice
      the AIL is rarely empty with a populated delwri queue. The items
      must be AIL resident to land in the queue in the first place and
      generally aren't removed until writeback completes.
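
      Conceptually the idle test in xfsaild() grows a second condition. A hedged
      sketch, with field and helper names approximate:

        /*
         * Only go idle when there is nothing left to push from the AIL *and*
         * no buffers remain on the delwri queue; otherwise keep attempting
         * delwri submission so aborted, now-unlocked buffers get released.
         */
        if (!xfs_ail_min(ailp) && list_empty(&ailp->ail_buf_list)) {
                set_current_state(TASK_INTERRUPTIBLE);
                schedule();     /* sleep until new work arrives */
                continue;
        }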
      
      Note that the pre-existing delwri queue cancel logic in the exit
      path is retained because task stop is external, could technically
      come at any point, and xfsaild is still responsible to release its
      buffer references before it exits.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  2. 18 Jul 2020, 1 commit
    • xfs: preserve inode versioning across remounts · 4750a171
      Eric Sandeen authored
      The MS_I_VERSION mount flag is exposed via the VFS, as documented
      in the mount manpages etc; see the iversion and noiversion mount
      options in mount(8).
      
      As a result, mount -o remount looks for this option in /proc/mounts
      and will only send the I_VERSION flag back in during remount if it
      is present.  Since it's not there, a remount will /remove/ the
      I_VERSION flag at the vfs level, and iversion functionality is lost.
      
      xfs v5 superblocks intend to always have i_version enabled; it is
      set as a default at mount time, but is lost during remount for the
      reasons above.
      
      The generic fix would be to expose this documented option in
      /proc/mounts, but since that was rejected, fix it up again in the
      xfs remount path instead, so that at least xfs won't suffer from
      this misbehavior.
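
      A hedged sketch of the workaround in the xfs remount path; the exact hook
      (fs_context reconfigure vs. the older remount_fs method) and the flag
      container depend on the kernel version:

        /*
         * v5 superblocks always want inode version counters, so put
         * SB_I_VERSION back on remount even though userspace never passed
         * "iversion" in (it could not have seen it in /proc/mounts).
         */
        if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
                fc->sb_flags |= SB_I_VERSION;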
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
  3. 14 Jul 2020, 3 commits
  4. 10 Jul 2020, 1 commit
    • xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim · c3f2375b
      Waiman Long authored
      Depending on the workloads, the following circular locking dependency
      warning between sb_internal (a percpu rwsem) and fs_reclaim (a pseudo
      lock) may show up:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      5.0.0-rc1+ #60 Tainted: G        W
      ------------------------------------------------------
      fsfreeze/4346 is trying to acquire lock:
      0000000026f1d784 (fs_reclaim){+.+.}, at:
      fs_reclaim_acquire.part.19+0x5/0x30
      
      but task is already holding lock:
      0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
      
      which lock already depends on the new lock.
        :
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(sb_internal);
                                     lock(fs_reclaim);
                                     lock(sb_internal);
        lock(fs_reclaim);
      
       *** DEADLOCK ***
      
      4 locks held by fsfreeze/4346:
       #0: 00000000b478ef56 (sb_writers#8){++++}, at: percpu_down_write+0xb4/0x650
       #1: 000000001ec487a9 (&type->s_umount_key#28){++++}, at: freeze_super+0xda/0x290
       #2: 000000003edbd5a0 (sb_pagefaults){++++}, at: percpu_down_write+0xb4/0x650
       #3: 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
      
      stack backtrace:
      Call Trace:
       dump_stack+0xe0/0x19a
       print_circular_bug.isra.10.cold.34+0x2f4/0x435
       check_prev_add.constprop.19+0xca1/0x15f0
       validate_chain.isra.14+0x11af/0x3b50
       __lock_acquire+0x728/0x1200
       lock_acquire+0x269/0x5a0
       fs_reclaim_acquire.part.19+0x29/0x30
       fs_reclaim_acquire+0x19/0x20
       kmem_cache_alloc+0x3e/0x3f0
       kmem_zone_alloc+0x79/0x150
       xfs_trans_alloc+0xfa/0x9d0
       xfs_sync_sb+0x86/0x170
       xfs_log_sbcount+0x10f/0x140
       xfs_quiesce_attr+0x134/0x270
       xfs_fs_freeze+0x4a/0x70
       freeze_super+0x1af/0x290
       do_vfs_ioctl+0xedc/0x16c0
       ksys_ioctl+0x41/0x80
       __x64_sys_ioctl+0x73/0xa9
       do_syscall_64+0x18f/0xd23
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This is a false positive as all the dirty pages are flushed out before
      the filesystem can be frozen.
      
      One way to avoid this splat is to add GFP_NOFS to the affected allocation
      calls by using the memalloc_nofs_save()/memalloc_nofs_restore() pair.
      This shouldn't matter unless the system is really running out of memory.
      In that particular case, the filesystem freeze operation may fail while
      it was succeeding previously.
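
      The change follows the standard kernel pattern for scoped NOFS
      allocations; a hedged sketch of wrapping the allocation-heavy superblock
      logging in the freeze path (the exact call site is approximate):

        unsigned int nofs_flag;

        /*
         * Mark this task as not allowed to enter filesystem reclaim, so the
         * transaction allocation below cannot recurse into the fs and create
         * the sb_internal -> fs_reclaim dependency lockdep complains about.
         */
        nofs_flag = memalloc_nofs_save();
        error = xfs_log_sbcount(mp);
        memalloc_nofs_restore(nofs_flag);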
      
      Without this patch, the command sequence below will show that the lock
      dependency chain sb_internal -> fs_reclaim exists.
      
       # fsfreeze -f /home
       # fsfreeze --unfreeze /home
       # grep -i fs_reclaim -C 3 /proc/lockdep_chains | grep -C 5 sb_internal
      
      After applying the patch, such sb_internal -> fs_reclaim lock dependency
      chain can no longer be found. Because of that, the locking dependency
      warning will not be shown.
      Suggested-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  5. 07 Jul 2020, 30 commits