1. 06 Sep, 2019 3 commits
    • xfs: fix missed wakeup on l_flush_wait · cdea5459
      Rik van Riel committed
      The code in xlog_wait uses the spinlock to make adding the task to
      the wait queue, and setting the task state to UNINTERRUPTIBLE atomic
      with respect to the waker.
      
      Doing the wakeup after releasing the spinlock opens up the following
      race condition:
      
      Task 1					Task 2
      add task to wait queue
      					wake up task
      set task state to UNINTERRUPTIBLE
      
      This issue was found through code inspection as a result of kworkers
      being observed stuck in UNINTERRUPTIBLE state with an empty
      wait queue. It is rare and largely unreproducible.
      
      Simply moving the spin_unlock to after the wake_up_all results
      in the waker not being able to see a task on the waitqueue before
      it has set its state to UNINTERRUPTIBLE.
      
      This bug dates back to the conversion of this code to generic
      waitqueue infrastructure from a counting semaphore back in 2008,
      which didn't place the wakeups consistently w.r.t. the relevant
      spin locks.
      
      [dchinner: Also fix a similar issue in the shutdown path on
      xc_commit_wait. Update commit log with more details of the issue.]
      
      Fixes: d748c623 ("[XFS] Convert l_flushsema to a sv_t")
      Reported-by: Chris Mason <clm@fb.com>
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      cdea5459
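The interleaving described in this commit message can be replayed deterministically in a small userspace model. This is only a sketch: `model_task`, `waiter_enqueue`, `waiter_set_sleeping` and `waker_wake_all` are hypothetical names, not kernel APIs, and the model assumes (as a simplification) that a wakeup both dequeues the task and marks it runnable.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these are kernel types. */
enum model_state { MODEL_RUNNING, MODEL_UNINTERRUPTIBLE };

struct model_task {
    enum model_state state;
    bool on_wait_queue;
};

/* Waiter step 1: put the task on the wait queue. */
static void waiter_enqueue(struct model_task *t)
{
    t->on_wait_queue = true;
}

/* Waiter step 2: mark the task as sleeping. */
static void waiter_set_sleeping(struct model_task *t)
{
    t->state = MODEL_UNINTERRUPTIBLE;
}

/* Waker: wake whatever is queued (assumed to dequeue it as well). */
static void waker_wake_all(struct model_task *t)
{
    if (t->on_wait_queue) {
        t->on_wait_queue = false;
        t->state = MODEL_RUNNING;
    }
}

/* The symptom from the report: asleep, but no longer on any queue. */
static bool task_is_stuck(const struct model_task *t)
{
    return t->state == MODEL_UNINTERRUPTIBLE && !t->on_wait_queue;
}

/* Wakeup lands between the two waiter steps (waker not under the lock). */
bool simulate_wake_between_steps(void)
{
    struct model_task t = { MODEL_RUNNING, false };

    waiter_enqueue(&t);
    waker_wake_all(&t);
    waiter_set_sleeping(&t);
    return task_is_stuck(&t);
}

/* Wakeup is ordered after both waiter steps (waker holds the lock). */
bool simulate_wake_after_both_steps(void)
{
    struct model_task t = { MODEL_RUNNING, false };

    waiter_enqueue(&t);
    waiter_set_sleeping(&t);
    waker_wake_all(&t);
    return task_is_stuck(&t);
}
```

In this model the first ordering leaves the task stuck and the second does not, which is the effect the commit attributes to doing the wakeup before releasing the spinlock.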
    • xfs: push the AIL in xlog_grant_head_wake · 7c107afb
      Dave Chinner committed
      In the situation where the log is full and the CIL has not recently
      flushed, the AIL push threshold is throttled back to where the
      last write of the head of the log was completed. This is stored in
      log->l_last_sync_lsn. Hence if the CIL holds > 25% of the log space
      pinned by flushes and/or aggregation in progress, we can get the
      situation where the head of the log lags a long way behind the
      reservation grant head.
      
      When this happens, the AIL push target is trimmed back from where
      the reservation grant head wants to push the log tail to, back to
      where the head of the log currently is. This means the push target
      doesn't reach far enough into the log to actually move the tail
      before the transaction reservation goes to sleep.
      
      When the CIL push completes, it moves the log head forward such that
      the AIL push target can now be moved, but that has no mechanism for
      pushing the log tail. Further, if the next tail movement of the log
      is not large enough to wake the waiter (i.e. still not enough space for
      it to have a reservation granted), we don't wake anything up, and
      hence we do not update the AIL push target to take into account the
      head of the log moving and allowing the push target to be moved
      forwards.
      
      To avoid this particular condition, if we fail to wake the first
      waiter on the grant head because we don't have enough space,
      push on the AIL again. This will pick up any movement of the log
      head and allow the push target to move forward due to completion of
      CIL pushing.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      7c107afb
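The last paragraph of this commit message describes a control-flow change: when the first waiter cannot be woken for lack of space, push the AIL again. A minimal userspace sketch of that idea follows. `model_log`, `model_push_ail` and `model_grant_head_wake` are hypothetical names, and representing waiters as a plain array of byte counts is an assumption made for illustration, not the kernel's data structure.

```c
#include <stddef.h>

/* Hypothetical stand-in for the log; it only records push requests. */
struct model_log {
    int push_requests;    /* how many times a push was requested */
    int last_push_bytes;  /* space asked for by the latest push */
};

/* Hypothetical stand-in for asking the AIL to free up need_bytes. */
static void model_push_ail(struct model_log *log, int need_bytes)
{
    log->push_requests++;
    log->last_push_bytes = need_bytes;
}

/*
 * Wake waiters in order while space lasts. If even the first waiter
 * cannot be satisfied, request another AIL push so the push target is
 * re-evaluated. Returns the number of waiters woken.
 */
int model_grant_head_wake(struct model_log *log, const int *need,
                          size_t nr_waiters, int *free_bytes)
{
    int woken = 0;

    for (size_t i = 0; i < nr_waiters; i++) {
        if (*free_bytes < need[i]) {
            if (woken == 0)
                model_push_ail(log, need[i]);
            break;
        }
        *free_bytes -= need[i];
        woken++;
    }
    return woken;
}
```

With 100 free bytes and a first waiter that needs 200, the sketch wakes nobody and records one push request; with 300 free bytes it wakes the first waiter and requests no push.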
    • xfs: Use WARN_ON_ONCE for bailout mount-operation · eb2e9994
      Austin Kim committed
      If CONFIG_BUG is enabled, BUG() is executed and the system crashes,
      so the bailout path for the failed mount never runs.
      
      Using WARN_ON_ONCE rather than BUG avoids this and lets the bailout proceed.
      Signed-off-by: Austin Kim <austindh.kim@gmail.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      eb2e9994
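The difference between crashing and warning can be illustrated with a userspace analogue. This is a sketch under stated assumptions: `model_warn_on_once` only imitates the "warn once, return the condition" behaviour of the kernel macro, and `model_mount` with its `resources_held` counter is a hypothetical stand-in for a mount path, not the code this commit touches.

```c
#include <stdbool.h>
#include <stdio.h>

int warnings_printed;  /* how many warnings were actually emitted */
int resources_held;    /* model of state that the bailout must release */

/*
 * Hypothetical helper: warn only the first time cond is true for a
 * given call site (tracked via *already_warned) and return cond so the
 * caller can still take its error path.
 */
static bool model_warn_on_once(bool cond, bool *already_warned,
                               const char *what)
{
    if (cond && !*already_warned) {
        *already_warned = true;
        warnings_printed++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* Hypothetical mount path: warn, then bail out instead of crashing. */
int model_mount(bool unexpected_state)
{
    static bool warned;

    resources_held = 1;
    if (model_warn_on_once(unexpected_state, &warned,
                           "unexpected state during mount")) {
        resources_held = 0;  /* bailout still runs */
        return -1;
    }
    return 0;
}
```

Calling `model_mount(true)` twice returns an error both times, releases the modelled resources, and emits the warning only once; a BUG-style abort would have stopped before the bailout.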
  2. 04 Sep, 2019 2 commits
    • xfs: Fix deadlock between AGI and AGF with RENAME_WHITEOUT · bc56ad8c
      kaixuxia committed
      When performing a rename operation with the RENAME_WHITEOUT flag, we
      first hold the AGF lock to allocate or free extents while manipulating
      the dirents, and then call xfs_iunlink_remove() last, which takes the
      AGI lock to modify the tmpfile info. The lock order here is therefore
      AGF->AGI.
      
      The big problem here is that we have an ordering constraint on AGF
      and AGI locking - inode allocation locks the AGI, then can allocate
      a new extent for new inodes, locking the AGF after the AGI. Hence
      the ordering that is imposed by other parts of the code is AGI before
      AGF. So we get an ABBA deadlock between the AGI and AGF here.
      
      Process A:
      Call trace:
       ? __schedule+0x2bd/0x620
       schedule+0x33/0x90
       schedule_timeout+0x17d/0x290
       __down_common+0xef/0x125
       ? xfs_buf_find+0x215/0x6c0 [xfs]
       down+0x3b/0x50
       xfs_buf_lock+0x34/0xf0 [xfs]
       xfs_buf_find+0x215/0x6c0 [xfs]
       xfs_buf_get_map+0x37/0x230 [xfs]
       xfs_buf_read_map+0x29/0x190 [xfs]
       xfs_trans_read_buf_map+0x13d/0x520 [xfs]
       xfs_read_agf+0xa6/0x180 [xfs]
       ? schedule_timeout+0x17d/0x290
       xfs_alloc_read_agf+0x52/0x1f0 [xfs]
       xfs_alloc_fix_freelist+0x432/0x590 [xfs]
       ? down+0x3b/0x50
       ? xfs_buf_lock+0x34/0xf0 [xfs]
       ? xfs_buf_find+0x215/0x6c0 [xfs]
       xfs_alloc_vextent+0x301/0x6c0 [xfs]
       xfs_ialloc_ag_alloc+0x182/0x700 [xfs]
       ? _xfs_trans_bjoin+0x72/0xf0 [xfs]
       xfs_dialloc+0x116/0x290 [xfs]
       xfs_ialloc+0x6d/0x5e0 [xfs]
       ? xfs_log_reserve+0x165/0x280 [xfs]
       xfs_dir_ialloc+0x8c/0x240 [xfs]
       xfs_create+0x35a/0x610 [xfs]
       xfs_generic_create+0x1f1/0x2f0 [xfs]
       ...
      
      Process B:
      Call trace:
       ? __schedule+0x2bd/0x620
       ? xfs_bmapi_allocate+0x245/0x380 [xfs]
       schedule+0x33/0x90
       schedule_timeout+0x17d/0x290
       ? xfs_buf_find+0x1fd/0x6c0 [xfs]
       __down_common+0xef/0x125
       ? xfs_buf_get_map+0x37/0x230 [xfs]
       ? xfs_buf_find+0x215/0x6c0 [xfs]
       down+0x3b/0x50
       xfs_buf_lock+0x34/0xf0 [xfs]
       xfs_buf_find+0x215/0x6c0 [xfs]
       xfs_buf_get_map+0x37/0x230 [xfs]
       xfs_buf_read_map+0x29/0x190 [xfs]
       xfs_trans_read_buf_map+0x13d/0x520 [xfs]
       xfs_read_agi+0xa8/0x160 [xfs]
       xfs_iunlink_remove+0x6f/0x2a0 [xfs]
       ? current_time+0x46/0x80
       ? xfs_trans_ichgtime+0x39/0xb0 [xfs]
       xfs_rename+0x57a/0xae0 [xfs]
       xfs_vn_rename+0xe4/0x150 [xfs]
       ...
      
      In this patch we move the xfs_iunlink_remove() call to
      before acquiring the AGF lock to preserve correct AGI/AGF locking
      order.
      Signed-off-by: kaixuxia <kaixuxia@tencent.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      bc56ad8c
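The ordering rule in this commit message (AGI before AGF) can be expressed as a simple rank check. The sketch below is illustrative only: `enum ag_lock` and `lock_order_ok` are hypothetical names, and ranking the two locks with integers is an assumption for the example, not how XFS enforces ordering.

```c
#include <stdbool.h>
#include <stddef.h>

/* Lower rank must be taken first: AGI, then AGF. */
enum ag_lock { LOCK_AGI = 0, LOCK_AGF = 1 };

/*
 * Return true if the locks in seq are acquired in non-decreasing rank
 * order. Two code paths that both pass this check cannot form an ABBA
 * deadlock on these two locks.
 */
bool lock_order_ok(const enum ag_lock *seq, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (seq[i] < seq[i - 1])
            return false;
    }
    return true;
}
```

In this model the inode allocation path (AGI, AGF) passes, the old rename path (AGF, AGI) fails, and the rename path after moving xfs_iunlink_remove() earlier (AGI, AGF) passes.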
    • xfs: define a flags field for the AG geometry ioctl structure · 76f17933
      Darrick J. Wong committed
      Define a flags field for the AG geometry ioctl structure.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      76f17933
  3. 03 Sep, 2019 1 commit
  4. 31 Aug, 2019 15 commits
  5. 30 Aug, 2019 1 commit
  6. 28 Aug, 2019 9 commits
  7. 27 Aug, 2019 6 commits
  8. 23 Aug, 2019 1 commit
    • xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT · 1fb254aa
      Darrick J. Wong committed
      Benjamin Moody reported to Debian that XFS partially wedges when a chgrp
      fails on account of being out of disk quota.  I ran his reproducer
      script:
      
      # adduser dummy
      # adduser dummy plugdev
      
      # dd if=/dev/zero bs=1M count=100 of=test.img
      # mkfs.xfs test.img
      # mount -t xfs -o gquota test.img /mnt
      # mkdir -p /mnt/dummy
      # chown -c dummy /mnt/dummy
      # xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt
      
      (and then as user dummy)
      
      $ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo
      $ chgrp plugdev /mnt/dummy/foo
      
      and saw:
      
      ================================================
      WARNING: lock held when returning to user space!
      5.3.0-rc5 #rc5 Tainted: G        W
      ------------------------------------------------
      chgrp/47006 is leaving the kernel with locks still held!
      1 lock held by chgrp/47006:
       #0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs]
      
      ...which is clearly caused by xfs_setattr_nonsize failing to unlock the
      ILOCK after the xfs_qm_vop_chown_reserve call fails.  Add the missing
      unlock.
      
      Reported-by: benjamin.moody@gmail.com
      Fixes: 253f4911 ("xfs: better xfs_trans_alloc interface")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Tested-by: Salvatore Bonaccorso <carnil@debian.org>
      1fb254aa
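The class of bug fixed here, an error return that skips the unlock, is commonly avoided with a single unlock label. The following userspace sketch shows that pattern under stated assumptions: `model_ilock`, `model_iunlock`, `model_quota_reserve` and `model_setattr` are hypothetical names, the lock is modelled as a depth counter, and the control flow is not copied from xfs_setattr_nonsize.

```c
#include <errno.h>

static int ilock_depth;  /* model of "locks currently held" */

static void model_ilock(void)   { ilock_depth++; }
static void model_iunlock(void) { ilock_depth--; }

/* Hypothetical quota reservation that can fail with -EDQUOT. */
static int model_quota_reserve(int over_quota)
{
    return over_quota ? -EDQUOT : 0;
}

/* Every exit after taking the lock goes through out_unlock. */
int model_setattr(int over_quota)
{
    int error;

    model_ilock();
    error = model_quota_reserve(over_quota);
    if (error)
        goto out_unlock;  /* the failure path must not skip the unlock */

    /* ... the attribute change would be applied here ... */

out_unlock:
    model_iunlock();
    return error;
}

/* Accessor so callers can check that no lock leaked. */
int model_ilock_depth(void)
{
    return ilock_depth;
}
```

After `model_setattr(1)` fails with `-EDQUOT`, `model_ilock_depth()` is back to zero, which is the property the lockdep warning above reported as violated.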
  9. 20 Aug, 2019 1 commit
  10. 19 Aug, 2019 1 commit
    • xfs: fix reflink source file racing with directio writes · 5d888b48
      Darrick J. Wong committed
      While trawling through the dedupe file comparison code trying to fix
      page deadlocking problems, Dave Chinner noticed that the reflink code
      only takes shared IOLOCK/MMAPLOCKs on the source file.  Because
      page_mkwrite and directio writes do not take the EXCL versions of those
      locks, this means that reflink can race with writer processes.
      
      For pure remapping this can lead to undefined behavior and file
      corruption; for dedupe this means that we cannot be sure that the
      contents are identical when we decide to go ahead with the remapping.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      5d888b48
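The race follows from ordinary shared/exclusive lock semantics: two shared holders do not exclude each other. A tiny sketch of that compatibility rule is given below. `enum lock_mode` and `modes_can_coexist` are hypothetical names, and the example assumes generic reader/writer semantics rather than quoting the XFS locking code.

```c
#include <stdbool.h>

enum lock_mode { LOCK_SHARED, LOCK_EXCL };

/*
 * Generic reader/writer rule: two holders may hold the same lock at
 * the same time only if both hold it in shared mode.
 */
bool modes_can_coexist(enum lock_mode a, enum lock_mode b)
{
    return a == LOCK_SHARED && b == LOCK_SHARED;
}
```

Under this rule, a reflink holding the source file's lock in shared mode can run alongside a writer that also holds it shared, while an exclusive holder would keep such writers out.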