1. 16 Jun, 2020 5 commits
    • xfs: acquire superblock freeze protection on eofblocks scans · 29c0b559
      Committed by Brian Foster
      task #28557760
      
      commit 4b674b9ac852937af1f8c62f730c325fb6eadcdb upstream.
      
      The filesystem freeze sequence in XFS waits on any background
      eofblocks or cowblocks scans to complete before the filesystem is
      quiesced. At this point, the freezer has already stopped the
      transaction subsystem, however, which means a truncate or cowblock
      cancellation in progress is likely blocked in transaction
      allocation. This results in a deadlock between freeze and the
      associated scanner.
      
      Fix this problem by holding superblock write protection across calls
      into the block reapers. Since protection for background scans is
      acquired from the workqueue task context, trylock to avoid a similar
      deadlock between freeze and blocking on the write lock.
      
      Fixes: d6b636eb ("xfs: halt auto-reclamation activities while rebuilding rmap")
      Reported-by: Paul Furtado <paulfurtado91@gmail.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Allison Collins <allison.henderson@oracle.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • xfs: Fix deadlock between AGI and AGF with RENAME_WHITEOUT · d637cc8f
      Committed by kaixuxia
      task #28557760
      
      commit bc56ad8c74b8588685c2875de0df8ab6974828ef upstream.
      
      When performing a rename operation with the RENAME_WHITEOUT flag, we
      first hold the AGF lock to allocate or free extents while manipulating
      the dirents, and then call xfs_iunlink_remove() last, which takes the
      AGI lock to modify the tmpfile info. So the lock order here is AGF->AGI.
      
      The big problem here is that we have an ordering constraint on AGF
      and AGI locking - inode allocation locks the AGI, then can allocate
      a new extent for new inodes, locking the AGF after the AGI. Hence
      the ordering that is imposed by other parts of the code is AGI before
      AGF. So we get an ABBA deadlock between the AGI and AGF here.
      
      Process A:
      Call trace:
       ? __schedule+0x2bd/0x620
       schedule+0x33/0x90
       schedule_timeout+0x17d/0x290
       __down_common+0xef/0x125
       ? xfs_buf_find+0x215/0x6c0 [xfs]
       down+0x3b/0x50
       xfs_buf_lock+0x34/0xf0 [xfs]
       xfs_buf_find+0x215/0x6c0 [xfs]
       xfs_buf_get_map+0x37/0x230 [xfs]
       xfs_buf_read_map+0x29/0x190 [xfs]
       xfs_trans_read_buf_map+0x13d/0x520 [xfs]
       xfs_read_agf+0xa6/0x180 [xfs]
       ? schedule_timeout+0x17d/0x290
       xfs_alloc_read_agf+0x52/0x1f0 [xfs]
       xfs_alloc_fix_freelist+0x432/0x590 [xfs]
       ? down+0x3b/0x50
       ? xfs_buf_lock+0x34/0xf0 [xfs]
       ? xfs_buf_find+0x215/0x6c0 [xfs]
       xfs_alloc_vextent+0x301/0x6c0 [xfs]
       xfs_ialloc_ag_alloc+0x182/0x700 [xfs]
       ? _xfs_trans_bjoin+0x72/0xf0 [xfs]
       xfs_dialloc+0x116/0x290 [xfs]
       xfs_ialloc+0x6d/0x5e0 [xfs]
       ? xfs_log_reserve+0x165/0x280 [xfs]
       xfs_dir_ialloc+0x8c/0x240 [xfs]
       xfs_create+0x35a/0x610 [xfs]
       xfs_generic_create+0x1f1/0x2f0 [xfs]
       ...
      
      Process B:
      Call trace:
       ? __schedule+0x2bd/0x620
       ? xfs_bmapi_allocate+0x245/0x380 [xfs]
       schedule+0x33/0x90
       schedule_timeout+0x17d/0x290
       ? xfs_buf_find+0x1fd/0x6c0 [xfs]
       __down_common+0xef/0x125
       ? xfs_buf_get_map+0x37/0x230 [xfs]
       ? xfs_buf_find+0x215/0x6c0 [xfs]
       down+0x3b/0x50
       xfs_buf_lock+0x34/0xf0 [xfs]
       xfs_buf_find+0x215/0x6c0 [xfs]
       xfs_buf_get_map+0x37/0x230 [xfs]
       xfs_buf_read_map+0x29/0x190 [xfs]
       xfs_trans_read_buf_map+0x13d/0x520 [xfs]
       xfs_read_agi+0xa8/0x160 [xfs]
       xfs_iunlink_remove+0x6f/0x2a0 [xfs]
       ? current_time+0x46/0x80
       ? xfs_trans_ichgtime+0x39/0xb0 [xfs]
       xfs_rename+0x57a/0xae0 [xfs]
       xfs_vn_rename+0xe4/0x150 [xfs]
       ...
      
      In this patch we move the xfs_iunlink_remove() call to
      before acquiring the AGF lock to preserve correct AGI/AGF locking
      order.
      
      [Minor massage required due to upstream change making xfs_bumplink() a
      void function, whereas in the 4.19.y tree the return value is checked
      even though it is always zero. Only change was to the last code block
      removed by the patch. Functionally equivalent to upstream.]
      Signed-off-by: kaixuxia <kaixuxia@tencent.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • xfs: periodically yield scrub threads to the scheduler · 8718f7b4
      Committed by Darrick J. Wong
      task #28557760
      
      [ Upstream commit 5d1116d4c6af3e580f1ed0382ca5a94bd65a34cf ]
      
      Christoph Hellwig complained about the following soft lockup warning
      when running scrub after generic/175 when preemption is disabled and
      slub debugging is enabled:
      
      watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [xfs_scrub:161]
      Modules linked in:
      irq event stamp: 41692326
      hardirqs last  enabled at (41692325): [<ffffffff8232c3b7>] _raw_0
      hardirqs last disabled at (41692326): [<ffffffff81001c5a>] trace0
      softirqs last  enabled at (41684994): [<ffffffff8260031f>] __do_e
      softirqs last disabled at (41684987): [<ffffffff81127d8c>] irq_e0
      CPU: 3 PID: 16189 Comm: xfs_scrub Not tainted 5.4.0-rc3+ #30
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.124
      RIP: 0010:_raw_spin_unlock_irqrestore+0x39/0x40
      Code: 89 f3 be 01 00 00 00 e8 d5 3a e5 fe 48 89 ef e8 ed 87 e5 f2
      RSP: 0018:ffffc9000233f970 EFLAGS: 00000286 ORIG_RAX: ffffffffff3
      RAX: ffff88813b398040 RBX: 0000000000000286 RCX: 0000000000000006
      RDX: 0000000000000006 RSI: ffff88813b3988c0 RDI: ffff88813b398040
      RBP: ffff888137958640 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00042b0c00
      R13: 0000000000000001 R14: ffff88810ac32308 R15: ffff8881376fc040
      FS:  00007f6113dea700(0000) GS:ffff88813bb80000(0000) knlGS:00000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f6113de8ff8 CR3: 000000012f290000 CR4: 00000000000006e0
      Call Trace:
       free_debug_processing+0x1dd/0x240
       __slab_free+0x231/0x410
       kmem_cache_free+0x30e/0x360
       xchk_ag_btcur_free+0x76/0xb0
       xchk_ag_free+0x10/0x80
       xchk_bmap_iextent_xref.isra.14+0xd9/0x120
       xchk_bmap_iextent+0x187/0x210
       xchk_bmap+0x2e0/0x3b0
       xfs_scrub_metadata+0x2e7/0x500
       xfs_ioc_scrub_metadata+0x4a/0xa0
       xfs_file_ioctl+0x58a/0xcd0
       do_vfs_ioctl+0xa0/0x6f0
       ksys_ioctl+0x5b/0x90
       __x64_sys_ioctl+0x11/0x20
       do_syscall_64+0x4b/0x1a0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      If preemption is disabled, all metadata buffers needed to perform the
      scrub are already in memory, and there are a lot of records to check,
      it's possible that the scrub thread will run for an extended period of
      time without sleeping for IO or any other reason.  Then the watchdog
      timer or the RCU stall timeout can trigger, producing the backtrace
      above.
      
      To fix this problem, call cond_resched() from the scrub thread so that
      we back out to the scheduler whenever necessary.
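
A rough userspace analogue of this fix is a long record-checking loop that yields the CPU at intervals. This is a sketch only: `check_records()` and its `yield_every` parameter are hypothetical, and POSIX `sched_yield()` yields unconditionally, whereas the kernel's `cond_resched()` reschedules only when a reschedule is pending.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Count negative ("bad") records, yielding the CPU every yield_every
 * records so a long in-memory scan cannot monopolize the processor.
 * The number of yields is reported through *yields.
 */
size_t check_records(const int *recs, size_t n, size_t yield_every,
                     size_t *yields)
{
    size_t bad = 0;

    assert(recs != NULL && yields != NULL);
    *yields = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i] < 0)
            bad++;
        if (yield_every != 0 && (i + 1) % yield_every == 0) {
            sched_yield();     /* userspace stand-in for cond_resched() */
            (*yields)++;
        }
    }
    return bad;
}
```

The interval is a tuning knob in this sketch; the commit message above does not state how often the scrub code yields.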
      Reported-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • xfs: don't check for AG deadlock for realtime files in bunmapi · 1bb971df
      Committed by Omar Sandoval
      task #28557760
      
      commit 69ffe5960df16938bccfe1b65382af0b3de51265 upstream.
      
      Commit 5b094d6d ("xfs: fix multi-AG deadlock in xfs_bunmapi") added
      a check in __xfs_bunmapi() to stop early if we would touch multiple AGs
      in the wrong order. However, this check isn't applicable for realtime
      files. In most cases, it just makes us do unnecessary commits. However,
      without the fix from the previous commit ("xfs: fix realtime file data
      space leak"), if the last and second-to-last extents also happen to have
      different "AG numbers", then the break actually causes __xfs_bunmapi()
      to return without making any progress, which sends
      xfs_itruncate_extents_flags() into an infinite loop.
      
      Fixes: 5b094d6d ("xfs: fix multi-AG deadlock in xfs_bunmapi")
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • xfs: fix mount failure crash on invalid iclog memory access · e1a4d741
      Committed by Brian Foster
      task #28557760
      
      [ Upstream commit 798a9cada4694ca8d970259f216cec47e675bfd5 ]
      
      syzbot (via KASAN) reports a use-after-free in the error path of
      xlog_alloc_log(). Specifically, the iclog freeing loop doesn't
      handle the case of a fully initialized ->l_iclog linked list.
      Instead, it assumes that the list is partially constructed and NULL
      terminated.
      
      This bug manifested because there was no possible error scenario
      after iclog list setup when the original code was added.  Subsequent
      code and associated error conditions were added some time later,
      while the original error handling code was never updated. Fix up the
      error loop to terminate either on a NULL iclog or reaching the end
      of the list.
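
The two termination conditions can be sketched with a plain singly linked list that is either NULL terminated (partially built) or closed into a ring (fully built). This is an illustrative model, not the kernel code: `struct iclog_model`, `ring_build()` and `ring_free()` are hypothetical names, and this version frees the head last so that it never compares against a pointer that has already been freed.

```c
#include <assert.h>
#include <stdlib.h>

struct iclog_model {
    struct iclog_model *next;
};

/* Build up to n nodes; when close_ring is set, link the tail back to the head. */
struct iclog_model *ring_build(int n, int close_ring)
{
    struct iclog_model *head = NULL, *tail = NULL;

    for (int i = 0; i < n; i++) {
        struct iclog_model *node = calloc(1, sizeof(*node));

        if (!node)
            break;             /* leaves a NULL-terminated partial list */
        if (tail)
            tail->next = node;
        else
            head = node;
        tail = node;
    }
    if (tail && close_ring)
        tail->next = head;
    return head;
}

/* Free every node, stopping on NULL or on wrapping back to the head. */
int ring_free(struct iclog_model *head)
{
    int freed = 0;
    struct iclog_model *cur, *next;

    if (!head)
        return 0;
    for (cur = head->next; cur && cur != head; cur = next) {
        next = cur->next;
        free(cur);
        freed++;
    }
    free(head);                /* the head stays valid until the walk is done */
    return freed + 1;
}
```

A loop that only checks for NULL would walk past the last node of a closed ring and touch the head again after it has been freed, which is the kind of use-after-free described above.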
      
      Reported-by: syzbot+c732f8644185de340492@syzkaller.appspotmail.com
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  2. 30 Apr, 2020 1 commit
  3. 18 Mar, 2020 3 commits
  4. 17 Jan, 2020 9 commits
  5. 02 Jan, 2020 1 commit
    • block: fix .bi_size overflow · 842ed2ab
      Committed by Ming Lei
      commit 79d08f89bb1b5c2c1ff90d9bb95497ab9e8aa7e0 upstream
      
      'bio->bi_iter.bi_size' is 'unsigned int', which can hold at most
      4G - 1 bytes.
      
      Before 07173c3ec276 ("block: enable multipage bvecs"), one bio could
      include only a limited number of pages, usually at most 256, so the
      fs bio size was rarely bigger than 1M bytes.
      
      Since we support multi-page bvecs, in theory one fs bio can really
      have > 1M pages added, especially in the case of hugepages or big
      writeback with many dirty pages. Then there is a chance that
      .bi_size overflows.
      
      Fix this issue by using bio_full() to check whether the added segment
      may overflow .bi_size.
      Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
      Cc: Liu Yiding <liuyd.fnst@cn.fujitsu.com>
      Cc: kernel test robot <rong.a.chen@intel.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: linux-xfs@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 07173c3ec276 ("block: enable multipage bvecs")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  6. 13 Dec, 2019 2 commits
  7. 05 Dec, 2019 5 commits
    • xfs: end sync buffer I/O properly on shutdown error · fe685954
      Committed by Brian Foster
      [ Upstream commit 465fa17f4a303d9fdff9eac4d45f91ece92e96ca ]
      
      As of commit e339dd8d ("xfs: use sync buffer I/O for sync delwri
      queue submission"), the delwri submission code uses sync buffer I/O
      for sync delwri I/O. Instead of waiting on async I/O to unlock the
      buffer, it uses the underlying sync I/O completion mechanism.
      
      If delwri buffer submission fails due to a shutdown scenario, an
      error is set on the buffer and buffer completion never occurs. This
      can cause xfs_buf_delwri_submit() to deadlock waiting on a
      completion event.
      
      We could check the error state before waiting on such buffers, but
      that doesn't serialize against the case of an error set via a racing
      I/O completion. Instead, invoke I/O completion in the shutdown case
      regardless of buffer I/O type.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfs: Fix bulkstat compat ioctls on x32 userspace. · 68cb344c
      Committed by Nick Bowler
      [ Upstream commit 7ca860e3c1a74ad6bd8949364073ef1044cad758 ]
      
      The bulkstat family of ioctls are problematic on x32, because there is
      a mixup of native 32-bit and 64-bit conventions.  The xfs_fsop_bulkreq
      struct contains pointers and 32-bit integers so that matches the native
      32-bit layout, and that means the ioctl implementation goes into the
      regular compat path on x32.
      
      However, the 'ubuffer' member of that struct in turn refers to either
      struct xfs_inogrp or xfs_bstat (or an array of these).  On x32, those
      structures match the native 64-bit layout.  The compat implementation
      writes out the 32-bit version of these structures.  This is not the
      expected format for x32 userspace, causing problems.
      
      Fortunately the functions which actually output these xfs_inogrp and
      xfs_bstat structures have an easy way to select which output format
      is required, so we just need a little tweak to select the right format
      on x32.
      Signed-off-by: Nick Bowler <nbowler@draconx.ca>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfs: Align compat attrlist_by_handle with native implementation. · 5d8d2116
      Committed by Nick Bowler
      [ Upstream commit c456d64449efe37da50832b63d91652a85ea1d20 ]
      
      While inspecting the ioctl implementations, I noticed that the compat
      implementation of XFS_IOC_ATTRLIST_BY_HANDLE does not do exactly the
      same thing as the native implementation.  Specifically, the "cursor"
      does not appear to be written out to userspace on the compat path,
      like it is on the native path.
      
      This adjusts the compat implementation to copy out the cursor just
      like the native implementation does.  The attrlist cursor does not
      require any special compat handling.  This fixes xfstests xfs/269
      on both IA-32 and x32 userspace, when running on an amd64 kernel.
      Signed-off-by: Nick Bowler <nbowler@draconx.ca>
      Fixes: 0facef7f ("xfs: in _attrlist_by_handle, copy the cursor back to userspace")
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfs: require both realtime inodes to mount · 2f99d478
      Committed by Darrick J. Wong
      [ Upstream commit 64bafd2f1e484e27071e7584642005d56516cb77 ]
      
      Since mkfs always formats the filesystem with the realtime bitmap and
      summary inodes immediately after the root directory, we should expect
      that both of them are present and loadable, even if there isn't a
      realtime volume attached.  There's no reason to skip this if rbmino ==
      NULLFSINO; in fact, this causes an immediate crash if there /is/ a
      realtime volume and someone writes to it.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Bill O'Donnell <billodo@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfs: zero length symlinks are not valid · 22227437
      Committed by Dave Chinner
      [ Upstream commit 43feeea88c9cb2955b9f7ba8152ec5abeea42810 ]
      
      A log recovery failure has been reproduced where a symlink inode has
      a zero length in extent form. It was caused by a shutdown during a
      combined fstress+fsmark workload.
      
      The underlying problem is in xfs_inactive_symlink(): the inode is
      unlocked between the symlink inactivation/truncation and the inode
      being freed. This opens a window for the inode to be written to
      disk before xfs_ifree() removes it from the unlinked list, marks it
      free in the inobt and zeros the mode.
      
      For shortform inodes, the fix is simple. xfs_ifree() clears the data
      fork state, so there's no need to do it in xfs_inactive_symlink().
      This means the shortform fork verifier will not see a zero length
      data fork, as it mirrors the inode size through to xfs_ifree(), and
      hence if the inode gets written back and the fork verifiers are run
      they will still see a fork that matches the on-disk inode size.
      
      For extent form (remote) symlinks, it is a little more tricky. Here
      we explicitly set the inode size to zero, so the above race can lead
      to zero length symlinks on disk. Because the inode is unlinked at
      this point (i.e. on the unlinked list) and unreferenced, it can
      never be seen again by a user. Hence when we set the inode size to
      zero, also change the type to S_IFREG. xfs_ifree() expects S_IFREG
      inodes to be of zero length, and so this avoids all the problems of
      zero length symlinks ever hitting the disk. It also avoids the
      problem of needing to handle zero length symlink inodes in log
      recovery to replay the extent free intents and the remaining
      deferops to free the extents the symlink used.
      
      Also add a couple of asserts to warn us if zero length symlinks end
      up in either the symlink create or inactivation paths.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  8. 01 Dec, 2019 2 commits
    • xfs: clear ail delwri queued bufs on unmount of shutdown fs · f0f842a1
      Committed by Brian Foster
      [ Upstream commit efc3289cf8d39c34502a7cc9695ca2fa125aad0c ]
      
      In the typical unmount case, the AIL is forced out by the unmount
      sequence before the xfsaild task is stopped. Since AIL items are
      removed on writeback completion, this means that the AIL
      ->ail_buf_list delwri queue has been drained. This is not always
      true in the shutdown case, however.
      
      It's possible for buffers to sit on a delwri queue for a period of
      time across submission attempts if said items are locked or have
      been relogged and pinned since first added to the queue. If the
      attempt to log such an item results in a log I/O error, the error
      processing can shut down the fs, remove the item from the AIL, stale
      the buffer (dropping the LRU reference) and clear its delwri queue
      state. The latter bit means the buffer will be released from a
      delwri queue on the next submission attempt, but this might never
      occur if the filesystem has shut down and the AIL is empty.
      
      This means that such buffers are held indefinitely by the AIL delwri
      queue across destruction of the AIL. Aside from being a memory leak,
      these buffers can also hold references to in-core perag structures.
      The latter problem manifests as a generic/475 failure, reproducing
      the following asserts at unmount time:
      
        XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
      	file: fs/xfs/xfs_mount.c, line: 151
        XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
      	file: fs/xfs/xfs_mount.c, line: 132
      
      To prevent this problem, clear the AIL delwri queue as a final step
      before xfsaild() exit. The !empty state should never occur in the
      normal case, so add an assert to catch unexpected problems going
      forward.
      
      [dgc: add comment explaining need for xfs_buf_delwri_cancel() after
       calling xfs_buf_delwri_submit_nowait().]
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xfs: fix use-after-free race in xfs_buf_rele · bb64349b
      Committed by Dave Chinner
      [ Upstream commit 37fd1678245f7a5898c1b05128bc481fb403c290 ]
      
      When looking at a 4.18 based KASAN use after free report, I noticed
      that concurrent xfs_buf_rele() calls may race on dropping the last
      reference to the buffer and taking the buffer lock. This was the symptom
      displayed by the KASAN report, but the actual issue that was
      reported had already been fixed in 4.19-rc1 by commit e339dd8d
      ("xfs: use sync buffer I/O for sync delwri queue submission").
      
      Despite this, I think there is still an issue with xfs_buf_rele()
      in this code:
      
              release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
              spin_lock(&bp->b_lock);
              if (!release) {
      .....
      
      If two threads race on the b_lock after both drop a reference, and
      one of them drops the last reference so that release = true, we
      end up with:
      
      CPU 0				CPU 1
      atomic_dec_and_lock()
      				atomic_dec_and_lock()
      				spin_lock(&bp->b_lock)
      spin_lock(&bp->b_lock)
      <spins>
      				<release = true bp->b_lru_ref = 0>
      				<remove from lists>
      				freebuf = true
      				spin_unlock(&bp->b_lock)
      				xfs_buf_free(bp)
      <gets lock, reading and writing freed memory>
      <accesses freed memory>
      spin_unlock(&bp->b_lock) <reads/writes freed memory>
      
      IOWs, we can't safely take bp->b_lock after dropping the hold
      reference because the buffer may go away at any time after we
      drop that reference. However, this can be fixed simply by taking the
      bp->b_lock before we drop the reference.
      
      It is safe to nest the pag_buf_lock inside bp->b_lock as the
      pag_buf_lock is only used to serialise against lookup in
      xfs_buf_find() and no other locks are held over or under the
      pag_buf_lock there. Make this clear by documenting the buffer lock
      orders at the top of the file.
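
The reordering can be modeled in userspace with C11 atomics. This is a simplified sketch under stated assumptions: `struct buf_model` and its functions are hypothetical, the spinlock is a bare atomic flag, and the pag_buf_lock and LRU handling of the real code are left out. It shows only the rule from the text: take the per-buffer lock while still holding a reference, then drop the reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct buf_model {
    atomic_bool locked;        /* stands in for bp->b_lock */
    atomic_int hold;           /* stands in for bp->b_hold */
};

struct buf_model *buf_model_alloc(int refs)
{
    struct buf_model *bp = malloc(sizeof(*bp));

    if (!bp)
        return NULL;
    atomic_init(&bp->locked, false);
    atomic_init(&bp->hold, refs);
    return bp;
}

static void buf_model_lock(struct buf_model *bp)
{
    while (atomic_exchange(&bp->locked, true))
        ;                      /* spin until the flag was previously clear */
}

static void buf_model_unlock(struct buf_model *bp)
{
    atomic_store(&bp->locked, false);
}

/* Drop one reference; returns true if this call freed the buffer. */
bool buf_model_rele(struct buf_model *bp)
{
    bool last;

    assert(bp != NULL);
    buf_model_lock(bp);        /* lock first, while our reference pins bp */
    last = (atomic_fetch_sub(&bp->hold, 1) == 1);
    buf_model_unlock(bp);
    if (last)
        free(bp);              /* nobody else holds a reference any more */
    return last;
}
```

Because every caller touches the lock only while it still owns a reference, the thread that drops the last reference cannot free the buffer under another thread that is spinning on or releasing that lock.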
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  9. 01 Oct, 2019 1 commit
    • xfs: don't crash on null attr fork xfs_bmapi_read · 649836fe
      Committed by Darrick J. Wong
      [ Upstream commit 8612de3f7ba6e900465e340516b8313806d27b2d ]
      
      Zorro Lang reported a crash in generic/475 if we try to inactivate a
      corrupt inode with a NULL attr fork (stack trace shortened somewhat):
      
      RIP: 0010:xfs_bmapi_read+0x311/0xb00 [xfs]
      RSP: 0018:ffff888047f9ed68 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff888047f9f038 RCX: 1ffffffff5f99f51
      RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000012
      RBP: ffff888002a41f00 R08: ffffed10005483f0 R09: ffffed10005483ef
      R10: ffffed10005483ef R11: ffff888002a41f7f R12: 0000000000000004
      R13: ffffe8fff53b5768 R14: 0000000000000005 R15: 0000000000000001
      FS:  00007f11d44b5b80(0000) GS:ffff888114200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000ef6000 CR3: 000000002e176003 CR4: 00000000001606e0
      Call Trace:
       xfs_dabuf_map.constprop.18+0x696/0xe50 [xfs]
       xfs_da_read_buf+0xf5/0x2c0 [xfs]
       xfs_da3_node_read+0x1d/0x230 [xfs]
       xfs_attr_inactive+0x3cc/0x5e0 [xfs]
       xfs_inactive+0x4c8/0x5b0 [xfs]
       xfs_fs_destroy_inode+0x31b/0x8e0 [xfs]
       destroy_inode+0xbc/0x190
       xfs_bulkstat_one_int+0xa8c/0x1200 [xfs]
       xfs_bulkstat_one+0x16/0x20 [xfs]
       xfs_bulkstat+0x6fa/0xf20 [xfs]
       xfs_ioc_bulkstat+0x182/0x2b0 [xfs]
       xfs_file_ioctl+0xee0/0x12a0 [xfs]
       do_vfs_ioctl+0x193/0x1000
       ksys_ioctl+0x60/0x90
       __x64_sys_ioctl+0x6f/0xb0
       do_syscall_64+0x9f/0x4d0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f11d39a3e5b
      
      The "obvious" cause is that the attr ifork is null despite the inode
      claiming an attr fork having at least one extent, but it's not so
      obvious why we ended up with an inode in that state.
      Reported-by: Zorro Lang <zlang@redhat.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204031
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Bill O'Donnell <billodo@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  10. 29 Aug, 2019 7 commits
  11. 26 Jul, 2019 4 commits