1. 22 11月, 2018 1 次提交
    • D
      xfs: flush removing page cache in xfs_reflink_remap_prep · 2c307174
      Dave Chinner 提交于
      On a sub-page block size filesystem, fsx is failing with a data
      corruption after a series of operations involving copying a file
      with the destination offset beyond EOF of the destination of the file:
      
      8093(157 mod 256): TRUNCATE DOWN        from 0x7a120 to 0x50000 ******WWWW
      8094(158 mod 256): INSERT 0x25000 thru 0x25fff  (0x1000 bytes)
      8095(159 mod 256): COPY 0x18000 thru 0x1afff    (0x3000 bytes) to 0x2f400
      8096(160 mod 256): WRITE    0x5da00 thru 0x651ff        (0x7800 bytes) HOLE
      8097(161 mod 256): COPY 0x2000 thru 0x5fff      (0x4000 bytes) to 0x6fc00
      
      The second copy here is beyond EOF, and it is to sub-page (4k) but
      block aligned (1k) offset. The clone runs the EOF zeroing, landing
      in a pre-existing post-eof delalloc extent. This zeroes the post-eof
      extents in the page cache just fine, dirtying the pages correctly.
      
      The problem is that xfs_reflink_remap_prep() now truncates the page
      cache over the range that it is copying it to, and rounds that down
      to cover the entire start page. This removes the dirty page over the
      delalloc extent from the page cache without having written it back.
      Hence later, when the page cache is flushed, the page at offset
      0x6f000 has not been written back and hence exposes stale data,
      which fsx trips over less than 10 operations later.
      
      Fix this by changing xfs_reflink_remap_prep() to use
      xfs_flush_unmap_range().
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      2c307174
  2. 21 11月, 2018 4 次提交
    • D
      xfs: extent shifting doesn't fully invalidate page cache · 7f9f71be
      Dave Chinner 提交于
      The extent shifting code uses a flush and invalidate mechainsm prior
      to shifting extents around. This is similar to what
      xfs_free_file_space() does, but it doesn't take into account things
      like page cache vs block size differences, and it will fail if there
      is a page that it currently busy.
      
      xfs_flush_unmap_range() handles all of these cases, so just convert
      xfs_prepare_shift() to us that mechanism rather than having it's own
      special sauce.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      7f9f71be
    • D
      xfs: finobt AG reserves don't consider last AG can be a runt · c0876897
      Dave Chinner 提交于
      The last AG may be very small comapred to all other AGs, and hence
      AG reservations based on the superblock AG size may actually consume
      more space than the AG actually has. This results on assert failures
      like:
      
      XFS: Assertion failed: xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved + xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved <= pag->pagf_freeblks + pag->pagf_flcount, file: fs/xfs/libxfs/xfs_ag_resv.c, line: 319
      [   48.932891]  xfs_ag_resv_init+0x1bd/0x1d0
      [   48.933853]  xfs_fs_reserve_ag_blocks+0x37/0xb0
      [   48.934939]  xfs_mountfs+0x5b3/0x920
      [   48.935804]  xfs_fs_fill_super+0x462/0x640
      [   48.936784]  ? xfs_test_remount_options+0x60/0x60
      [   48.937908]  mount_bdev+0x178/0x1b0
      [   48.938751]  mount_fs+0x36/0x170
      [   48.939533]  vfs_kern_mount.part.43+0x54/0x130
      [   48.940596]  do_mount+0x20e/0xcb0
      [   48.941396]  ? memdup_user+0x3e/0x70
      [   48.942249]  ksys_mount+0xba/0xd0
      [   48.943046]  __x64_sys_mount+0x21/0x30
      [   48.943953]  do_syscall_64+0x54/0x170
      [   48.944835]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Hence we need to ensure the finobt per-ag space reservations take
      into account the size of the last AG rather than treat it like all
      the other full size AGs.
      
      Note that both refcountbt and rmapbt already take the size of the AG
      into account via reading the AGF length directly.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      c0876897
    • D
      xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers · d43aaf16
      Dave Chinner 提交于
      When retrying a failed inode or dquot buffer,
      xfs_buf_resubmit_failed_buffers() clears all the failed flags from
      the inde/dquot log items. In doing so, it also drops all the
      reference counts on the buffer that the failed log items hold. This
      means it can drop all the active references on the buffer and hence
      free the buffer before it queues it for write again.
      
      Putting the buffer on the delwri queue takes a reference to the
      buffer (so that it hangs around until it has been written and
      completed), but this goes bang if the buffer has already been freed.
      
      Hence we need to add the buffer to the delwri queue before we remove
      the failed flags from the log items attached to the buffer to ensure
      it always remains referenced during the resubmit process.
      Reported-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      d43aaf16
    • D
      xfs: uncached buffer tracing needs to print bno · d61fa8cb
      Dave Chinner 提交于
      Useless:
      
      xfs_buf_get_uncached: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_unlock:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_submit:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_hold:         dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_iowait:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_iodone:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_iowait_done:  dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_rele:         dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      
      Useful:
      
      
      xfs_buf_get_uncached: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_unlock:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_submit:       dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_hold:         dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_iowait:       dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_iodone:       dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_iowait_done:  dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_rele:         dev 253:32 bno 0x200b5 nblks 0x1 ...
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      d61fa8cb
  3. 20 11月, 2018 2 次提交
    • E
      xfs: make xfs_file_remap_range() static · da034bcc
      Eric Biggers 提交于
      xfs_file_remap_range() is only used in fs/xfs/xfs_file.c, so make it
      static.
      
      This addresses a gcc warning when -Wmissing-prototypes is enabled.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      da034bcc
    • B
      xfs: fix shared extent data corruption due to missing cow reservation · 59e42931
      Brian Foster 提交于
      Page writeback indirectly handles shared extents via the existence
      of overlapping COW fork blocks. If COW fork blocks exist, writeback
      always performs the associated copy-on-write regardless if the
      underlying blocks are actually shared. If the blocks are shared,
      then overlapping COW fork blocks must always exist.
      
      fstests shared/010 reproduces a case where a buffered write occurs
      over a shared block without performing the requisite COW fork
      reservation.  This ultimately causes writeback to the shared extent
      and data corruption that is detected across md5 checks of the
      filesystem across a mount cycle.
      
      The problem occurs when a buffered write lands over a shared extent
      that crosses an extent size hint boundary and that also happens to
      have a partial COW reservation that doesn't cover the start and end
      blocks of the data fork extent.
      
      For example, a buffered write occurs across the file offset (in FSB
      units) range of [29, 57]. A shared extent exists at blocks [29, 35]
      and COW reservation already exists at blocks [32, 34]. After
      accommodating a COW extent size hint of 32 blocks and the existing
      reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
      blocks of reservation at offset 0 and returns with COW reservation
      across the range of [0, 34]. The associated data fork extent is
      still [29, 35], however, which isn't fully covered by the COW
      reservation.
      
      This leads to a buffered write at file offset 35 over a shared
      extent without associated COW reservation. Writeback eventually
      kicks in, performs an overwrite of the underlying shared block and
      causes the associated data corruption.
      
      Update xfs_reflink_reserve_cow() to accommodate the fact that a
      delalloc allocation request may not fully cover the extent in the
      data fork. Trim the data fork extent appropriately, just as is done
      for shared extent boundaries and/or existing COW reservations that
      happen to overlap the start of the data fork extent. This prevents
      shared/010 failures due to data corruption on reflink enabled
      filesystems.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      59e42931
  4. 06 11月, 2018 3 次提交
    • D
      xfs: fix overflow in xfs_attr3_leaf_verify · 837514f7
      Dave Chinner 提交于
      generic/070 on 64k block size filesystems is failing with a verifier
      corruption on writeback or an attribute leaf block:
      
      [   94.973083] XFS (pmem0): Metadata corruption detected at xfs_attr3_leaf_verify+0x246/0x260, xfs_attr3_leaf block 0x811480
      [   94.975623] XFS (pmem0): Unmount and run xfs_repair
      [   94.976720] XFS (pmem0): First 128 bytes of corrupted metadata buffer:
      [   94.978270] 000000004b2e7b45: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00  ........;.......
      [   94.980268] 000000006b1db90b: 00 00 00 00 00 81 14 80 00 00 00 00 00 00 00 00  ................
      [   94.982251] 00000000433f2407: 22 7b 5c 82 2d 5c 47 4c bb 31 1c 37 fa a9 ce d6  "{\.-\GL.1.7....
      [   94.984157] 0000000010dc7dfb: 00 00 00 00 00 81 04 8a 00 0a 18 e8 dd 94 01 00  ................
      [   94.986215] 00000000d5a19229: 00 a0 dc f4 fe 98 01 68 f0 d8 07 e0 00 00 00 00  .......h........
      [   94.988171] 00000000521df36c: 0c 2d 32 e2 fe 20 01 00 0c 2d 58 65 fe 0c 01 00  .-2.. ...-Xe....
      [   94.990162] 000000008477ae06: 0c 2d 5b 66 fe 8c 01 00 0c 2d 71 35 fe 7c 01 00  .-[f.....-q5.|..
      [   94.992139] 00000000a4a6bca6: 0c 2d 72 37 fc d4 01 00 0c 2d d8 b8 f0 90 01 00  .-r7.....-......
      [   94.994789] XFS (pmem0): xfs_do_force_shutdown(0x8) called from line 1453 of file fs/xfs/xfs_buf.c. Return address = ffffffff815365f3
      
      This is failing this check:
      
                      end = ichdr.freemap[i].base + ichdr.freemap[i].size;
                      if (end < ichdr.freemap[i].base)
      >>>>>                   return __this_address;
                      if (end > mp->m_attr_geo->blksize)
                              return __this_address;
      
      And from the buffer output above, the freemap array is:
      
      	freemap[0].base = 0x00a0
      	freemap[0].size = 0xdcf4	end = 0xdd94
      	freemap[1].base = 0xfe98
      	freemap[1].size = 0x0168	end = 0x10000
      	freemap[2].base = 0xf0d8
      	freemap[2].size = 0x07e0	end = 0xf8b8
      
      These all look valid - the block size is 0x10000 and so from the
      last check in the above verifier fragment we know that the end
      of freemap[1] is valid. The problem is that end is declared as:
      
      	uint16_t	end;
      
      And (uint16_t)0x10000 = 0. So we have a verifier bug here, not a
      corruption. Fix the verifier to use uint32_t types for the check and
      hence avoid the overflow.
      
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=201577Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      837514f7
    • D
      xfs: print buffer offsets when dumping corrupt buffers · bdec055b
      Darrick J. Wong 提交于
      Use DUMP_PREFIX_OFFSET when printing hex dumps of corrupt buffers
      because modern Linux now prints a 32-bit hash of our 64-bit pointer when
      using DUMP_PREFIX_ADDRESS:
      
      00000000b4bb4297: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00  ........;.......
      00000005ec77e26: 00 00 00 00 02 d0 5a 00 00 00 00 00 00 00 00 00  ......Z.........
      000000015938018: 21 98 e8 b4 fd de 4c 07 bc ea 3c e5 ae b4 7c 48  !.....L...<...|H
      
      This is totally worthless for a sequential dump since we probably only
      care about tracking the buffer offsets and afaik there's no way to
      recover the actual pointer from the hashed value.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      bdec055b
    • C
      xfs: Fix error code in 'xfs_ioc_getbmap()' · 132bf672
      Christophe JAILLET 提交于
      In this function, once 'buf' has been allocated, we unconditionally
      return 0.
      However, 'error' is set to some error codes in several error handling
      paths.
      Before commit 232b5194 ("xfs: simplify the xfs_getbmap interface")
      this was not an issue because all error paths were returning directly,
      but now that some cleanup at the end may be needed, we must propagate the
      error code.
      
      Fixes: 232b5194 ("xfs: simplify the xfs_getbmap interface")
      Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      132bf672
  5. 30 10月, 2018 13 次提交
  6. 18 10月, 2018 17 次提交
    • C
      xfs: cancel COW blocks before swapext · 96987eea
      Christoph Hellwig 提交于
      We need to make sure we have no outstanding COW blocks before we swap
      extents, as there is nothing preventing us from having preallocated COW
      delalloc on either inode that swapext is called on.  That case can
      easily be reproduced by running generic/324 in always_cow mode:
      
      [  620.760572] XFS: Assertion failed: tip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap_util.c, line: 1669
      [  620.761608] ------------[ cut here ]------------
      [  620.762171] kernel BUG at fs/xfs/xfs_message.c:102!
      [  620.762732] invalid opcode: 0000 [#1] SMP PTI
      [  620.763272] CPU: 0 PID: 24153 Comm: xfs_fsr Tainted: G        W         4.19.0-rc1+ #4182
      [  620.764203] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
      [  620.765202] RIP: 0010:assfail+0x20/0x28
      [  620.765646] Code: 31 ff e8 83 fc ff ff 0f 0b c3 48 89 f1 41 89 d0 48 c7 c6 48 ca 8d 82 48 89 fa 38
      [  620.767758] RSP: 0018:ffffc9000898bc10 EFLAGS: 00010202
      [  620.768359] RAX: 0000000000000000 RBX: ffff88012f14ba40 RCX: 0000000000000000
      [  620.769174] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828560d9
      [  620.769982] RBP: ffff88012f14b300 R08: 0000000000000000 R09: 0000000000000000
      [  620.770788] R10: 000000000000000a R11: f000000000000000 R12: ffffc9000898bc98
      [  620.771638] R13: ffffc9000898bc9c R14: ffff880130b5e2b8 R15: ffff88012a1fa2a8
      [  620.772504] FS:  00007fdc36e0fbc0(0000) GS:ffff88013ba00000(0000) knlGS:0000000000000000
      [  620.773475] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  620.774168] CR2: 00007fdc3604d000 CR3: 0000000132afc000 CR4: 00000000000006f0
      [  620.774978] Call Trace:
      [  620.775274]  xfs_swap_extent_forks+0x2a0/0x2e0
      [  620.775792]  xfs_swap_extents+0x38b/0xab0
      [  620.776256]  xfs_ioc_swapext+0x121/0x140
      [  620.776709]  xfs_file_ioctl+0x328/0xc90
      [  620.777154]  ? rcu_read_lock_sched_held+0x50/0x60
      [  620.777694]  ? xfs_iunlock+0x233/0x260
      [  620.778127]  ? xfs_setattr_nonsize+0x3be/0x6a0
      [  620.778647]  do_vfs_ioctl+0x9d/0x680
      [  620.779071]  ? ksys_fchown+0x47/0x80
      [  620.779552]  ksys_ioctl+0x35/0x70
      [  620.780040]  __x64_sys_ioctl+0x11/0x20
      [  620.780530]  do_syscall_64+0x4b/0x190
      [  620.780927]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  620.781467] RIP: 0033:0x7fdc364d0f07
      [  620.781900] Code: b3 66 90 48 8b 05 81 5f 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 28
      [  620.784044] RSP: 002b:00007ffe2a766038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  620.784896] RAX: ffffffffffffffda RBX: 0000000000000025 RCX: 00007fdc364d0f07
      [  620.785667] RDX: 0000560296ca2fc0 RSI: 00000000c0c0586d RDI: 0000000000000005
      [  620.786398] RBP: 0000000000000025 R08: 0000000000001200 R09: 0000000000000000
      [  620.787283] R10: 0000000000000432 R11: 0000000000000246 R12: 0000000000000005
      [  620.788051] R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000006
      [  620.788927] Modules linked in:
      [  620.789340] ---[ end trace 9503b7417ffdbdb0 ]---
      [  620.790065] RIP: 0010:assfail+0x20/0x28
      [  620.790642] Code: 31 ff e8 83 fc ff ff 0f 0b c3 48 89 f1 41 89 d0 48 c7 c6 48 ca 8d 82 48 89 fa 38
      [  620.793038] RSP: 0018:ffffc9000898bc10 EFLAGS: 00010202
      [  620.793609] RAX: 0000000000000000 RBX: ffff88012f14ba40 RCX: 0000000000000000
      [  620.794317] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828560d9
      [  620.795025] RBP: ffff88012f14b300 R08: 0000000000000000 R09: 0000000000000000
      [  620.795778] R10: 000000000000000a R11: f000000000000000 R12: ffffc9000898bc98
      [  620.796675] R13: ffffc9000898bc9c R14: ffff880130b5e2b8 R15: ffff88012a1fa2a8
      [  620.797782] FS:  00007fdc36e0fbc0(0000) GS:ffff88013ba00000(0000) knlGS:0000000000000000
      [  620.798908] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  620.799594] CR2: 00007fdc3604d000 CR3: 0000000132afc000 CR4: 00000000000006f0
      [  620.800424] Kernel panic - not syncing: Fatal exception
      [  620.801191] Kernel Offset: disabled
      [  620.801597] ---[ end Kernel panic - not syncing: Fatal exception ]---
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      96987eea
    • B
      xfs: clear ail delwri queued bufs on unmount of shutdown fs · efc3289c
      Brian Foster 提交于
      In the typical unmount case, the AIL is forced out by the unmount
      sequence before the xfsaild task is stopped. Since AIL items are
      removed on writeback completion, this means that the AIL
      ->ail_buf_list delwri queue has been drained. This is not always
      true in the shutdown case, however.
      
      It's possible for buffers to sit on a delwri queue for a period of
      time across submission attempts if said items are locked or have
      been relogged and pinned since first added to the queue. If the
      attempt to log such an item results in a log I/O error, the error
      processing can shutdown the fs, remove the item from the AIL, stale
      the buffer (dropping the LRU reference) and clear its delwri queue
      state. The latter bit means the buffer will be released from a
      delwri queue on the next submission attempt, but this might never
      occur if the filesystem has shutdown and the AIL is empty.
      
      This means that such buffers are held indefinitely by the AIL delwri
      queue across destruction of the AIL. Aside from being a memory leak,
      these buffers can also hold references to in-core perag structures.
      The latter problem manifests as a generic/475 failure, reproducing
      the following asserts at unmount time:
      
        XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
      	file: fs/xfs/xfs_mount.c, line: 151
        XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
      	file: fs/xfs/xfs_mount.c, line: 132
      
      To prevent this problem, clear the AIL delwri queue as a final step
      before xfsaild() exit. The !empty state should never occur in the
      normal case, so add an assert to catch unexpected problems going
      forward.
      
      [dgc: add comment explaining need for xfs_buf_delwri_cancel() after
       calling xfs_buf_delwri_submit_nowait().]
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      efc3289c
    • C
      xfs: use offsetof() in place of offset macros for __xfsstats · 26ca3901
      Carlos Maiolino 提交于
      Most offset macro mess is used in xfs_stats_format() only, and we can
      simply get the right offsets using offsetof(), instead of several macros
      to mark the offsets inside __xfsstats structure.
      
      Replace all XFSSTAT_END_* macros by a single helper macro to get the
      right offset into __xfsstats, and use this helper in xfs_stats_format()
      directly.
      
      The quota stats code, still looks a bit cleaner when using XFSSTAT_*
      macros, so, this patch also defines XFSSTAT_START_XQMSTAT and
      XFSSTAT_END_XQMSTAT locally to that code. This also should prevent
      offset mistakes when updates are done into __xfsstats.
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      26ca3901
    • C
      xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat · 41657e55
      Carlos Maiolino 提交于
      The addition of FIBT, RMAP and REFCOUNT changed the offsets into
      __xfssats structure.
      
      This caused xqmstat_proc_show() to display garbage data via
      /proc/fs/xfs/xqmstat, once it relies on the offsets marked via macros.
      
      Fix it.
      
      Fixes: 00f4e4f9 xfs: add rmap btree stats infrastructure
      Fixes: aafc3c24 xfs: support the XFS_BTNUM_FINOBT free inode btree type
      Fixes: 46eeb521 xfs: introduce refcount btree definitions
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      41657e55
    • D
      xfs: fix use-after-free race in xfs_buf_rele · 37fd1678
      Dave Chinner 提交于
      When looking at a 4.18 based KASAN use after free report, I noticed
      that racing xfs_buf_rele() may race on dropping the last reference
      to the buffer and taking the buffer lock. This was the symptom
      displayed by the KASAN report, but the actual issue that was
      reported had already been fixed in 4.19-rc1 by commit e339dd8d
      ("xfs: use sync buffer I/O for sync delwri queue submission").
      
      Despite this, I think there is still an issue with xfs_buf_rele()
      in this code:
      
              release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
              spin_lock(&bp->b_lock);
              if (!release) {
      .....
      
      If two threads race on the b_lock after both dropping a reference
      and one getting dropping the last reference so release = true, we
      end up with:
      
      CPU 0				CPU 1
      atomic_dec_and_lock()
      				atomic_dec_and_lock()
      				spin_lock(&bp->b_lock)
      spin_lock(&bp->b_lock)
      <spins>
      				<release = true bp->b_lru_ref = 0>
      				<remove from lists>
      				freebuf = true
      				spin_unlock(&bp->b_lock)
      				xfs_buf_free(bp)
      <gets lock, reading and writing freed memory>
      <accesses freed memory>
      spin_unlock(&bp->b_lock) <reads/writes freed memory>
      
      IOWs, we can't safely take bp->b_lock after dropping the hold
      reference because the buffer may go away at any time after we
      drop that reference. However, this can be fixed simply by taking the
      bp->b_lock before we drop the reference.
      
      It is safe to nest the pag_buf_lock inside bp->b_lock as the
      pag_buf_lock is only used to serialise against lookup in
      xfs_buf_find() and no other locks are held over or under the
      pag_buf_lock there. Make this clear by documenting the buffer lock
      orders at the top of the file.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      37fd1678
    • A
      xfs: Add attibute remove and helper functions · 068f985a
      Allison Henderson 提交于
      This patch adds xfs_attr_remove_args. These sub-routines remove
      the attributes specified in @args. We will use this later for setting
      parent pointers as a deferred attribute operation.
      Signed-off-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      068f985a
    • A
      xfs: Add attibute set and helper functions · 2f3cd809
      Allison Henderson 提交于
      This patch adds xfs_attr_set_args and xfs_bmap_set_attrforkoff.
      These sub-routines set the attributes specified in @args.
      We will use this later for setting parent pointers as a deferred
      attribute operation.
      
      [dgc: remove attr fork init code from xfs_attr_set_args().]
      [dgc: xfs_attr_try_sf_addname() NULLs args.trans after commit.]
      [dgc: correct sf add error handling.]
      Signed-off-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2f3cd809
    • A
      xfs: Add helper function xfs_attr_try_sf_addname · 4c74a56b
      Allison Henderson 提交于
      This patch adds a subroutine xfs_attr_try_sf_addname
      used by xfs_attr_set.  This subrotine will attempt to
      add the attribute name specified in args in shortform,
      as well and perform error handling previously done in
      xfs_attr_set.
      
      This patch helps to pre-simplify xfs_attr_set for reviewing
      purposes and reduce indentation.  New function will be added
      in the next patch.
      
      [dgc: moved commit to helper function, too.]
      Signed-off-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      4c74a56b
    • A
      xfs: Move fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h · e2421f0b
      Allison Henderson 提交于
      This patch moves fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h
      since xfs_attr.c is in libxfs.  We will need these later in
      xfsprogs.
      Signed-off-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e2421f0b
    • D
      xfs: issue log message on user force shutdown · 56668a5c
      Dave Chinner 提交于
      The kernel only issues a log message that it's been shut down when
      the filesystem triggers a shutdown itself. Hence there is no trace
      in the log when a shutdown is triggered manually from userspace.
      This can make it hard to see sequence of events in the log when
      things go wrong, so make sure we always log a message when a
      shutdown is run.
      
      While there, clean up the logic flow so we don't have to continually
      check if the shutdown trigger was user initiated before logging
      shutdown messages.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      56668a5c
    • D
      xfs: fix buffer state management in xrep_findroot_block · 38b6238e
      Darrick J. Wong 提交于
      We don't handle buffer state properly in online repair's findroot
      routine.  If a buffer already has b_ops set, we don't ever want to touch
      that, and we don't want to call the read verifiers on a buffer that
      could be dirty (CRCs are only recomputed during log checkpoints).
      
      Therefore, be more careful about what we do with a buffer -- if someone
      else already attached ops that are not the ones for this btree type,
      just ignore the buffer.  We only attach our btree type's buf ops if it
      matches the magic/uuid and structure checks.
      
      We also modify xfs_buf_read_map to allow callers to set buffer ops on a
      DONE buffer with NULL ops so that repair doesn't leave behind buffers
      which won't have buffers attached to them.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      38b6238e
    • D
      xfs: always assign buffer verifiers when one is provided · 1aff5696
      Darrick J. Wong 提交于
      If a caller supplies buffer ops when trying to read a buffer and the
      buffer doesn't already have buf ops assigned, ensure that the ops are
      assigned to the buffer and the verifier is run on that buffer.
      
      Note that current XFS code is careful to assign buffer ops after a
      xfs_{trans_,}buf_read call in which ops were not supplied.  However, we
      should apply ops defensively in case there is ever a coding mistake; and
      an upcoming repair patch will need to be able to read a buffer without
      assigning buf ops.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      1aff5696
    • D
      xfs: xrep_findroot_block should reject root blocks with siblings · 1002ff45
      Darrick J. Wong 提交于
      In xrep_findroot_block, if we find a candidate root block with sibling
      pointers or sibling blocks on the same tree level, we should not return
      that block as a tree root because root blocks cannot have siblings.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      1002ff45
    • A
      xfs: add a define for statfs magic to uapi · dddde68b
      Adam Borowski 提交于
      Needed by userspace programs that call fstatfs().
      
      It'd be natural to publish XFS_SB_MAGIC in uapi, but while these two
      have identical values, they have different semantic meaning: one is
      an enum cookie meant for statfs, the other a signature of the
      on-disk format.
      Signed-off-by: NAdam Borowski <kilobyte@angband.pl>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      dddde68b
    • C
      xfs: print dangling delalloc extents · 4831822f
      Christoph Hellwig 提交于
      Instead of just asserting that we have no delalloc space dangling
      in an inode that gets freed print the actual offenders for debug
      mode.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      4831822f
    • C
      xfs: fix fork selection in xfs_find_trim_cow_extent · 032dc923
      Christoph Hellwig 提交于
      We should want to write directly into the data fork for blocks that don't
      have an extent in the COW fork covering them yet.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      032dc923
    • C