1. 16 9月, 2020 2 次提交
  2. 27 8月, 2020 1 次提交
    • B
      xfs: fix off-by-one in inode alloc block reservation calculation · 657f1019
      Brian Foster 提交于
      The inode chunk allocation transaction reserves inobt_maxlevels-1
      blocks to accommodate a full split of the inode btree. A full split
      requires an allocation for every existing level and a new root
      block, which means inobt_maxlevels is the worst case block
      requirement for a transaction that inserts to the inobt. This can
      lead to a transaction block reservation overrun when tmpfile
      creation allocates an inode chunk and expands the inobt to its
      maximum depth. This problem has been observed in conjunction with
      overlayfs, which makes frequent use of tmpfiles internally.
      
      The existing reservation code goes back as far as the Linux git repo
      history (v2.6.12). It was likely never observed as a problem because
      the traditional file/directory creation transactions also include
      worst case block reservation for directory modifications, which most
      likely is able to make up for a single block deficiency in the inode
      allocation portion of the calculation. tmpfile support is relatively
      more recent (v3.15), less heavily used, and only includes the inode
      allocation block reservation as tmpfiles aren't linked into the
      directory tree on creation.
      
      Fix up the inode alloc block reservation macro and a couple of the
      block allocator minleft parameters that enforce an allocation to
      leave enough free blocks in the AG for a full inobt split.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      657f1019
  3. 14 7月, 2020 1 次提交
    • G
      xfs: get rid of unnecessary xfs_perag_{get,put} pairs · 92a00544
      Gao Xiang 提交于
      In the course of some operations, we look up the perag from
      the mount multiple times to get or change perag information.
      These are often very short pieces of code, so while the
      lookup cost is generally low, the cost of the lookup is far
      higher than the cost of the operation we are doing on the
      perag.
      
      Since we changed buffers to hold references to the perag
      they are cached in, many modification contexts already hold
      active references to the perag that are held across these
      operations. This is especially true for any operation that
      is serialised by an allocation group header buffer.
      
      In these cases, we can just use the buffer's reference to
      the perag to avoid needing to do lookups to access the
      perag. This means that many operations don't need to do
      perag lookups at all to access the perag because they've
      already looked up objects that own persistent references
      and hence can use that reference instead.
      
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Signed-off-by: NGao Xiang <hsiangkao@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      92a00544
  4. 19 3月, 2020 2 次提交
  5. 14 3月, 2020 1 次提交
  6. 12 3月, 2020 1 次提交
  7. 27 1月, 2020 1 次提交
  8. 19 12月, 2019 1 次提交
    • D
      xfs: don't commit sunit/swidth updates to disk if that would cause repair failures · 13eaec4b
      Darrick J. Wong 提交于
      Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
      and swidth values could cause xfs_repair to fail loudly.  The problem
      here is that repair calculates the where mkfs should have allocated the
      root inode, based on the superblock geometry.  The allocation decisions
      depend on sunit, which means that we really can't go updating sunit if
      it would lead to a subsequent repair failure on an otherwise correct
      filesystem.
      
      Port from xfs_repair some code that computes the location of the root
      inode and teach mount to skip the ondisk update if it would cause
      problems for repair.  Along the way we'll update the documentation,
      provide a function for computing the minimum AGFL size instead of
      open-coding it, and cut down some indenting in the mount code.
      
      Note that we allow the mount to proceed (and new allocations will
      reflect this new geometry) because we've never screened this kind of
      thing before.  We'll have to wait for a new future incompat feature to
      enforce correct behavior, alas.
      
      Note that the geometry reporting always uses the superblock values, not
      the incore ones, so that is what xfs_info and xfs_growfs will report.
      
      [1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7dReported-by: NAlex Lyakas <alex@zadara.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      13eaec4b
  9. 13 11月, 2019 1 次提交
    • D
      xfs: kill the XFS_WANT_CORRUPT_* macros · f9e03706
      Darrick J. Wong 提交于
      The XFS_WANT_CORRUPT_* macros conceal subtle side effects such as the
      creation of local variables and redirections of the code flow.  This is
      pretty ugly, so replace them with explicit XFS_IS_CORRUPT tests that
      remove both of those ugly points.  The change was performed with the
      following coccinelle script:
      
      @@
      expression mp, test;
      identifier label;
      @@
      
      - XFS_WANT_CORRUPTED_GOTO(mp, test, label);
      + if (XFS_IS_CORRUPT(mp, !test)) { error = -EFSCORRUPTED; goto label; }
      
      @@
      expression mp, test;
      @@
      
      - XFS_WANT_CORRUPTED_RETURN(mp, test);
      + if (XFS_IS_CORRUPT(mp, !test)) return -EFSCORRUPTED;
      
      @@
      expression mp, lval, rval;
      @@
      
      - XFS_IS_CORRUPT(mp, !(lval == rval))
      + XFS_IS_CORRUPT(mp, lval != rval)
      
      @@
      expression mp, e1, e2;
      @@
      
      - XFS_IS_CORRUPT(mp, !(e1 && e2))
      + XFS_IS_CORRUPT(mp, !e1 || !e2)
      
      @@
      expression e1, e2;
      @@
      
      - !(e1 == e2)
      + e1 != e2
      
      @@
      expression e1, e2, e3, e4, e5, e6;
      @@
      
      - !(e1 == e2 && e3 == e4) || e5 != e6
      + e1 != e2 || e3 != e4 || e5 != e6
      
      @@
      expression e1, e2, e3, e4, e5, e6;
      @@
      
      - !(e1 == e2 || (e3 <= e4 && e5 <= e6))
      + e1 != e2 && (e3 > e4 || e5 > e6)
      
      @@
      expression mp, e1, e2;
      @@
      
      - XFS_IS_CORRUPT(mp, !(e1 <= e2))
      + XFS_IS_CORRUPT(mp, e1 > e2)
      
      @@
      expression mp, e1, e2;
      @@
      
      - XFS_IS_CORRUPT(mp, !(e1 < e2))
      + XFS_IS_CORRUPT(mp, e1 >= e2)
      
      @@
      expression mp, e1;
      @@
      
      - XFS_IS_CORRUPT(mp, !!e1)
      + XFS_IS_CORRUPT(mp, e1)
      
      @@
      expression mp, e1, e2;
      @@
      
      - XFS_IS_CORRUPT(mp, !(e1 || e2))
      + XFS_IS_CORRUPT(mp, !e1 && !e2)
      
      @@
      expression mp, e1, e2, e3, e4;
      @@
      
      - XFS_IS_CORRUPT(mp, !(e1 == e2) && !(e3 == e4))
      + XFS_IS_CORRUPT(mp, e1 != e2 && e3 != e4)
      
      @@
      expression mp, e1, e2, e3, e4;
      @@
      
      - XFS_IS_CORRUPT(mp, !(e1 <= e2) || !(e3 >= e4))
      + XFS_IS_CORRUPT(mp, e1 > e2 || e3 < e4)
      
      @@
      expression mp, e1, e2, e3, e4;
      @@
      
      - XFS_IS_CORRUPT(mp, !(e1 == e2) && !(e3 <= e4))
      + XFS_IS_CORRUPT(mp, e1 != e2 && e3 > e4)
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      f9e03706
  10. 28 8月, 2019 1 次提交
    • D
      xfs: fix maxicount division by zero error · c94613fe
      Darrick J. Wong 提交于
      In xfs_ialloc_setup_geometry, it's possible for a malicious/corrupt fs
      image to set an unreasonably large value for sb_inopblog which will
      cause ialloc_blks to be zero.  If sb_imax_pct is also set, this results
      in a division by zero error in the second do_div call.  Therefore, force
      maxicount to zero if ialloc_blks is zero.
      
      Note that the kernel metadata verifiers will catch the garbage inopblog
      value and abort the fs mount long before it tries to set up the inode
      geometry; this is needed to avoid a crash in xfs_db while setting up the
      xfs_mount structure.
      
      Found by fuzzing sb_inopblog to 122 in xfs/350.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      c94613fe
  11. 29 6月, 2019 2 次提交
  12. 12 6月, 2019 3 次提交
  13. 12 2月, 2019 1 次提交
  14. 13 12月, 2018 4 次提交
  15. 03 8月, 2018 1 次提交
    • B
      xfs: pass transaction to xfs_defer_add() · 0f37d178
      Brian Foster 提交于
      The majority of remaining references to struct xfs_defer_ops in XFS
      are associated with xfs_defer_add(). At this point, there are no
      more external xfs_defer_ops users left. All instances of
      xfs_defer_ops are embedded in the transaction, which means we can
      safely pass the transaction down to the dfops add interface.
      
      Update xfs_defer_add() to receive the transaction as a parameter.
      Various subsystems implement wrappers to allocate and construct the
      context specific data structures for the associated deferred
      operation type. Update these to also carry the transaction down as
      needed and clean up unused dfops parameters along the way.
      
      This removes most of the remaining references to struct
      xfs_defer_ops throughout the code and facilitates removal of the
      structure.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      [darrick: fix unused variable warnings with ftrace disabled]
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0f37d178
  16. 24 7月, 2018 1 次提交
  17. 18 7月, 2018 1 次提交
  18. 12 7月, 2018 1 次提交
  19. 12 6月, 2018 1 次提交
  20. 09 6月, 2018 1 次提交
  21. 07 6月, 2018 1 次提交
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner 提交于
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  22. 06 6月, 2018 1 次提交
    • D
      xfs: validate btree records on retrieval · 9e6c08d4
      Dave Chinner 提交于
      So we don't check the validity of records as we walk the btree. When
      there are corrupt records in the free space btree (e.g. zero
      startblock/length or beyond EOAG) we just blindly use it and things
      go bad from there. That leads to assert failures on debug kernels
      like this:
      
      XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 450
      ....
      Call Trace:
       xfs_alloc_fixup_trees+0x368/0x5c0
       xfs_alloc_ag_vextent_near+0x79a/0xe20
       xfs_alloc_ag_vextent+0x1d3/0x330
       xfs_alloc_vextent+0x5e9/0x870
      
      Or crashes like this:
      
      XFS (loop0): xfs_buf_find: daddr 0x7fb28 out of range, EOFS 0x8000
      .....
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
      ....
      Call Trace:
       xfs_bmap_add_extent_hole_real+0x67d/0x930
       xfs_bmapi_write+0x934/0xc90
       xfs_da_grow_inode_int+0x27e/0x2f0
       xfs_dir2_grow_inode+0x55/0x130
       xfs_dir2_sf_to_block+0x94/0x5d0
       xfs_dir2_sf_addname+0xd0/0x590
       xfs_dir_createname+0x168/0x1a0
       xfs_rename+0x658/0x9b0
      
      By checking that free space records pulled from the trees are
      within the valid range, we catch many of these corruptions before
      they can do damage.
      
      This is a generic btree record checking deficiency. We need to
      validate the records we fetch from all the different btrees before
      we use them to catch corruptions like this.
      
      This patch results in a corrupt record emitting an error message and
      returning -EFSCORRUPTED, and the higher layers catch that and abort:
      
       XFS (loop0): Size Freespace BTree record corruption in AG 0 detected!
       XFS (loop0): start block 0x0 block count 0x0
       XFS (loop0): Internal error xfs_trans_cancel at line 1012 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x42a/0x670
       .....
       Call Trace:
        dump_stack+0x85/0xcb
        xfs_trans_cancel+0x19f/0x1c0
        xfs_create+0x42a/0x670
        xfs_generic_create+0x1f6/0x2c0
        vfs_create+0xf9/0x180
        do_mknodat+0x1f9/0x210
        do_syscall_64+0x5a/0x180
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      .....
       XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1013 of file fs/xfs/xfs_trans.c.  Return address = ffffffff81500868
       XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      9e6c08d4
  23. 04 6月, 2018 1 次提交
  24. 16 5月, 2018 1 次提交
  25. 10 4月, 2018 1 次提交
  26. 29 1月, 2018 1 次提交
    • C
      Split buffer's b_fspriv field · fb1755a6
      Carlos Maiolino 提交于
      By splitting the b_fspriv field into two different fields (b_log_item
      and b_li_list). It's possible to get rid of an old ABI workaround, by
      using the new b_log_item field to store xfs_buf_log_item separated from
      the log items attached to the buffer, which will be linked in the new
      b_li_list field.
      
      This way, there is no more need to reorder the log items list to place
      the buf_log_item at the beginning of the list, simplifying a bit the
      logic to handle buffer IO.
      
      This also opens the possibility to change buffer's log items list into a
      proper list_head.
      
      b_log_item field is still defined as a void *, because it is still used
      by the log buffers to store xlog_in_core structures, and there is no
      need to add an extra field on xfs_buf just for xlog_in_core.
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NBill O'Donnell <billodo@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      [darrick: minor style changes]
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      fb1755a6
  27. 18 1月, 2018 1 次提交
  28. 09 1月, 2018 4 次提交
  29. 09 12月, 2017 1 次提交
    • C
      xfs: remove "no-allocation" reservations for file creations · f59cf5c2
      Christoph Hellwig 提交于
      If we create a new file we will need an inode, and usually some metadata
      in the parent direction.  Aiming for everything to go well despite the
      lack of a reservation leads to dirty transactions cancelled under a heavy
      create/delete load.  This patch removes those nospace transactions, which
      will lead to slightly earlier ENOSPC on some workloads, but instead
      prevent file system shutdowns due to cancelling dirty transactions for
      others.
      
      A customer could observe assertations failures and shutdowns due to
      cancelation of dirty transactions during heavy NFS workloads as shown
      below:
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace:
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246]  [<ffffffff81795daf>] dump_stack+0x63/0x81
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262]  [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264]  [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285]  [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308]  [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329]  [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348]  [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366]  [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380]  [<ffffffff81231de5>] vfs_create+0xd5/0x140
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390]  [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396]  [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401]  [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416]  [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423]  [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427]  [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432]  [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438]  [<ffffffff810c0d58>] kthread+0xd8/0xf0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451]  [<ffffffff8179d962>] ret_from_fork+0x42/0x70
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]---
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x4ee/0x7d0 [xfs]
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      f59cf5c2