1. 05 6月, 2018 1 次提交
  2. 16 5月, 2018 1 次提交
    • B
      xfs: factor out nodiscard helpers · 4e529339
      Brian Foster 提交于
      The changes to skip discards of speculative preallocation and
      unwritten extents introduced several new wrapper functions through
      the bunmapi -> extent free codepath to reduce churn in all of the
      associated callers. In several cases, these wrappers simply toggle a
      single flag to skip or not skip discards for the resulting blocks.
      
      The explicit _nodiscard() wrappers for such an isolated set of
      callers is a bit overkill. Kill off these wrappers and replace with
      the calls to the underlying functions in the contexts that need to
      control discard behavior. Retain the wrappers that preserve the
      original calling conventions to serve the original purpose of
      reducing code churn.
      
      This is a refactoring patch and does not change behavior.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      4e529339
  3. 10 5月, 2018 7 次提交
  4. 10 4月, 2018 2 次提交
  5. 03 4月, 2018 1 次提交
  6. 30 3月, 2018 1 次提交
  7. 15 3月, 2018 1 次提交
  8. 12 3月, 2018 1 次提交
  9. 29 1月, 2018 6 次提交
  10. 13 1月, 2018 3 次提交
  11. 09 1月, 2018 1 次提交
  12. 22 12月, 2017 1 次提交
  13. 09 12月, 2017 1 次提交
    • C
      xfs: remove "no-allocation" reservations for file creations · f59cf5c2
      Christoph Hellwig 提交于
      If we create a new file we will need an inode, and usually some metadata
      in the parent direction.  Aiming for everything to go well despite the
      lack of a reservation leads to dirty transactions cancelled under a heavy
      create/delete load.  This patch removes those nospace transactions, which
      will lead to slightly earlier ENOSPC on some workloads, but instead
      prevent file system shutdowns due to cancelling dirty transactions for
      others.
      
      A customer could observe assertations failures and shutdowns due to
      cancelation of dirty transactions during heavy NFS workloads as shown
      below:
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace:
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246]  [<ffffffff81795daf>] dump_stack+0x63/0x81
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262]  [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264]  [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285]  [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308]  [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329]  [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348]  [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366]  [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380]  [<ffffffff81231de5>] vfs_create+0xd5/0x140
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390]  [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396]  [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401]  [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416]  [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423]  [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427]  [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432]  [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438]  [<ffffffff810c0d58>] kthread+0xd8/0xf0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451]  [<ffffffff8179d962>] ret_from_fork+0x42/0x70
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]---
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x4ee/0x7d0 [xfs]
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      f59cf5c2
  14. 28 11月, 2017 1 次提交
    • D
      xfs: always free inline data before resetting inode fork during ifree · 98c4f78d
      Darrick J. Wong 提交于
      In xfs_ifree, we reset the data/attr forks to extents format without
      bothering to free any inline data buffer that might still be around
      after all the blocks have been truncated off the file.  Prior to commit
      43518812 ("xfs: remove support for inlining data/extents into the
      inode fork") nobody noticed because the leftover inline data after
      truncation was small enough to fit inside the inline buffer inside the
      fork itself.
      
      However, now that we've removed the inline buffer, we /always/ have to
      free the inline data buffer or else we leak them like crazy.  This test
      was found by turning on kmemleak for generic/001 or generic/388.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      98c4f78d
  15. 17 11月, 2017 1 次提交
    • D
      xfs: fix forgotten rcu read unlock when skipping inode reclaim · 962cc1ad
      Darrick J. Wong 提交于
      In commit f2e9ad21 ("xfs: check for race with xfs_reclaim_inode"), we
      skip an inode if we're racing with freeing the inode via
      xfs_reclaim_inode, but we forgot to release the rcu read lock when
      dumping the inode, with the result that we exit to userspace with a lock
      held.  Don't do that; generic/320 with a 1k block size fails this
      very occasionally.
      
      ================================================
      WARNING: lock held when returning to user space!
      4.14.0-rc6-djwong #4 Tainted: G        W
      ------------------------------------------------
      rm/30466 is leaving the kernel with locks still held!
      1 lock held by rm/30466:
       #0:  (rcu_read_lock){....}, at: [<ffffffffa01364d3>] xfs_ifree_cluster.isra.17+0x2c3/0x6f0 [xfs]
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 30466 at kernel/rcu/tree_plugin.h:329 rcu_note_context_switch+0x71/0x700
      Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug]
      CPU: 1 PID: 30466 Comm: rm Tainted: G        W       4.14.0-rc6-djwong #4
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
      task: ffff880037680000 task.stack: ffffc90001064000
      RIP: 0010:rcu_note_context_switch+0x71/0x700
      RSP: 0000:ffffc90001067e50 EFLAGS: 00010002
      RAX: 0000000000000001 RBX: ffff880037680000 RCX: ffff88003e73d200
      RDX: 0000000000000002 RSI: ffffffff819e53e9 RDI: ffffffff819f4375
      RBP: 0000000000000000 R08: 0000000000000000 R09: ffff880062c900d0
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff880037680000
      R13: 0000000000000000 R14: ffffc90001067eb8 R15: ffff880037680690
      FS:  00007fa3b8ce8700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f69bf77c000 CR3: 000000002450a000 CR4: 00000000000006e0
      Call Trace:
       __schedule+0xb8/0xb10
       schedule+0x40/0x90
       exit_to_usermode_loop+0x6b/0xa0
       prepare_exit_to_usermode+0x7a/0x90
       retint_user+0x8/0x20
      RIP: 0033:0x7fa3b87fda87
      RSP: 002b:00007ffe41206568 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02
      RAX: 0000000000000000 RBX: 00000000010e88c0 RCX: 00007fa3b87fda87
      RDX: 0000000000000000 RSI: 00000000010e89c8 RDI: 0000000000000005
      RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
      R10: 000000000000015e R11: 0000000000000246 R12: 00000000010c8060
      R13: 00007ffe41206690 R14: 0000000000000000 R15: 0000000000000000
      ---[ end trace e88f83bf0cfbd07d ]---
      
      Fixes: f2e9ad21
      Cc: Omar Sandoval <osandov@fb.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NOmar Sandoval <osandov@fb.com>
      962cc1ad
  16. 07 11月, 2017 2 次提交
    • C
    • C
      xfs: use a b+tree for the in-core extent list · 6bdcf26a
      Christoph Hellwig 提交于
      Replace the current linear list and the indirection array for the in-core
      extent list with a b+tree to avoid the need for larger memory allocations
      for the indirection array when lots of extents are present.  The current
      extent list implementations leads to heavy pressure on the memory
      allocator when modifying files with a high extent count, and can lead
      to high latencies because of that.
      
      The replacement is a b+tree with a few quirks.  The leaf nodes directly
      store the extent record in two u64 values.  The encoding is a little bit
      different from the existing in-core extent records so that the start
      offset and length which are required for lookups can be retreived with
      simple mask operations.  The inner nodes store a 64-bit key containing
      the start offset in the first half of the node, and the pointers to the
      next lower level in the second half.  In either case we walk the node
      from the beginninig to the end and do a linear search, as that is more
      efficient for the low number of cache lines touched during a search
      (2 for the inner nodes, 4 for the leaf nodes) than a binary search.
      We store termination markers (zero length for the leaf nodes, an
      otherwise impossible high bit for the inner nodes) to terminate the key
      list / records instead of storing a count to use the available cache
      lines as efficiently as possible.
      
      One quirk of the algorithm is that while we normally split a node half and
      half like usual btree implementations we just spill over entries added at
      the very end of the list to a new node on its own.  This means we get a
      100% fill grade for the common cases of bulk insertion when reading an
      inode into memory, and when only sequentially appending to a file.  The
      downside is a slightly higher chance of splits on the first random
      insertions.
      
      Both insert and removal manually recurse into the lower levels, but
      the bulk deletion of the whole tree is still implemented as a recursive
      function call, although one limited by the overall depth and with very
      little stack usage in every iteration.
      
      For the first few extents we dynamically grow the list from a single
      extent to the next powers of two until we have a first full leaf block
      and that building the actual tree.
      
      The code started out based on the generic lib/btree.c code from Joern
      Engel based on earlier work from Peter Zijlstra, but has since been
      rewritten beyond recognition.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      6bdcf26a
  17. 02 11月, 2017 1 次提交
  18. 27 10月, 2017 1 次提交
  19. 26 9月, 2017 1 次提交
  20. 02 9月, 2017 3 次提交
  21. 05 8月, 2017 1 次提交
  22. 28 6月, 2017 1 次提交
  23. 20 6月, 2017 1 次提交