1. 09 Dec, 2017 3 commits
    • xfs: make iomap_begin functions trim iomaps consistently · b7e0b6ff
      Darrick J. Wong committed
      Historically, the XFS iomap_begin function only returned mappings for
      exactly the range queried, i.e. it doesn't do XFS_BMAPI_ENTIRE lookups.
      The current vfs iomap consumers are only set up to deal with trimmed
      mappings.  xfs_xattr_iomap_begin does BMAPI_ENTIRE lookups, which is
      inconsistent with the current iomap usage.  Remove the flag so that both
      iomap_begin functions behave the same way.
      
      FWIW this also fixes a behavioral regression in xattr FIEMAP that was
      introduced in 4.8 wherein attr fork extents are no longer trimmed like
      they used to be.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
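
As a rough illustration of what "trimming" a mapping means in the commit above, the sketch below clips an extent to the range that was queried. It is plain Python, not kernel code: `Extent` and `trim_to_range` are hypothetical names introduced here, and offsets are in arbitrary block units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A contiguous mapping: `length` blocks starting at file offset `offset`."""
    offset: int
    length: int


def trim_to_range(extent: Extent, start: int, count: int) -> Extent:
    """Clip `extent` to the queried range [start, start + count).

    Returns a zero-length extent at `start` if the two do not overlap.
    """
    lo = max(extent.offset, start)
    hi = min(extent.offset + extent.length, start + count)
    if hi <= lo:
        return Extent(start, 0)
    return Extent(lo, hi - lo)


# An untrimmed ("entire") lookup may hand back the whole extent [0, 100)
# even though the caller only asked about [10, 30).
whole = Extent(offset=0, length=100)
trimmed = trim_to_range(whole, start=10, count=20)
print(trimmed)  # Extent(offset=10, length=20)
```

A consumer that assumes trimmed mappings can use `trimmed.length` directly; given the untrimmed extent it would have to do this clipping itself.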
    • xfs: remove "no-allocation" reservations for file creations · f59cf5c2
      Christoph Hellwig committed
      If we create a new file we will need an inode, and usually some metadata
      in the parent directory.  Aiming for everything to go well despite the
      lack of a reservation leads to dirty transactions being cancelled under
      a heavy create/delete load.  This patch removes those nospace
      transactions.  Some workloads will hit ENOSPC slightly earlier, but
      others will no longer suffer file system shutdowns caused by cancelling
      dirty transactions.
      
      A customer observed assertion failures and shutdowns due to
      cancellation of dirty transactions during heavy NFS workloads, as shown
      below:
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace:
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246]  [<ffffffff81795daf>] dump_stack+0x63/0x81
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262]  [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264]  [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285]  [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308]  [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329]  [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348]  [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366]  [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380]  [<ffffffff81231de5>] vfs_create+0xd5/0x140
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390]  [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396]  [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401]  [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416]  [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423]  [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427]  [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432]  [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438]  [<ffffffff810c0d58>] kthread+0xd8/0xf0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451]  [<ffffffff8179d962>] ret_from_fork+0x42/0x70
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]---
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x4ee/0x7d0 [xfs]
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s)
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
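
The trade-off described in the commit above (earlier ENOSPC versus a shutdown) can be sketched with a toy model. This is only an illustration in Python under stated assumptions: `ToyFs`, its `create` method, and the use of `EIO` for the shutdown case are all hypothetical and do not mirror the real XFS transaction code.

```python
import errno


class ToyFs:
    """Toy model: a free-block counter plus a 'shut down' flag."""

    def __init__(self, free_blocks: int) -> None:
        self.free_blocks = free_blocks
        self.shut_down = False

    def create(self, blocks_needed: int, reserve: bool) -> int:
        """Return 0 on success or a negative errno, kernel style."""
        if reserve:
            # Reserve up front: fail before any metadata has been modified.
            if blocks_needed > self.free_blocks:
                return -errno.ENOSPC
            self.free_blocks -= blocks_needed
            return 0

        # No reservation: metadata is modified first (the transaction is
        # now dirty), and only then do we learn whether blocks are left.
        if blocks_needed > self.free_blocks:
            # A dirty transaction cannot simply be backed out; in this toy
            # model the result is a shutdown (EIO is an assumption here).
            self.shut_down = True
            return -errno.EIO
        self.free_blocks -= blocks_needed
        return 0
```

With one free block and a create that needs two, `create(2, reserve=True)` fails cleanly with `-ENOSPC`, while `create(2, reserve=False)` leaves the toy filesystem shut down.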
    • fs: xfs: remove duplicate includes · eaf0ec30
      Pravin Shedge committed
      These duplicate includes have been found with scripts/checkincludes.pl but
      they have been removed manually to avoid removing false positives.
      Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  2. 01 Dec, 2017 4 commits
  3. 29 Nov, 2017 4 commits
  4. 28 Nov, 2017 3 commits
    • Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Linus Torvalds committed
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      Requested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
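
To make the substitution step of the script above concrete, here is a small Python sketch that applies the same plain-text `MS_X` to `SB_X` replacement to a string. The symbol list is copied from the script; `rename_flags` and the sample line (including `MS_UNKNOWN`) are hypothetical and only for illustration. Like the `sed` expressions, the replacement is a plain substring match; the word-boundary check (`git grep -w`) in the script applies to file selection only.

```python
SYMS = (
    "RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK "
    "DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT "
    "POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT "
    "I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN "
    "ACTIVE NOUSER"
).split()


def rename_flags(text: str) -> str:
    """Apply the equivalent of `sed -e s/MS_X/SB_X/g` for every X in SYMS."""
    for sym in SYMS:
        text = text.replace("MS_" + sym, "SB_" + sym)
    return text


# Made-up sample line; MS_UNKNOWN is not in SYMS and is left untouched.
line = "if (sb->s_flags & MS_RDONLY) flags |= MS_NOATIME | MS_UNKNOWN;"
print(rename_flags(line))
# if (sb->s_flags & SB_RDONLY) flags |= SB_NOATIME | MS_UNKNOWN;
```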
    • xfs: log recovery should replay deferred ops in order · 50995582
      Darrick J. Wong committed
      As part of testing log recovery with dm_log_writes, Amir Goldstein
      discovered an error in the deferred ops recovery that led to corruption
      of the filesystem metadata if a reflink+rmap filesystem happened to shut
      down midway through a CoW remap:
      
      "This is what happens [after failed log recovery]:
      
      "Phase 1 - find and verify superblock...
      "Phase 2 - using internal log
      "        - zero log...
      "        - scan filesystem freespace and inode maps...
      "        - found root inode chunk
      "Phase 3 - for each AG...
      "        - scan (but don't clear) agi unlinked lists...
      "        - process known inodes and perform inode discovery...
      "        - agno = 0
      "data fork in regular inode 134 claims CoW block 376
      "correcting nextents for inode 134
      "bad data fork in inode 134
      "would have cleared inode 134"
      
      Hou Tao dissected the log contents of exactly such a crash:
      
      "According to the implementation of xfs_defer_finish(), these ops should
      be completed in the following sequence:
      
      "Have been done:
      "(1) CUI: Oper (160)
      "(2) BUI: Oper (161)
      "(3) CUD: Oper (194), for CUI Oper (160)
      "(4) RUI A: Oper (197), free rmap [0x155, 2, -9]
      
      "Should be done:
      "(5) BUD: for BUI Oper (161)
      "(6) RUI B: add rmap [0x155, 2, 137]
      "(7) RUD: for RUI A
      "(8) RUD: for RUI B
      
      "Actually be done by xlog_recover_process_intents()
      "(5) BUD: for BUI Oper (161)
      "(6) RUI B: add rmap [0x155, 2, 137]
      "(7) RUD: for RUI B
      "(8) RUD: for RUI A
      
      "So the rmap entry [0x155, 2, -9] for COW should be freed firstly,
      then a new rmap entry [0x155, 2, 137] will be added. However, as we can see
      from the log record in post_mount.log (generated after umount) and the trace
      print, the new rmap entry [0x155, 2, 137] are added firstly, then the rmap
      entry [0x155, 2, -9] are freed."
      
      When reconstructing the internal log state from the log items found on
      disk, it's required that deferred ops replay in exactly the same order
      that they would have had the filesystem not gone down.  However,
      replaying unfinished deferred ops can create /more/ deferred ops.  These
      new deferred ops are finished in the wrong order.  This causes fs
      corruption and replay crashes, so let's create a single defer_ops to
      handle the subsequent ops created during replay, then use one single
      transaction at the end of log recovery to ensure that everything is
      replayed in the order it is supposed to be.
      Reported-by: Amir Goldstein <amir73il@gmail.com>
      Analyzed-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
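
The ordering problem in the commit above can be modelled loosely in a few lines of Python. This is a deliberately simplified sketch, not the xfs_defer implementation: `replay_eager`, `replay_deferred`, and the `spawns` table are hypothetical, and each intent is reduced to a label. The labels follow Hou Tao's example, where finishing the BUI creates RUI B while RUI A is still pending.

```python
from collections import deque


def replay_eager(intents, spawns):
    """Finish each recovered intent and immediately finish whatever it spawns."""
    order = []

    def finish(op):
        order.append(op)
        for child in spawns.get(op, []):
            finish(child)

    for op in intents:
        finish(op)
    return order


def replay_deferred(intents, spawns):
    """Finish the recovered intents first; collect spawned ops in one FIFO
    queue and finish them only afterwards."""
    order = []
    pending = deque()
    for op in intents:
        order.append(op)
        pending.extend(spawns.get(op, []))
    while pending:
        op = pending.popleft()
        order.append(op)
        pending.extend(spawns.get(op, []))
    return order


intents = ["BUI", "RUI A"]       # intents found in the log
spawns = {"BUI": ["RUI B"]}      # finishing the BUI creates RUI B

print(replay_eager(intents, spawns))     # ['BUI', 'RUI B', 'RUI A']
print(replay_deferred(intents, spawns))  # ['BUI', 'RUI A', 'RUI B']
```

In the eager variant RUI B (the rmap add) completes before RUI A (the rmap free), which is the inversion the report describes; the single shared queue restores the expected order.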
    • xfs: always free inline data before resetting inode fork during ifree · 98c4f78d
      Darrick J. Wong committed
      In xfs_ifree, we reset the data/attr forks to extents format without
      bothering to free any inline data buffer that might still be around
      after all the blocks have been truncated off the file.  Prior to commit
      43518812 ("xfs: remove support for inlining data/extents into the
      inode fork") nobody noticed because the leftover inline data after
      truncation was small enough to fit inside the inline buffer inside the
      fork itself.
      
      However, now that we've removed the inline buffer, we /always/ have to
      free the inline data buffer or else we leak it like crazy.  This leak
      was found by turning on kmemleak for generic/001 or generic/388.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  5. 21 Nov, 2017 2 commits
  6. 17 Nov, 2017 2 commits
    • xfs: fix type usage · 2015a63d
      Darrick J. Wong committed
      Be consistent about using uint32_t/uint8_t instead of u32/u8.  This is
      more so that we don't have to maintain /those/ types in xfsprogs.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
    • xfs: fix forgotten rcu read unlock when skipping inode reclaim · 962cc1ad
      Darrick J. Wong committed
      In commit f2e9ad21 ("xfs: check for race with xfs_reclaim_inode"), we
      skip an inode if we're racing with freeing the inode via
      xfs_reclaim_inode, but we forgot to release the rcu read lock when
      dumping the inode, with the result that we exit to userspace with a lock
      held.  Don't do that; generic/320 with a 1k block size fails this
      very occasionally.
      
      ================================================
      WARNING: lock held when returning to user space!
      4.14.0-rc6-djwong #4 Tainted: G        W
      ------------------------------------------------
      rm/30466 is leaving the kernel with locks still held!
      1 lock held by rm/30466:
       #0:  (rcu_read_lock){....}, at: [<ffffffffa01364d3>] xfs_ifree_cluster.isra.17+0x2c3/0x6f0 [xfs]
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 30466 at kernel/rcu/tree_plugin.h:329 rcu_note_context_switch+0x71/0x700
      Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug]
      CPU: 1 PID: 30466 Comm: rm Tainted: G        W       4.14.0-rc6-djwong #4
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
      task: ffff880037680000 task.stack: ffffc90001064000
      RIP: 0010:rcu_note_context_switch+0x71/0x700
      RSP: 0000:ffffc90001067e50 EFLAGS: 00010002
      RAX: 0000000000000001 RBX: ffff880037680000 RCX: ffff88003e73d200
      RDX: 0000000000000002 RSI: ffffffff819e53e9 RDI: ffffffff819f4375
      RBP: 0000000000000000 R08: 0000000000000000 R09: ffff880062c900d0
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff880037680000
      R13: 0000000000000000 R14: ffffc90001067eb8 R15: ffff880037680690
      FS:  00007fa3b8ce8700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f69bf77c000 CR3: 000000002450a000 CR4: 00000000000006e0
      Call Trace:
       __schedule+0xb8/0xb10
       schedule+0x40/0x90
       exit_to_usermode_loop+0x6b/0xa0
       prepare_exit_to_usermode+0x7a/0x90
       retint_user+0x8/0x20
      RIP: 0033:0x7fa3b87fda87
      RSP: 002b:00007ffe41206568 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02
      RAX: 0000000000000000 RBX: 00000000010e88c0 RCX: 00007fa3b87fda87
      RDX: 0000000000000000 RSI: 00000000010e89c8 RDI: 0000000000000005
      RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
      R10: 000000000000015e R11: 0000000000000246 R12: 00000000010c8060
      R13: 00007ffe41206690 R14: 0000000000000000 R15: 0000000000000000
      ---[ end trace e88f83bf0cfbd07d ]---
      
      Fixes: f2e9ad21
      Cc: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
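
The bug pattern in the commit above (an early "skip" path that forgets to drop a lock taken at the top of the loop) is easy to reproduce in miniature. The Python sketch below is an analogy only: `ReadLock` is a hypothetical depth counter standing in for the RCU read lock, and `scan` is not the real xfs_ifree_cluster logic.

```python
class ReadLock:
    """Counts outstanding read-side sections, like a lock-depth counter."""

    def __init__(self) -> None:
        self.depth = 0

    def lock(self) -> None:
        self.depth += 1

    def unlock(self) -> None:
        assert self.depth > 0, "unlock without matching lock"
        self.depth -= 1


def scan(inodes, rcu: ReadLock, unlock_on_skip: bool) -> list:
    """Visit inodes under the read lock, skipping those being reclaimed."""
    visited = []
    for ino, being_reclaimed in inodes:
        rcu.lock()
        if being_reclaimed:
            if unlock_on_skip:
                rcu.unlock()  # the fix: drop the lock on the skip path too
            continue
        visited.append(ino)
        rcu.unlock()
    return visited


inodes = [(128, False), (129, True), (130, False)]

buggy = ReadLock()
scan(inodes, buggy, unlock_on_skip=False)
print(buggy.depth)  # 1: one read-side section was never closed

fixed = ReadLock()
scan(inodes, fixed, unlock_on_skip=True)
print(fixed.depth)  # 0
```

A nonzero depth after the scan corresponds to the "lock held when returning to user space" warning in the trace.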
  7. 16 Nov, 2017 1 commit
  8. 14 Nov, 2017 1 commit
  9. 10 Nov, 2017 15 commits
  10. 07 Nov, 2017 5 commits