1. 25 2月, 2017 1 次提交
  2. 23 2月, 2017 27 次提交
  3. 22 2月, 2017 1 次提交
    • E
      proc/sysctl: Don't grab i_lock under sysctl_lock. · ace0c791
      Eric W. Biederman 提交于
      Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
      > This patch has locking problem. I've got lockdep splat under LTP.
      >
      > [ 6633.115456] ======================================================
      > [ 6633.115502] [ INFO: possible circular locking dependency detected ]
      > [ 6633.115553] 4.9.10-debug+ #9 Tainted: G             L
      > [ 6633.115584] -------------------------------------------------------
      > [ 6633.115627] ksm02/284980 is trying to acquire lock:
      > [ 6633.115659]  (&sb->s_type->i_lock_key#4){+.+...}, at: [<ffffffff816bc1ce>] igrab+0x1e/0x80
      > [ 6633.115834] but task is already holding lock:
      > [ 6633.115882]  (sysctl_lock){+.+...}, at: [<ffffffff817e379b>] unregister_sysctl_table+0x6b/0x110
      > [ 6633.116026] which lock already depends on the new lock.
      > [ 6633.116026]
      > [ 6633.116080]
      > [ 6633.116080] the existing dependency chain (in reverse order) is:
      > [ 6633.116117]
      > -> #2 (sysctl_lock){+.+...}:
      > -> #1 (&(&dentry->d_lockref.lock)->rlock){+.+...}:
      > -> #0 (&sb->s_type->i_lock_key#4){+.+...}:
      >
      > d_lock nests inside i_lock
      > sysctl_lock nests inside d_lock in d_compare
      >
      > This patch adds i_lock nesting inside sysctl_lock.
      
      Al Viro <viro@ZenIV.linux.org.uk> replied:
      > Once ->unregistering is set, you can drop sysctl_lock just fine.  So I'd
      > try something like this - use rcu_read_lock() in proc_sys_prune_dcache(),
      > drop sysctl_lock() before it and regain after.  Make sure that no inodes
      > are added to the list ones ->unregistering has been set and use RCU list
      > primitives for modifying the inode list, with sysctl_lock still used to
      > serialize its modifications.
      >
      > Freeing struct inode is RCU-delayed (see proc_destroy_inode()), so doing
      > igrab() is safe there.  Since we don't drop inode reference until after we'd
      > passed beyond it in the list, list_for_each_entry_rcu() should be fine.
      
      I agree with Al Viro's analsysis of the situtation.
      
      Fixes: d6cffbbe ("proc/sysctl: prune stale dentries during unregistering")
      Reported-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Tested-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Suggested-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      ace0c791
  4. 21 2月, 2017 1 次提交
  5. 18 2月, 2017 3 次提交
  6. 17 2月, 2017 7 次提交
    • C
      xfs: tune down agno asserts in the bmap code · 410d17f6
      Christoph Hellwig 提交于
      In various places we currently assert that xfs_bmap_btalloc allocates
      from the same as the firstblock value passed in, unless it's either
      NULLAGNO or the dop_low flag is set.  But the reflink code does not
      fully follow this convention as it passes in firstblock purely as
      a hint for the allocator without actually having previous allocations
      in the transaction, and without having a minleft check on the current
      AG, leading to the assert firing on a very full and heavily used
      file system.  As even the reflink code only allocates from equal or
      higher AGs for now we can simply the check to always allow for equal
      or higher AGs.
      
      Note that we need to eventually split the two meanings of the firstblock
      value.  At that point we can also allow the reflink code to allocate
      from any AG instead of limiting it in any way.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      410d17f6
    • C
      xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment · 8ee9fdbe
      Chandan Rajendra 提交于
      On a ppc64 system, executing generic/256 test with 32k block size gives the following call trace,
      
      XFS: Assertion failed: args->maxlen > 0, file: /root/repos/linux/fs/xfs/libxfs/xfs_alloc.c, line: 2026
      
      kernel BUG at /root/repos/linux/fs/xfs/xfs_message.c:113!
      Oops: Exception in kernel mode, sig: 5 [#1]
      SMP NR_CPUS=2048
      DEBUG_PAGEALLOC
      NUMA
      pSeries
      Modules linked in:
      CPU: 2 PID: 19361 Comm: mkdir Not tainted 4.10.0-rc5 #58
      task: c000000102606d80 task.stack: c0000001026b8000
      NIP: c0000000004ef798 LR: c0000000004ef798 CTR: c00000000082b290
      REGS: c0000001026bb090 TRAP: 0700   Not tainted  (4.10.0-rc5)
      MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>
      CR: 28004428  XER: 00000000
      CFAR: c0000000004ef180 SOFTE: 1
      GPR00: c0000000004ef798 c0000001026bb310 c000000001157300 ffffffffffffffea
      GPR04: 000000000000000a c0000001026bb130 0000000000000000 ffffffffffffffc0
      GPR08: 00000000000000d1 0000000000000021 00000000ffffffd1 c000000000dd4990
      GPR12: 0000000022004444 c00000000fe00800 0000000020000000 0000000000000000
      GPR16: 0000000000000000 0000000043a606fc 0000000043a76c08 0000000043a1b3d0
      GPR20: 000001002a35cd60 c0000001026bbb80 0000000000000000 0000000000000001
      GPR24: 0000000000000240 0000000000000004 c00000062dc55000 0000000000000000
      GPR28: 0000000000000004 c00000062ecd9200 0000000000000000 c0000001026bb6c0
      NIP [c0000000004ef798] .assfail+0x28/0x30
      LR [c0000000004ef798] .assfail+0x28/0x30
      Call Trace:
      [c0000001026bb310] [c0000000004ef798] .assfail+0x28/0x30 (unreliable)
      [c0000001026bb380] [c000000000455d74] .xfs_alloc_space_available+0x194/0x1b0
      [c0000001026bb410] [c00000000045b914] .xfs_alloc_fix_freelist+0x144/0x480
      [c0000001026bb580] [c00000000045c368] .xfs_alloc_vextent+0x698/0xa90
      [c0000001026bb650] [c0000000004a6200] .xfs_ialloc_ag_alloc+0x170/0x820
      [c0000001026bb7c0] [c0000000004a9098] .xfs_dialloc+0x158/0x320
      [c0000001026bb8a0] [c0000000004e628c] .xfs_ialloc+0x7c/0x610
      [c0000001026bb990] [c0000000004e8138] .xfs_dir_ialloc+0xa8/0x2f0
      [c0000001026bbaa0] [c0000000004e8814] .xfs_create+0x494/0x790
      [c0000001026bbbf0] [c0000000004e5ebc] .xfs_generic_create+0x2bc/0x410
      [c0000001026bbce0] [c0000000002b4a34] .vfs_mkdir+0x154/0x230
      [c0000001026bbd70] [c0000000002bc444] .SyS_mkdirat+0x94/0x120
      [c0000001026bbe30] [c00000000000b760] system_call+0x38/0xfc
      Instruction dump:
      4e800020 60000000 7c0802a6 7c862378 3c82ffca 7ca72b78 38841c18 7c651b78
      38600000 f8010010 f821ff91 4bfff94d <0fe00000> 60000000 7c0802a6 7c892378
      
      When block size is larger than inode cluster size, the call to
      XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size) returns 0. Also, mkfs.xfs
      would have set xfs_sb->sb_inoalignmt to 0. This causes
      xfs_ialloc_cluster_alignment() to return 0.  Due to this
      args.minalignslop (in xfs_ialloc_ag_alloc()) gets the unsigned
      equivalent of -1 assigned to it. This later causes alloc_len in
      xfs_alloc_space_available() to have a value of 0. In such a scenario
      when args.total is also 0, the assert statement "ASSERT(args->maxlen >
      0);" fails.
      
      This commit fixes the bug by replacing the call to XFS_B_TO_FSBT() in
      xfs_ialloc_cluster_alignment() with a call to xfs_icluster_size_fsb().
      Suggested-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      8ee9fdbe
    • B
      xfs: don't reserve blocks for right shift transactions · 48af96ab
      Brian Foster 提交于
      The block reservation for the transaction allocated in
      xfs_shift_file_space() is an artifact of the original collapse range
      support. It exists to handle the case where a collapse range occurs,
      the initial extent is left shifted into a location that forms a
      contiguous boundary with the previous extent and thus the extents
      are merged. This code was subsequently refactored and reused for
      insert range (right shift) support.
      
      If an insert range occurs under low free space conditions, the
      extent at the starting offset is split before the first shift
      transaction is allocated. If the block reservation fails, this
      leaves separate, but contiguous extents around in the inode. While
      not a fatal problem, this is unexpected and will flag a warning on
      subsequent insert range operations on the inode. This problem has
      been reproduce intermittently by generic/270 running against a
      ramdisk device.
      
      Since right shift does not create new extent boundaries in the
      inode, a block reservation for extent merge is unnecessary. Update
      xfs_shift_file_space() to conditionally reserve fs blocks for left
      shift transactions only. This avoids the warning reproduced by
      generic/270.
      Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      48af96ab
    • A
      xfs: fix len comparison in xfs_extent_busy_trim · 353fe445
      Arnd Bergmann 提交于
      The length is now passed by reference, so the assertion has to be updated
      to match the other changes, as pointed out by this W=1 warning:
      
      fs/xfs/xfs_extent_busy.c: In function 'xfs_extent_busy_trim':
      fs/xfs/xfs_extent_busy.c:356:13: error: ordered comparison of pointer with integer zero [-Werror=extra]
      
      Fixes: ebf55872 ("xfs: improve handling of busy extents in the low-level allocator")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      353fe445
    • D
      xfs: fix uninitialized variable in _reflink_convert_cow · 93aaead5
      Darrick J. Wong 提交于
      Fix an uninitialize variable.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      93aaead5
    • B
      xfs: split indlen reservations fairly when under reserved · 75d65361
      Brian Foster 提交于
      Certain workoads that punch holes into speculative preallocation can
      cause delalloc indirect reservation splits when the delalloc extent is
      split in two. If further splits occur, an already short-handed extent
      can be split into two in a manner that leaves zero indirect blocks for
      one of the two new extents. This occurs because the shortage is large
      enough that the xfs_bmap_split_indlen() algorithm completely drains the
      requested indlen of one of the extents before it honors the existing
      reservation.
      
      This ultimately results in a warning from xfs_bmap_del_extent(). This
      has been observed during file copies of large, sparse files using 'cp
      --sparse=always.'
      
      To avoid this problem, update xfs_bmap_split_indlen() to explicitly
      apply the reservation shortage fairly between both extents. This smooths
      out the overall indlen shortage and defers the situation where we end up
      with a delalloc extent with zero indlen reservation to extreme
      circumstances.
      Reported-by: NPatrick Dung <mpatdung@gmail.com>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      75d65361
    • B
      xfs: handle indlen shortage on delalloc extent merge · 0e339ef8
      Brian Foster 提交于
      When a delalloc extent is created, it can be merged with pre-existing,
      contiguous, delalloc extents. When this occurs,
      xfs_bmap_add_extent_hole_delay() merges the extents along with the
      associated indirect block reservations. The expectation here is that the
      combined worst case indlen reservation is always less than or equal to
      the indlen reservation for the individual extents.
      
      This is not always the case, however, as existing extents can less than
      the expected indlen reservation if the extent was previously split due
      to a hole punch. If a new extent merges with such an extent, the total
      indlen requirement may be larger than the sum of the indlen reservations
      held by both extents.
      
      xfs_bmap_add_extent_hole_delay() assumes that the worst case indlen
      reservation is always available and assigns it to the merged extent
      without consideration for the indlen held by the pre-existing extent. As
      a result, the subsequent xfs_mod_fdblocks() call can attempt an
      unintentional allocation rather than a free (indicated by an ASSERT()
      failure). Further, if the allocation happens to fail in this context,
      the failure goes unhandled and creates a filesystem wide block
      accounting inconsistency.
      
      Fix xfs_bmap_add_extent_hole_delay() to function as designed. Cap the
      indlen reservation assigned to the merged extent to the sum of the
      indlen reservations held by each of the individual extents.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0e339ef8