1. 24 3月, 2018 2 次提交
  2. 16 3月, 2018 6 次提交
  3. 15 3月, 2018 6 次提交
  4. 12 3月, 2018 26 次提交
    • B
      xfs: account only rmapbt-used blocks against rmapbt perag res · 0ab32086
      Brian Foster 提交于
      The rmapbt perag metadata reservation reserves blocks for the
      reverse mapping btree (rmapbt). Since the rmapbt uses blocks from
      the agfl and perag accounting is updated as blocks are allocated
      from the allocation btrees, the reservation actually accounts blocks
      as they are allocated to (or freed from) the agfl rather than the
      rmapbt itself.
      
      While this works for blocks that are eventually used for the rmapbt,
      not all agfl blocks are destined for the rmapbt. Blocks that are
      allocated to the agfl (and thus "reserved" for the rmapbt) but then
      used by another structure leads to a growing inconsistency over time
      between the runtime tracking of rmapbt usage vs. actual rmapbt
      usage. Since the runtime tracking thinks all agfl blocks are rmapbt
      blocks, it essentially believes that less future reservation is
      required to satisfy the rmapbt than what is actually necessary.
      
      The inconsistency is rectified across mount cycles because the perag
      reservation is initialized based on the actual rmapbt usage at mount
      time. The problem, however, is that the excessive drain of the
      reservation at runtime opens a window to allocate blocks for other
      purposes that might be required for the rmapbt on a subsequent
      mount. This problem can be demonstrated by a simple test that runs
      an allocation workload to consume agfl blocks over time and then
      observe the difference in the agfl reservation requirement across an
      unmount/mount cycle:
      
        mount ...: xfs_ag_resv_init: ... resv 3193 ask 3194 len 3194
        ...
        ...      : xfs_ag_resv_alloc_extent: ... resv 2957 ask 3194 len 1
        umount...: xfs_ag_resv_free: ... resv 2956 ask 3194 len 0
        mount ...: xfs_ag_resv_init: ... resv 3052 ask 3194 len 3194
      
      As the above tracepoints show, the reservation requirement reduces
      from 3194 blocks to 2956 blocks as the workload runs.  Without any
      other changes in the filesystem, the same reservation requirement
      jumps from 2956 to 3052 blocks over a umount/mount cycle.
      
      To address this divergence, update the RMAPBT reservation to account
      blocks used for the rmapbt only rather than all blocks filled into
      the agfl. This patch makes several high-level changes toward that
      end:
      
      1.) Reintroduce an AGFL reservation type to serve as an accounting
          no-op for blocks allocated to (or freed from) the AGFL.
      2.) Invoke RMAPBT usage accounting from the actual rmapbt block
          allocation path rather than the AGFL allocation path.
      
      The first change is required because agfl blocks are considered free
      blocks throughout their lifetime. The perag reservation subsystem is
      invoked unconditionally by the allocation subsystem, so we need a
      way to tell the perag subsystem (via the allocation subsystem) to
      not make any accounting changes for blocks filled into the AGFL.
      
      The second change causes the in-core RMAPBT reservation usage
      accounting to remain consistent with the on-disk state at all times
      and eliminates the risk of leaving the rmapbt reservation
      underfilled.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0ab32086
    • B
      xfs: rename agfl perag res type to rmapbt · 21592863
      Brian Foster 提交于
      The AGFL perag reservation type accounts all allocations that feed
      into (or are released from) the allocation group free list (agfl).
      The purpose of the reservation is to support worst case conditions
      for the reverse mapping btree (rmapbt). As such, the agfl
      reservation usage accounting only considers rmapbt usage when the
      in-core counters are initialized at mount time.
      
      This implementation inconsistency leads to divergence of the in-core
      and on-disk usage accounting over time. In preparation to resolve
      this inconsistency and adjust the AGFL reservation into an rmapbt
      specific reservation, rename the AGFL reservation type and
      associated accounting fields to something more rmapbt-specific. Also
      fix up a couple tracepoints that incorrectly use the AGFL
      reservation type to pass the agfl state of the associated extent
      where the raw reservation type is expected.
      
      Note that this patch does not change perag reservation behavior.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      21592863
    • B
      xfs: account format bouncing into rmapbt swapext tx reservation · b3fed434
      Brian Foster 提交于
      The extent swap mechanism requires a unique implementation for
      rmapbt enabled filesystems. Because the rmapbt tracks extent owner
      information, extent swap must individually unmap and remap each
      extent between the two inodes.
      
      The rmapbt extent swap transaction block reservation currently
      accounts for the worst case bmapbt block and rmapbt block
      consumption based on the extent count of each inode. There is a
      corner case that exists due to the extent swap implementation that
      is not covered by this reservation, however.
      
      If one of the associated inodes is just over the max extent count
      used for extent format inodes (i.e., the inode is in btree format by
      a single extent), the unmap/remap cycle of the extent swap can
      bounce the inode between extent and btree format multiple times,
      almost as many times as there are extents in the inode (if the
      opposing inode happens to have one less, for example). Each back and
      forth cycle involves a block free and allocation, which isn't a
      problem except for that the initial transaction reservation must
      account for the total number of block allocations performed by the
      chain of deferred operations. If not, a block reservation overrun
      occurs and the filesystem shuts down.
      
      Update the rmapbt extent swap block reservation to check for this
      situation and add some block reservation slop to ensure the entire
      operation succeeds. We'd never likely require reservation for both
      inodes as fsr wouldn't defrag the file in that case, but the
      additional reservation is constrained by the data fork size so be
      cautious and check for both.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      b3fed434
    • B
      xfs: shutdown if block allocation overruns tx reservation · 3e78b9a4
      Brian Foster 提交于
      The ->t_blk_res_used field tracks how many blocks have been used in
      the current transaction. This should never exceed the block
      reservation (->t_blk_res) for a particular transaction. We currently
      assert this condition in the transaction block accounting code, but
      otherwise take no additional action should this situation occur.
      
      The overrun generally has no effect if space ends up being available
      and the associated transaction commits. If the transaction is
      duplicated, however, the current block usage is used to determine
      the remaining block reservation to be transferred to the new
      transaction. If usage exceeds reservation, this calculation
      underflows and creates a transaction with an invalid and excessive
      reservation. When the second transaction commits, the release of
      unused blocks corrupts the in-core free space counters. With lazy
      superblock accounting enabled, this inconsistency eventually
      trickles to the on-disk superblock and corrupts the filesystem.
      
      Replace the transaction block usage accounting assert with an
      explicit overrun check. If the transaction overruns the reservation,
      shutdown the filesystem immediately to prevent corruption. Add a new
      assert to xfs_trans_dup() to catch any callers that might induce
      this invalid state in the future.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      3e78b9a4
    • M
      xfs: Rename xa_ elements to ail_ · 57e80956
      Matthew Wilcox 提交于
      This is a simple rename, except that xa_ail becomes ail_head.
      Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      57e80956
    • D
      inode: don't memset the inode address space twice · ae23395d
      Dave Chinner 提交于
      Noticed when looking at why cycling 600k inodes/s through the inode
      cache was taking a total of 8% cpu in memset() during inode
      initialisation.  There is no need to zero the inode.i_data structure
      twice.
      
      This increases single threaded bulkstat throughput from ~200,000
      inodes/s to ~220,000 inodes/s, so we save a substantial amount of
      CPU time per inode init by doing this.
      Signed-Off-By: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      ae23395d
    • D
      xfs: convert XFS_AGFL_SIZE to a helper function · a78ee256
      Dave Chinner 提交于
      The AGFL size calculation is about to get more complex, so lets turn
      the macro into a function first and remove the macro.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      [darrick: forward port to newer kernel, simplify the helper]
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      a78ee256
    • D
      xfs: check for cow blocks before trying to clear them · 6231848c
      Darrick J. Wong 提交于
      There's no point in allocating a transaction and locking the inode in
      preparation to clear cow blocks if there actually are any cow fork
      extents.  Therefore, move the xfs_reflink_cancel_cow_range hunk to
      xfs_inactive and check the cow ifp first.  This makes inode reclamation
      run faster.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      6231848c
    • D
      xfs: convert a few more directory asserts to corruption · 3f883f5b
      Darrick J. Wong 提交于
      Yet another round of playing whack-a-mole with directory code that
      asserts on corrupt on-disk metadata when it really should be returning
      -EFSCORRUPTED instead of ASSERTing.  Found by a xfs/391 crash while
      lastbit fuzzing of ltail.bestcount.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      3f883f5b
    • D
      xfs: don't iunlock the quota ip when quota block · 8241f7f9
      Darrick J. Wong 提交于
      In xfs_qm_dqalloc, we join the locked quota inode to the transaction we
      use to allocate blocks.  If the allocation or mapping fails, we're not
      allowed to unlock the inode because the transaction code is in charge of
      unlocking it for us.  Therefore, remove the iunlock call to avoid
      blowing asserts about unbalanced locking + mount hang.
      
      Found by corrupting the AGF and allocating space in the filesystem
      (quotacheck) immediately after mount.  The upcoming agfl wrapping fixup
      test will trigger this scenario.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      8241f7f9
    • V
      xfs: Correctly invert xfs_buftarg LRU isolation logic · 19957a18
      Vratislav Bendel 提交于
      Due to an inverted logic mistake in xfs_buftarg_isolate()
      the xfs_buffers with zero b_lru_ref will take another trip
      around LRU, while isolating buffers with non-zero b_lru_ref.
      
      Additionally those isolated buffers end up right back on the LRU
      once they are released, because b_lru_ref remains elevated.
      
      Fix that circuitous route by leaving them on the LRU
      as originally intended.
      Signed-off-by: NVratislav Bendel <vbendel@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      19957a18
    • D
      xfs: fix transaction allocation deadlock in IO path · 4df0f7f1
      Dave Chinner 提交于
      xfs_trans_alloc() does GFP_KERNEL allocation, and we can call it
      while holding pages locked for writeback in the ->writepages path.
      The memory allocation is allowed to wait on pages under writeback,
      and so can wait on pages that are tagged as writeback by the
      caller.
      
      This affects both pre-IO submission and post-IO submission paths.
      Hence xfs_setsize_trans_alloc(), xfs_reflink_end_cow(),
      xfs_iomap_write_unwritten() and xfs_reflink_cancel_cow_range().
      xfs_iomap_write_unwritten() already does the right thing, but the
      others don't. Fix them.
      Signed-Off-By: NDave Chinner <dchinner@redhat.com>
      Fixes: 281627df ("xfs: log file size updates at I/O completion time")
      Fixes: 43caeb18 ("xfs: move mappings from cow fork to data fork after copy-write)"
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      4df0f7f1
    • C
      xfs: implement the lazytime mount option · c3b1b131
      Christoph Hellwig 提交于
      Use the VFS dirty inode tracking for lazytime inodes only, and just
      log them in ->dirty_inode.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      c3b1b131
    • C
      fs: don't clear I_DIRTY_TIME before calling mark_inode_dirty_sync · 0d07e557
      Christoph Hellwig 提交于
      __mark_inode_dirty already takes care of that, and for the XFS lazytime
      implementation we need to know that ->dirty_inode was called because
      I_DIRTY_TIME was set.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0d07e557
    • N
      xfs: Remove dead code from inode recover function · bcab2ebf
      Nikolay Borisov 提交于
      The memcpy is guarded by a check which is performed a right before we
      call xfs_log_dinode_to_disk. At this point we are sure this check will
      always be false otherwise we would have errored out. So let's remove
      this dead weight.
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      bcab2ebf
    • C
      Cleanup old XFS_BTREE_* traces · e157ebdc
      Carlos Maiolino 提交于
      Remove unused legacy btree traces from IRIX era.
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      e157ebdc
    • E
      xfs: remove unused m_dmevmask from xfs_mount struct · 4603fa74
      Eric Sandeen 提交于
      The dmevmask structure member is a dmapi leftover; it's
      set here and there but never actually used.  Remove it.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBill O'Donnell <billodo@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      4603fa74
    • D
      xfs: fall back to vmalloc when allocation log vector buffers · cb0a8d23
      Dave Chinner 提交于
      When using large directory blocks, we regularly see memory
      allocations of >64k being made for the shadow log vector buffer.
      When we are under memory pressure, kmalloc() may not be able to find
      contiguous memory chunks large enough to satisfy these allocations
      easily, and if memory is fragmented we can potentially stall here.
      
      TO avoid this problem, switch the log vector buffer allocation to
      use kmem_alloc_large(). This will allow failed allocations to fall
      back to vmalloc and so remove the dependency on large contiguous
      regions of memory being available. This should prevent slowdowns
      and potential stalls when memory is low and/or fragmented.
      Signed-Off-By: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      cb0a8d23
    • L
      Linux 4.16-rc5 · 0c8efd61
      Linus Torvalds 提交于
      0c8efd61
    • L
      Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ed58d66f
      Linus Torvalds 提交于
      Pull x86/pti updates from Thomas Gleixner:
       "Yet another pile of melted spectrum related updates:
      
         - Drop native vsyscall support finally as it causes more trouble than
           benefit.
      
         - Make microcode loading more robust. There were a few issues
           especially related to late loading which are now surfacing because
           late loading of the IB* microcodes addressing spectre issues has
           become more widely used.
      
         - Simplify and robustify the syscall handling in the entry code
      
         - Prevent kprobes on the entry trampoline code which lead to kernel
           crashes when the probe hits before CR3 is updated
      
         - Don't check microcode versions when running on hypervisors as they
           are considered as lying anyway.
      
         - Fix the 32bit objtool build and a coment typo"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/kprobes: Fix kernel crash when probing .entry_trampoline code
        x86/pti: Fix a comment typo
        x86/microcode: Synchronize late microcode loading
        x86/microcode: Request microcode on the BSP
        x86/microcode/intel: Look into the patch cache first
        x86/microcode: Do not upload microcode if CPUs are offline
        x86/microcode/intel: Writeback and invalidate caches before updating microcode
        x86/microcode/intel: Check microcode revision before updating sibling threads
        x86/microcode: Get rid of struct apply_microcode_ctx
        x86/spectre_v2: Don't check microcode versions when running under hypervisors
        x86/vsyscall/64: Drop "native" vsyscalls
        x86/entry/64/compat: Save one instruction in entry_INT80_compat()
        x86/entry: Do not special-case clone(2) in compat entry
        x86/syscalls: Use COMPAT_SYSCALL_DEFINEx() macros for x86-only compat syscalls
        x86/syscalls: Use proper syscall definition for sys_ioperm()
        x86/entry: Remove stale syscall prototype
        x86/syscalls/32: Simplify $entry == $compat entries
        objtool: Fix 32-bit build
      ed58d66f
    • L
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1ad5daa6
      Linus Torvalds 提交于
      Pull timer fix from Thomas Gleixner:
       "Just a single fix which adds a missing Kconfig dependency to avoid
        unmet dependency warnings"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/atmel-st: Add 'depends on HAS_IOMEM' to fix unmet dependency
      1ad5daa6
    • L
      Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ebb3762e
      Linus Torvalds 提交于
      Pull RAS fixes from Thomas Gleixner:
       "Two small fixes for RAS/MCE:
      
         - Serialize sysfs changes to avoid concurrent modificaiton of
           underlying data
      
         - Add microcode revision to Machine Check records. This should have
           been there forever, but now with the broken microcode versions in
           the wild it has become important"
      
      * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/MCE: Serialize sysfs changes
        x86/MCE: Save microcode revision in machine check records
      ebb3762e
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8ad44243
      Linus Torvalds 提交于
      Pull perf updates from Thomas Gleixner:
       "Another set of perf updates:
      
         - Fix a Skylake Uncore event format declaration
      
         - Prevent perf pipe mode from crahsing which was caused by a missing
           buffer allocation
      
         - Make the perf top popup message which tells the user that it uses
           fallback mode on older kernels a debug message.
      
         - Make perf context rescheduling work correcctly
      
         - Robustify the jump error drawing in perf browser mode so it does
           not try to create references to NULL initialized offset entries
      
         - Make trigger_on() robust so it does not enable the trigger before
           everything is set up correctly to handle it
      
         - Make perf auxtrace respect the --no-itrace option so it does not
           try to queue AUX data for decoding.
      
         - Prevent having different number of field separators in CVS output
           lines when a counter is not supported.
      
         - Make the perf kallsyms man page usage behave like it does for all
           other perf commands.
      
         - Synchronize the kernel headers"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/core: Fix ctx_event_type in ctx_resched()
        perf tools: Fix trigger class trigger_on()
        perf auxtrace: Prevent decoding when --no-itrace
        perf stat: Fix CVS output format for non-supported counters
        tools headers: Sync x86's cpufeatures.h
        tools headers: Sync copy of kvm UAPI headers
        perf record: Fix crash in pipe mode
        perf annotate browser: Be more robust when drawing jump arrows
        perf top: Fix annoying fallback message on older kernels
        perf kallsyms: Fix the usage on the man page
        perf/x86/intel/uncore: Fix Skylake UPI event format
      8ad44243
    • L
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 02bf0ef0
      Linus Torvalds 提交于
      Pull locking fix from Thomas Gleixner:
       "rt_mutex_futex_unlock() grew a new irq-off call site, but the function
        assumes that its always called from irq enabled context.
      
        Use (un)lock_irqsafe() to handle the new call site correctly"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites
      02bf0ef0
    • L
      Merge tag 'dmaengine-fix-4.16-rc5' of git://git.infradead.org/users/vkoul/slave-dma · abeb7521
      Linus Torvalds 提交于
      Pull dmaengine fixes from Vinod Koul:
       "Two small fixes are for this cycle:
      
         - fix max_chunk_size for rcar-dmac for R-Car Gen3
      
         - fix clock resource of mv_xor_v2"
      
      * tag 'dmaengine-fix-4.16-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: mv_xor_v2: Fix clock resource by adding a register clock
        dmaengine: rcar-dmac: fix max_chunk_size for R-Car Gen3
      abeb7521
    • L
      Merge tag 'gpio-v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · d43be80a
      Linus Torvalds 提交于
      Pull GPIO fix from Linus Walleij:
       "This is a single GPIO fix for the v4.16 series affecting the Renesas
        driver, and fixes wakeup from external stuff"
      
      * tag 'gpio-v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio: rcar: Use wakeup_path i.s.o. explicit clock handling
      d43be80a