1. 03 Oct, 2016 1 commit
  2. 19 Sep, 2016 2 commits
  3. 17 Aug, 2016 1 commit
  4. 27 Jul, 2016 1 commit
    • dax: remote unused fault wrappers · 6b524995
      Committed by Ross Zwisler
      Remove the unused wrappers dax_fault() and dax_pmd_fault().  After this
      removal, rename __dax_fault() and __dax_pmd_fault() to dax_fault() and
      dax_pmd_fault() respectively, and update all callers.
      
      The dax_fault() and dax_pmd_fault() wrappers were initially intended to
      capture some filesystem independent functionality around page faults
      (calling sb_start_pagefault() & sb_end_pagefault(), updating file mtime
      and ctime).
      
      However, the following commits:
      
         5726b27b ("ext2: Add locking for DAX faults")
         ea3d7209 ("ext4: fix races between page faults and hole punching")
      
      added locking to the ext2 and ext4 filesystems after these common
      operations but before __dax_fault() and __dax_pmd_fault() were called.
      This means that these wrappers are no longer used, and are unlikely to
      be used in the future.
      
      XFS has had locking analogous to what was recently added to ext2 and
      ext4 since DAX support was initially introduced by:
      
         6b698ede ("xfs: add DAX file operations support")
      
      Link: http://lkml.kernel.org/r/20160714214049.20075-2-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6b524995
  5. 22 Jul, 2016 1 commit
    • xfs: remove dax code from object file when disabled · f021bd07
      Committed by Arnd Bergmann
      We check IS_DAX(inode) before calling either xfs_file_dax_read or
      xfs_file_dax_write, and this will lead to the call being optimized out at
      compile time when CONFIG_FS_DAX is disabled.
      
      However, the two functions are marked STATIC, so they become global
      symbols when CONFIG_XFS_DEBUG is set, leaving us with two unused global
      functions that call into an undefined function and a broken "allmodconfig"
      build:
      
      fs/built-in.o: In function `xfs_file_dax_read':
      fs/xfs/xfs_file.c:348: undefined reference to `dax_do_io'
      fs/built-in.o: In function `xfs_file_dax_write':
      fs/xfs/xfs_file.c:758: undefined reference to `dax_do_io'
      
      Marking the two functions 'static noinline' instead of 'STATIC' will let
      the compiler drop the symbols when there are no callers but avoid the
      implicit inlining.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: 16d4d435 ("xfs: split direct I/O and DAX path")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      f021bd07
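The linkage argument in the commit above can be illustrated with a small user-space sketch. Every name here (DEMO_FS_DAX, DEMO_IS_DAX, demo_dax_read and so on) is hypothetical and only stands in for the kernel symbols the message mentions; `__attribute__((noinline))` is assumed to be available, as it is on GCC and Clang.

```c
#include <assert.h>

/* Stand-in for CONFIG_FS_DAX being disabled (hypothetical name). */
#define DEMO_FS_DAX 0
/* Stand-in for IS_DAX(): constant false when DAX support is compiled out. */
#define DEMO_IS_DAX(flags) (DEMO_FS_DAX && ((flags) & 0x1u))

/* Ordinary read path, always available. */
static int demo_buffered_read(int len)
{
	return len;
}

/*
 * DAX read path. Plain "static" keeps the symbol file-local, so the compiler
 * may discard it once its only call is optimized out; noinline keeps it out
 * of line whenever it is kept.
 */
static __attribute__((noinline)) int demo_dax_read(int len)
{
	return -len;	/* distinct result so the two paths can be told apart */
}

int demo_read(unsigned int flags, int len)
{
	assert(len >= 0);
	if (DEMO_IS_DAX(flags))
		return demo_dax_read(len);	/* dead code while DEMO_FS_DAX is 0 */
	return demo_buffered_read(len);
}
```

If demo_dax_read were given external linkage instead, as STATIC does under CONFIG_XFS_DEBUG according to the message, the compiler would have to emit it even with no callers, and anything it references would need to resolve at link time.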
  6. 20 Jul, 2016 6 commits
  7. 21 Jun, 2016 4 commits
  8. 17 May, 2016 1 commit
  9. 03 May, 2016 1 commit
  10. 02 May, 2016 4 commits
  11. 06 Apr, 2016 1 commit
    • xfs: better xfs_trans_alloc interface · 253f4911
      Committed by Christoph Hellwig
      Merge xfs_trans_reserve and xfs_trans_alloc into a single function call
      that returns a transaction with all the required log and block reservations,
      and which allows passing transaction flags directly to avoid the cumbersome
      _xfs_trans_alloc interface.
      
      While we're at it we also get rid of the transaction type argument that has
      been superfluous since we stopped supporting the non-CIL logging mode.  The
      guts of it will be removed in another patch.
      
      [dchinner: fixed transaction leak in error path in xfs_setattr_nonsize]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      253f4911
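The shape of the merged interface described above (one call that either returns a fully reserved transaction or fails without leaking anything) can be sketched in user space. This is not the XFS code: demo_trans, demo_trans_alloc, demo_trans_cancel and the demo_free_blocks pool are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical transaction: remembers what it reserved and its flags. */
struct demo_trans {
	unsigned int blocks;
	unsigned int flags;
};

/* Hypothetical pool of blocks available for reservation. */
unsigned int demo_free_blocks = 8;

/*
 * Allocate and reserve in one step. On success *tpp holds a transaction
 * with its block reservation; on failure nothing is left allocated.
 */
int demo_trans_alloc(unsigned int blocks, unsigned int flags,
		     struct demo_trans **tpp)
{
	struct demo_trans *tp;

	assert(tpp != NULL);
	*tpp = NULL;

	tp = calloc(1, sizeof(*tp));
	if (!tp)
		return -ENOMEM;

	if (blocks > demo_free_blocks) {
		free(tp);	/* the error path must not leak the transaction */
		return -ENOSPC;
	}

	demo_free_blocks -= blocks;
	tp->blocks = blocks;
	tp->flags = flags;
	*tpp = tp;
	return 0;
}

/* Give the reservation back and free the transaction. */
void demo_trans_cancel(struct demo_trans *tp)
{
	demo_free_blocks += tp->blocks;
	free(tp);
}
```

Because the caller never holds a transaction without its reservation, there is no separate reserve step whose failure it has to unwind.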
  12. 05 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion on whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
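Why the shift rewrites in the commit above are safe can be checked with a few lines of C. The DEMO_ macros are stand-ins defined only for this sketch, with the page shift assumed to be 12; the point is just that once both shifts are equal, shifting by their difference is a shift by zero.

```c
#include <assert.h>

/* Stand-ins for the kernel macros; the value 12 is an assumption. */
#define DEMO_PAGE_SHIFT		12
#define DEMO_PAGE_CACHE_SHIFT	DEMO_PAGE_SHIFT

/* Old spelling: convert a page offset into a page-cache index. */
unsigned long demo_old_index(unsigned long pgoff)
{
	/* Guard against a negative shift count, which would be undefined. */
	assert(DEMO_PAGE_CACHE_SHIFT >= DEMO_PAGE_SHIFT);
	return pgoff >> (DEMO_PAGE_CACHE_SHIFT - DEMO_PAGE_SHIFT);
}

/* New spelling after the semantic patch: the shift is simply dropped. */
unsigned long demo_new_index(unsigned long pgoff)
{
	return pgoff;
}
```

Both functions return the same value for every input, which is what lets the semantic patch replace `E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)` with `E`.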
  13. 09 Feb, 2016 1 commit
  14. 08 Feb, 2016 1 commit
  15. 23 Jan, 2016 2 commits
    • xfs: call dax_pfn_mkwrite() for DAX fsync/msync · 5eb88dca
      Committed by Ross Zwisler
      To properly support the new DAX fsync/msync infrastructure filesystems
      need to call dax_pfn_mkwrite() so that DAX can track when user pages are
      dirtied.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5eb88dca
    • wrappers for ->i_mutex access · 5955102c
      Committed by Al Viro
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5955102c
  16. 04 Jan, 2016 1 commit
    • xfs: fix recursive splice read locking with DAX · a6d7636e
      Committed by Dave Chinner
      Doing a splice read (generic/249) generates a lockdep splat because
      we recursively lock the inode iolock in this path:
      
      SyS_sendfile64
      do_sendfile
      do_splice_direct
      splice_direct_to_actor
      do_splice_to
      xfs_file_splice_read			<<<<<< lock here
      default_file_splice_read
      vfs_readv
      do_readv_writev
      do_iter_readv_writev
      xfs_file_read_iter			<<<<<< then here
      
      The issue here is that for DAX inodes we need to avoid the page
      cache path and hence simply push it into the normal read path.
      Unfortunately, we can't tell down at xfs_file_read_iter() whether we
      are being called from the splice path and hence we cannot avoid the
      locking at this layer. Hence we simply have to drop the inode
      locking at the higher splice layer for DAX.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      a6d7636e
  17. 11 Nov, 2015 1 commit
    • vfs: remove unused wrapper block_page_mkwrite() · 5c500029
      Committed by Ross Zwisler
      The function currently called "__block_page_mkwrite()" used to be called
      "block_page_mkwrite()" until a wrapper for this function was added by:
      
      commit 24da4fab ("vfs: Create __block_page_mkwrite() helper passing
      	error values back")
      
      This wrapper, the current "block_page_mkwrite()", is currently unused.
      __block_page_mkwrite() is used directly by ext4, nilfs2 and xfs.
      
      Remove the unused wrapper, rename __block_page_mkwrite() back to
      block_page_mkwrite() and update the comment above block_page_mkwrite().
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5c500029
  18. 03 Nov, 2015 5 commits
    • xfs: optimise away log forces on timestamp updates for fdatasync · fc0561ce
      Committed by Dave Chinner
      xfs: timestamp updates cause excessive fdatasync log traffic
      
      Sage Weil reported that a ceph test workload was writing to the
      log on every fdatasync during an overwrite workload. Event tracing
      showed that the only metadata modification being made was the
      timestamp updates during the write(2) syscall, but fdatasync(2)
      is supposed to ignore them. The key observation was that the
      transactions in the log all looked like this:
      
      INODE: #regs: 4   ino: 0x8b  flags: 0x45   dsize: 32
      
      And contained a flags field of 0x45 or 0x85, and had data and
      attribute forks following the inode core. This means that the
      timestamp updates were triggering dirty relogging of previously
      logged parts of the inode that hadn't yet been flushed back to
      disk.
      
      There are two parts to this problem. The first is that XFS relogs
      dirty regions in subsequent transactions, so it carries around the
      fields that have been dirtied since the last time the inode was
      written back to disk, not since the last time the inode was forced
      into the log.
      
      The second part is that on v5 filesystems, the inode change count
      update during inode dirtying also sets the XFS_ILOG_CORE flag, so
      on v5 filesystems this makes a timestamp update dirty the entire
      inode.
      
      As a result when fdatasync is run, it looks at the dirty fields in
      the inode, and sees more than just the timestamp flag, even though
      the only metadata change since the last fdatasync was just the
      timestamps. Hence we force the log on every subsequent fdatasync
      even though it is not needed.
      
      To fix this, add a new field to the inode log item that tracks
      changes since the last time fsync/fdatasync forced the log to flush
      the changes to the journal. This flag is updated when we dirty the
      inode, but we do it before updating the change count so it does not
      carry the "core dirty" flag from timestamp updates. The fields are
      zeroed when the inode is marked clean (due to writeback/freeing) or
      when an fsync/fdatasync forces the log. Hence if we only dirty the
      timestamps on the inode between fsync/fdatasync calls, the fdatasync
      will not trigger another log force.
      
      Over 100 runs of the test program:
      
      Ext4 baseline:
      	runtime: 1.63s +/- 0.24s
      	avg lat: 1.59ms +/- 0.24ms
      	iops: ~2000
      
      XFS, vanilla kernel:
      	runtime: 2.45s +/- 0.18s
      	avg lat: 2.39ms +/- 0.18ms
      	log forces: ~400/s
      	iops: ~1000
      
      XFS, patched kernel:
      	runtime: 1.49s +/- 0.26s
      	avg lat: 1.46ms +/- 0.25ms
      	log forces: ~30/s
      	iops: ~1500
      Reported-by: Sage Weil <sage@redhat.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      fc0561ce
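The two-field tracking scheme described in the commit above can be modelled in a few lines. This is a simplified sketch, not the XFS implementation: the struct, the flag values and the demo_ function names are all hypothetical, and the log force itself is reduced to a boolean result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical dirty-field flags. */
#define DEMO_LOG_CORE		0x1u
#define DEMO_LOG_TIMESTAMP	0x2u

struct demo_inode_log_item {
	unsigned int fields;		/* dirty since last writeback to disk */
	unsigned int fsync_fields;	/* dirty since last fsync/fdatasync force */
};

/* Dirty the inode; v5 models the change-count update setting the core flag. */
void demo_log_inode(struct demo_inode_log_item *iip, unsigned int flags,
		    bool v5)
{
	assert(iip != NULL);
	/* Record before the change-count update adds the core flag. */
	iip->fsync_fields |= flags;
	if (v5)
		flags |= DEMO_LOG_CORE;
	iip->fields |= flags;
}

/* fdatasync: force only if something other than timestamps changed. */
bool demo_fdatasync_needs_force(struct demo_inode_log_item *iip)
{
	bool force = (iip->fsync_fields & ~DEMO_LOG_TIMESTAMP) != 0;

	if (force)
		iip->fsync_fields = 0;	/* zeroed when the log is forced */
	return force;
}

/* Writeback or freeing marks the inode clean and resets both fields. */
void demo_inode_clean(struct demo_inode_log_item *iip)
{
	iip->fields = 0;
	iip->fsync_fields = 0;
}
```

In this model a timestamp-only update on a v5 filesystem still sets the core flag in `fields`, but `fsync_fields` sees only the timestamp flag, so the following fdatasync skips the force.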
    • xfs: xfs_filemap_pmd_fault treats read faults as write faults · 13ad4fe3
      Committed by Dave Chinner
      The code initially committed didn't have the same checks for write
      faults as the dax_pmd_fault code and hence treats all faults as
      write faults. We can get read faults through this path because there
      is no pmd_mkwrite path for write faults similar to the normal page
      fault path. Hence we need to ensure that we only do c/mtime updates
      on write faults, and freeze protection is unnecessary for read
      faults.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      13ad4fe3
    • xfs: add ->pfn_mkwrite support for DAX · 3af49285
      Committed by Dave Chinner
      ->pfn_mkwrite support is needed so that when a page with allocated
      backing store takes a write fault we can check that the fault has
      not raced with a truncate and is pointing to a region beyond the
      current end of file.
      
      This also allows us to update the timestamp on the inode, too, which
      fixes a generic/080 failure.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      3af49285
    • xfs: DAX does not use IO completion callbacks · 01a155e6
      Committed by Dave Chinner
      For DAX, we are now doing block zeroing during allocation. This
      means we no longer need a special DAX fault IO completion callback
      to do unwritten extent conversion. Because mmap never extends the
      file size (it SEGVs the process) we don't need a callback to update
      the file size, either. Hence we can remove the completion callbacks
      from the __dax_fault and __dax_mkwrite calls.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      01a155e6
    • xfs: fix inode size update overflow in xfs_map_direct() · 3e12dbbd
      Committed by Dave Chinner
      Both direct IO and DAX pass an offset and count into get_blocks that
      will overflow a s64 variable when an IO goes into the last supported
      block in a file (i.e. at offset 2^63 - 1FSB bytes). This can be seen
      from the tracing:
      
      xfs_get_blocks_alloc: [...] offset 0x7ffffffffffff000 count 4096
      xfs_gbmap_direct:     [...] offset 0x7ffffffffffff000 count 4096
      xfs_gbmap_direct_none:[...] offset 0x7ffffffffffff000 count 4096
      
      0x7ffffffffffff000 + 4096 = 0x8000000000000000, and hence that
      overflows the s64 offset and we fail to detect the need for a
      filesize update and an ioend is not allocated.
      
      This is *mostly* avoided for direct IO because such extending IOs
      occur with full block allocation, and so the "IS_UNWRITTEN()" check
      still evaluates as true and we get an ioend that way. However, doing
      single sector extending IOs to this last block will expose the fact
      that file size updates will not occur after the first allocating
      direct IO as the overflow will then be exposed.
      
      There is one further complexity: the DAX page fault path also
      exposes the same issue in block allocation. However, page faults
      cannot extend the file size, so in this case we want to allocate the
      block but do not want to allocate an ioend to enable file size
      update at IO completion. Hence we now need to distinguish between
      the direct IO path allocation and dax fault path allocation to
      avoid leaking ioend structures.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      3e12dbbd
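The arithmetic behind the overflow in the commit above (0x7ffffffffffff000 + 4096 = 0x8000000000000000) can be reproduced in user space. The function names are hypothetical and this is not the xfs_map_direct() code; the naive variant does its addition in unsigned arithmetic and converts back, which on common compilers wraps to a negative value and avoids the undefined behaviour of a real signed overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Naive check: the end offset is held in a signed 64-bit variable, so an IO
 * ending at 2^63 wraps negative and never looks like it extends the file.
 */
bool demo_extends_file_naive(int64_t offset, int64_t count, int64_t isize)
{
	int64_t end = (int64_t)((uint64_t)offset + (uint64_t)count);

	return end > isize;
}

/*
 * Overflow-safe check: compare in unsigned 64-bit arithmetic. The sum cannot
 * wrap as long as offset is non-negative and count is below 2^63.
 */
bool demo_extends_file(int64_t offset, uint64_t count, int64_t isize)
{
	assert(offset >= 0 && isize >= 0);
	return (uint64_t)offset + count > (uint64_t)isize;
}
```

With the values from the trace (offset 0x7ffffffffffff000, count 4096) and a file size equal to that offset, the naive check misses the extension while the unsigned comparison detects it.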
  19. 12 Oct, 2015 3 commits
    • xfs: per-filesystem stats counter implementation · ff6d6af2
      Committed by Bill O'Donnell
      This patch modifies the stats counting macros and the callers
      to those macros to properly increment, decrement, and add-to
      the xfs stats counts. The counts for global and per-fs stats
      are correctly advanced, and cleared by writing a "1" to the
      corresponding clear file.
      
      global counts: /sys/fs/xfs/stats/stats
      per-fs counts: /sys/fs/xfs/sda*/stats/stats
      
      global clear:  /sys/fs/xfs/stats/stats_clear
      per-fs clear:  /sys/fs/xfs/sda*/stats/stats_clear
      
      [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
       stats structures and macros. ]
      Signed-off-by: Bill O'Donnell <billodo@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      ff6d6af2
    • xfs: add an xfs_zero_eof() tracepoint · 0a50f162
      Committed by Brian Foster
      Add a tracepoint in xfs_zero_eof() to facilitate tracking and debugging
      EOF zeroing events. This has proven useful in the context of other
      direct I/O tracepoints to ensure EOF zeroing occurs within appropriate
      file ranges.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      0a50f162
    • xfs: always drain dio before extending aio write submission · 3136e8bb
      Committed by Brian Foster
      XFS supports and typically allows concurrent asynchronous direct I/O
      submission to a single file. One exception to the rule is that file
      extending dio writes that start beyond the current EOF (e.g.,
      potentially create a hole at EOF) require exclusive I/O access to the
      file. This is because such writes must zero any pre-existing blocks
      beyond EOF that are exposed by virtue of now residing within EOF as a
      result of the write about to be submitted.
      
      Before EOF zeroing can occur, the current file i_size must be stabilized
      to avoid data corruption. In this scenario, XFS upgrades the iolock to
      exclude any further I/O submission, waits on in-flight I/O to complete
      to ensure i_size is up to date (i_size is updated on dio write
      completion) and restarts the various checks against the state of the
      file. The problem is that this protection sequence is triggered only
      when the iolock is currently held shared. While this is true for async
      dio in most cases, the caller may upgrade the lock in advance based on
      arbitrary circumstances with respect to EOF zeroing. For example, the
      iolock is always acquired exclusively if the start offset is not block
      aligned. This means that even though the iolock is already held
      exclusive for such I/Os, pending I/O is not drained and thus EOF zeroing
      can occur based on an unstable i_size.
      
      This problem has been reproduced as guest data corruption in virtual
      machines with file-backed qcow2 virtual disks hosted on an XFS
      filesystem. The virtual disks must be configured with aio=native mode
      and they must not be truncated out to the maximum file size (as some virt
      managers will do).
      
      Update xfs_file_aio_write_checks() to unconditionally drain in-flight
      dio before EOF zeroing can occur. Rather than trigger the wait based on
      iolock state, use a new flag and upgrade the iolock when necessary. Note
      that this results in a full restart of the inode checks even when the
      iolock was already held exclusive when technically it is only required
      to recheck i_size. This should be a rare enough occurrence that it is
      preferable to keep the code simple rather than create an alternate
      restart jump target.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      3136e8bb
  20. 09 Sep, 2015 1 commit
  21. 19 Aug, 2015 1 commit
    • xfs: flush entire file on dio read/write to cached file · 3d751af2
      Committed by Brian Foster
      Filesystems are responsible for managing file coherency between the page
      cache and direct I/O. The generic dio code flushes dirty pages over the
      range of a dio to ensure that the dio read or a future buffered read
      returns the correct data. XFS has generally followed this pattern,
      though traditionally has flushed and invalidated the range from the
      start of the I/O all the way to the end of the file. This changed after
      the following commit:
      
      	7d4ea3ce xfs: use ranged writeback and invalidation for direct IO
      
      ... as the full file flush was no longer necessary to deal with the
      strange post-eof delalloc issues that were since fixed. Unfortunately,
      we have since received complaints about performance degradation due to
      the increased exclusive iolock cycles (which locks out parallel dio
      submission) that occur when a file has cached pages. This does not occur
      on filesystems that use the generic code as it also does not incorporate
      locking.
      
      The exclusive iolock is acquired any time the inode mapping has cached
      pages, regardless of whether they reside in the range of the I/O or not.
      If not, the flush/inval calls do no work and the lock was cycled for no
      reason.
      
      Under consideration of the cost of the exclusive iolock, update the dio
      read and write handlers to flush and invalidate the entire mapping when
      cached pages exist. In most cases, this increases the cost of the
      initial flush sequence but eliminates the need for further lock cycles
      and flushes so long as the workload does not actively mix direct and
      buffered I/O. This also more closely matches historical behavior and
      performance characteristics that users have come to expect.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      3d751af2