1. 16 4月, 2015 2 次提交
    • D
      xfs: direct IO EOF zeroing needs to drain AIO · 40c63fbc
      Dave Chinner 提交于
      When we are doing AIO DIO writes, the IOLOCK only provides an IO
      submission barrier. When we need to do EOF zeroing, we need to ensure
      that no other IO is in progress and all pending in-core EOF updates
      have been completed. This requires us to wait for all outstanding
      AIO DIO writes to the inode to complete and, if necessary, run their
      EOF updates.
      
      Once all the EOF updates are complete, we can then restart
      xfs_file_aio_write_checks() while holding the IOLOCK_EXCL, knowing
      that EOF is up to date and we have exclusive IO access to the file
      so we can run EOF block zeroing if we need to without interference.
      This gives EOF zeroing the same exclusivity against other IO as we
      provide truncate operations.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      40c63fbc
    • D
      xfs: DIO write completion size updates race · b9d59846
      Dave Chinner 提交于
      xfs_end_io_direct_write() can race with other IO completions when
      updating the in-core inode size. The IO completion processing is not
      serialised for direct IO - they are done either under the
      IOLOCK_SHARED for non-AIO DIO, and without any IOLOCK held at all
      during AIO DIO completion. Hence the non-atomic test-and-set update
      of the in-core inode size is racy and can result in the in-core
      inode size going backwards if the race if hit just right.
      
      If the inode size goes backwards, this can trigger the EOF zeroing
      code to run incorrectly on the next IO, which then will zero data
      that has successfully been written to disk by a previous DIO.
      
      To fix this bug, we need to serialise the test/set updates of the
      in-core inode size. This first patch introduces locking around the
      relevant updates and checks in the DIO path. Because we now have an
      ioend in xfs_end_io_direct_write(), we know exactly then we are
      doing an IO that requires an in-core EOF update, and we know that
      they are not running in interrupt context. As such, we do not need to
      use irqsave() spinlock variants to protect against interrupts while
      the lock is held.
      
      Hence we can use an existing spinlock in the inode to do this
      serialisation and so not need to grow the struct xfs_inode just to
      work around this problem.
      
      This patch does not address the test/set EOF update in
      generic_file_write_direct() for various reasons - that will be done
      as a followup with separate explanation.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b9d59846
  2. 16 2月, 2015 1 次提交
  3. 11 2月, 2015 1 次提交
  4. 02 2月, 2015 1 次提交
  5. 21 1月, 2015 1 次提交
  6. 01 12月, 2014 1 次提交
  7. 28 11月, 2014 3 次提交
  8. 09 9月, 2014 2 次提交
  9. 02 9月, 2014 3 次提交
  10. 04 8月, 2014 1 次提交
    • D
      xfs: kill xfs_vnode.h · b92cc59f
      Dave Chinner 提交于
      Move the IO flag definitions to xfs_inode.h and kill the header file
      as it is now empty.
      
      Removing the xfs_vnode.h file showed up an implicit header include
      path:
      	xfs_linux.h -> xfs_vnode.h -> xfs_fs.h
      
      And so every xfs header file has been inplicitly been including
      xfs_fs.h where it is needed or not. Hence the removal of xfs_vnode.h
      causes all sorts of build issues because BBTOB() and friends are no
      longer automatically included in the build. This also gets fixed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b92cc59f
  11. 24 7月, 2014 1 次提交
    • B
      xfs: run an eofblocks scan on ENOSPC/EDQUOT · dc06f398
      Brian Foster 提交于
      From: Brian Foster <bfoster@redhat.com>
      
      Speculative preallocation and and the associated throttling metrics
      assume we're working with large files on large filesystems. Users have
      reported inefficiencies in these mechanisms when we happen to be dealing
      with large files on smaller filesystems. This can occur because while
      prealloc throttling is aggressive under low free space conditions, it is
      not active until we reach 5% free space or less.
      
      For example, a 40GB filesystem has enough space for several files large
      enough to have multi-GB preallocations at any given time. If those files
      are slow growing, they might reserve preallocation for long periods of
      time as well as avoid the background scanner due to frequent
      modification. If a new file is written under these conditions, said file
      has no access to this already reserved space and premature ENOSPC is
      imminent.
      
      To handle this scenario, modify the buffered write ENOSPC handling and
      retry sequence to invoke an eofblocks scan. In the smaller filesystem
      scenario, the eofblocks scan resets the usage of preallocation such that
      when the 5% free space threshold is met, throttling effectively takes
      over to provide fair and efficient preallocation until legitimate
      ENOSPC.
      
      The eofblocks scan is selective based on the nature of the failure. For
      example, an EDQUOT failure in a particular quota will use a filtered
      scan for that quota. Because we don't know which quota might have caused
      an allocation failure at any given time, we include each applicable
      quota determined to be under low free space conditions in the scan.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      dc06f398
  12. 25 6月, 2014 1 次提交
    • D
      xfs: global error sign conversion · 2451337d
      Dave Chinner 提交于
      Convert all the errors the core XFs code to negative error signs
      like the rest of the kernel and remove all the sign conversion we
      do in the interface layers.
      
      Errors for conversion (and comparison) found via searches like:
      
      $ git grep " E" fs/xfs
      $ git grep "return E" fs/xfs
      $ git grep " E[A-Z].*;$" fs/xfs
      
      Negation points found via searches like:
      
      $ git grep "= -[a-z,A-Z]" fs/xfs
      $ git grep "return -[a-z,A-D,F-Z]" fs/xfs
      $ git grep " -[a-z].*;" fs/xfs
      
      [ with some bits I missed from Brian Foster ]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2451337d
  13. 22 6月, 2014 1 次提交
  14. 12 6月, 2014 1 次提交
    • A
      ->splice_write() via ->write_iter() · 8d020765
      Al Viro 提交于
      iter_file_splice_write() - a ->splice_write() instance that gathers the
      pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
      it to ->write_iter().  A bunch of simple cases coverted to that...
      
      [AV: fixed the braino spotted by Cyrill]
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8d020765
  15. 15 5月, 2014 3 次提交
  16. 07 5月, 2014 7 次提交
  17. 17 4月, 2014 1 次提交
  18. 14 4月, 2014 1 次提交
  19. 12 4月, 2014 1 次提交
    • L
      fs: move falloc collapse range check into the filesystem methods · 23fffa92
      Lukas Czerner 提交于
      Currently in do_fallocate in collapse range case we're checking
      whether offset + len is not bigger than i_size.  However there is
      nothing which would prevent i_size from changing so the check is
      pointless.  It should be done in the file system itself and the file
      system needs to make sure that i_size is not going to change.  The
      i_size check for the other fallocate modes are also done in the
      filesystems.
      
      As it is now we can easily crash the kernel by having two processes
      doing truncate and fallocate collapse range at the same time.  This
      can be reproduced on ext4 and it is theoretically possible on xfs even
      though I was not able to trigger it with this simple test.
      
      This commit removes the check from do_fallocate and adds it to the
      file system.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Acked-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      23fffa92
  20. 08 4月, 2014 1 次提交
  21. 02 4月, 2014 3 次提交
  22. 13 3月, 2014 1 次提交
  23. 24 2月, 2014 1 次提交
  24. 10 2月, 2014 1 次提交
    • A
      fix O_SYNC|O_APPEND syncing the wrong range on write() · d311d79d
      Al Viro 提交于
      It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
      when sync_page_range() had been introduced; generic_file_write{,v}() correctly
      synced
      	pos_after_write - written .. pos_after_write - 1
      but generic_file_aio_write() synced
      	pos_before_write .. pos_before_write + written - 1
      instead.  Which is not the same thing with O_APPEND, obviously.
      A couple of years later correct variant had been killed off when
      everything switched to use of generic_file_aio_write().
      
      All users of generic_file_aio_write() are affected, and the same bug
      has been copied into other instances of ->aio_write().
      
      The fix is trivial; the only subtle point is that generic_write_sync()
      ought to be inlined to avoid calculations useless for the majority of
      calls.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d311d79d