1. 19 Aug 2015, 1 commit
    • xfs: flush entire file on dio read/write to cached file · 3d751af2
      Authored by Brian Foster
      Filesystems are responsible for managing file coherency between the page
      cache and direct I/O. The generic dio code flushes dirty pages over the
      range of a dio to ensure that the dio read or a future buffered read
      returns the correct data. XFS has generally followed this pattern,
      though traditionally has flushed and invalidated the range from the
      start of the I/O all the way to the end of the file. This changed after
      the following commit:
      
      	7d4ea3ce xfs: use ranged writeback and invalidation for direct IO
      
      ... as the full-file flush was no longer necessary to deal with the
      strange post-EOF delalloc issues that have since been fixed. Unfortunately,
      we have since received complaints about performance degradation due to
      the increased exclusive iolock cycles (which lock out parallel dio
      submission) that occur when a file has cached pages. This does not occur
      on filesystems that use the generic dio code, because the generic path
      does not take this locking at all.
      
      The exclusive iolock is acquired any time the inode mapping has cached
      pages, regardless of whether they reside in the range of the I/O or not.
      If they do not, the flush/invalidation calls do no work and the lock is
      cycled for no reason.
      
      Under consideration of the cost of the exclusive iolock, update the dio
      read and write handlers to flush and invalidate the entire mapping when
      cached pages exist. In most cases, this increases the cost of the
      initial flush sequence but eliminates the need for further lock cycles
      and flushes so long as the workload does not actively mix direct and
      buffered I/O. This also more closely matches historical behavior and
      performance characteristics that users have come to expect.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
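      A minimal sketch of this whole-mapping flush, using the stock page-cache
      helpers; the helper name xfs_dio_flush_mapping is illustrative, not a
      function from the patch itself:

	/*
	 * Sketch only: flush and invalidate the entire mapping ahead of a
	 * dio, but only when cached pages actually exist, so a clean
	 * mapping never pays for the flush or the lock cycle.
	 */
	static int xfs_dio_flush_mapping(struct address_space *mapping)
	{
		int ret = 0;

		if (mapping->nrpages) {
			/* write back every dirty page, start of file to EOF */
			ret = filemap_write_and_wait_range(mapping, 0, -1);
			if (!ret)
				/* toss the cached pages so later buffered reads refetch */
				ret = invalidate_inode_pages2(mapping);
		}
		return ret;
	}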
  2. 24 Jun 2015, 2 commits
  3. 04 Jun 2015, 5 commits
  4. 02 Jun 2015, 1 commit
    • writeback: separate out include/linux/backing-dev-defs.h · 66114cad
      Authored by Tejun Heo
      With the planned cgroup writeback support, backing-dev related
      declarations will be more widely used across block and cgroup;
      unfortunately, including backing-dev.h from include/linux/blkdev.h
      makes cyclic include dependency quite likely.
      
      This patch separates out backing-dev-defs.h which only has the
      essential definitions and updates blkdev.h to include it.  C files
      which need access to more backing-dev details now include
      backing-dev.h directly.  This takes backing-dev.h off the common
      include dependency chain making it a lot easier to use it across block
      and cgroup.
      
      v2: fs/fat build failure fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@fb.com>
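      The split is a standard way to break an include cycle; a generic sketch
      with illustrative names (foo-defs.h/foo.h stand in for
      backing-dev-defs.h/backing-dev.h):

	/* foo-defs.h: only the bare definitions, no heavy includes */
	#ifndef FOO_DEFS_H
	#define FOO_DEFS_H

	struct foo_info {
		unsigned long	state;
	};

	#endif /* FOO_DEFS_H */

	/* foo.h: the full API; builds on the defs header */
	#include "foo-defs.h"
	void foo_register(struct foo_info *info);

	/*
	 * blkdev.h-style headers include only foo-defs.h, getting the
	 * struct definition without dragging the full foo.h (and whatever
	 * it includes) onto every common include chain.
	 */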
  5. 29 May 2015, 1 commit
  6. 16 Apr 2015, 3 commits
    • xfs: using generic_file_direct_write() is unnecessary · 0cefb29e
      Authored by Dave Chinner
      generic_file_direct_write() does all sorts of things to make DIO
      work "sorta ok" with mixed buffered IO workloads. We already do
      most of this work in xfs_file_aio_dio_write() because of the locking
      requirements, so there are only a couple of things it does for us.
      
      The first thing is that it does a page cache invalidation after the
      ->direct_IO callout. This can easily be added to the XFS code.
      
      The second thing it does is that if data was written, it updates the
      iov_iter structure to reflect the data written, and then does EOF
      size updates if necessary. For XFS, these EOF size updates are now
      not necessary, as we do them safely and race-free in IO completion
      context. That leaves just the iov_iter update, and that's also moved
      to the XFS code.
      
      Therefore we don't need to call generic_file_direct_write() and in
      doing so remove redundant buffered writeback and page cache
      invalidation calls from the DIO submission path. We also remove a
      racy EOF size update, and make the DIO submission code in XFS much
      easier to follow. Wins all round, really.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
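      What moves into XFS is roughly the tail of generic_file_direct_write();
      a sketch of the two pieces (variable names are illustrative; the
      page-cache and iov_iter calls are the standard kernel ones, spelled
      PAGE_CACHE_SHIFT rather than PAGE_SHIFT in kernels of this era):

	/* after ->direct_IO returns 'ret' bytes written at offset 'pos' */
	if (ret > 0) {
		/* 1. invalidate the page-cache pages the dio just wrote over */
		invalidate_inode_pages2_range(mapping,
				pos >> PAGE_SHIFT,
				(pos + ret - 1) >> PAGE_SHIFT);

		/* 2. advance the iov_iter past the data written */
		iov_iter_advance(from, ret);
	}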
    • xfs: direct IO EOF zeroing needs to drain AIO · 40c63fbc
      Authored by Dave Chinner
      When we are doing AIO DIO writes, the IOLOCK only provides an IO
      submission barrier. When we need to do EOF zeroing, we need to ensure
      that no other IO is in progress and all pending in-core EOF updates
      have been completed. This requires us to wait for all outstanding
      AIO DIO writes to the inode to complete and, if necessary, run their
      EOF updates.
      
      Once all the EOF updates are complete, we can then restart
      xfs_file_aio_write_checks() while holding the IOLOCK_EXCL, knowing
      that EOF is up to date and we have exclusive IO access to the file
      so we can run EOF block zeroing if we need to without interference.
      This gives EOF zeroing the same exclusivity against other IO as we
      provide truncate operations.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
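      A sketch of the drain, assuming this era's xfs_rw_ilock()/xfs_rw_iunlock()
      wrappers and an illustrative 'zeroing_needed' flag; inode_dio_wait() is
      the standard kernel primitive for waiting out in-flight direct IO:

	/*
	 * Sketch: EOF zeroing is needed, so upgrade to the exclusive
	 * iolock and drain all outstanding AIO DIO before re-checking.
	 */
	if (zeroing_needed && *iolock == XFS_IOLOCK_SHARED) {
		xfs_rw_iunlock(ip, *iolock);
		*iolock = XFS_IOLOCK_EXCL;
		xfs_rw_ilock(ip, *iolock);

		/* wait for AIO DIO to complete and run its EOF updates */
		inode_dio_wait(VFS_I(ip));
		goto restart;	/* re-run xfs_file_aio_write_checks() */
	}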
    • xfs: DIO write completion size updates race · b9d59846
      Authored by Dave Chinner
      xfs_end_io_direct_write() can race with other IO completions when
      updating the in-core inode size. IO completion processing is not
      serialised for direct IO - it is done under the IOLOCK_SHARED for
      non-AIO DIO, and without any IOLOCK held at all during AIO DIO
      completion. Hence the non-atomic test-and-set update of the in-core
      inode size is racy and can result in the in-core inode size going
      backwards if the race is hit just right.
      
      If the inode size goes backwards, this can trigger the EOF zeroing
      code to run incorrectly on the next IO, which then will zero data
      that has successfully been written to disk by a previous DIO.
      
      To fix this bug, we need to serialise the test/set updates of the
      in-core inode size. This first patch introduces locking around the
      relevant updates and checks in the DIO path. Because we now have an
      ioend in xfs_end_io_direct_write(), we know exactly when we are
      doing an IO that requires an in-core EOF update, and we know that
      it is not running in interrupt context. As such, we do not need to
      use irqsave() spinlock variants to protect against interrupts while
      the lock is held.
      
      Hence we can use an existing spinlock in the inode to do this
      serialisation and so not need to grow the struct xfs_inode just to
      work around this problem.
      
      This patch does not address the test/set EOF update in
      generic_file_write_direct() for various reasons - that will be done
      as a followup with separate explanation.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
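      The serialisation is a short critical section in the completion path;
      a sketch using an existing inode spinlock as the commit describes
      (i_flags_lock is one plausible candidate for "an existing spinlock
      in the inode"):

	/* dio completion: 'offset' + 'size' describe the IO just finished */
	spin_lock(&ip->i_flags_lock);	/* plain spin_lock: never in irq context */
	if (offset + size > i_size_read(inode))
		i_size_write(inode, offset + size);
	spin_unlock(&ip->i_flags_lock);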
  7. 13 Apr 2015, 1 commit
  8. 12 Apr 2015, 5 commits
  9. 26 Mar 2015, 1 commit
  10. 25 Mar 2015, 1 commit
  11. 23 Feb 2015, 4 commits
    • xfs: ensure truncate forces zeroed blocks to disk · 5885ebda
      Authored by Dave Chinner
      A new fsync vs power fail test in xfstests indicated that XFS can
      have unreliable data consistency when doing extending truncates that
      require block zeroing. The blocks beyond EOF get zeroed in memory,
      but we never force those changes to disk before we run the
      transaction that extends the file size and exposes those blocks to
      userspace. This can result in the blocks not being correctly zeroed
      after a crash.
      
      Because in-memory behaviour is correct, tools like fsx don't pick up
      any coherency problems - it's not until the filesystem is shutdown
      or the system crashes after writing the truncate transaction to the
      journal but before the zeroed data in the page cache is flushed that
      the issue is exposed.
      
      Fix this by also flushing the dirty in-memory data over the region
      between the old size and the new size when we've found blocks that
      need zeroing in the truncate process.
      Reported-by: Liu Bo <bo.li.liu@oracle.com>
      cc: <stable@vger.kernel.org>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
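      A sketch of the fix in the truncate path (did_zeroing, oldsize and
      newsize are illustrative locals; the writeback call is the standard
      kernel one):

	/*
	 * Blocks beyond the old EOF were zeroed only in the page cache.
	 * Flush them before the transaction that exposes the new size,
	 * so a crash cannot reveal stale data in the zeroed region.
	 */
	if (did_zeroing) {
		error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
						     oldsize, newsize - 1);
		if (error)
			return error;
	}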
    • xfs: take i_mmap_lock on extent manipulation operations · e8e9ad42
      Authored by Dave Chinner
      Now that the i_mmaplock is held across the page fault IO path, we
      add exclusion for extent manipulation operations by taking the lock
      in the paths that directly modify extent maps. This
      includes truncate, hole punching and other fallocate based
      operations. The operations will now take both the i_iolock and the
      i_mmaplock in exclusive mode, thereby ensuring that all IO and page
      faults block without holding any page locks while the extent
      manipulation is in progress.
      
      This gives us a lock order during truncate of i_iolock ->
      i_mmaplock -> page_lock -> i_lock, providing the same ordering
      guarantees for extent manipulation that the iolock provides for
      the normal IO path, without involving the mmap_sem.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
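      The extent-manipulating paths then take both locks up front; a sketch
      of the entry to a truncate/hole-punch/fallocate operation (lock flag
      names as used by XFS in this series):

	/* exclude both new IO and new page faults for the duration */
	xfs_ilock(ip, XFS_IOLOCK_EXCL);
	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);

	/* ... modify the extent map: truncate, punch hole, fallocate ... */

	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
	xfs_iunlock(ip, XFS_IOLOCK_EXCL);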
    • xfs: use i_mmaplock on write faults · 075a924d
      Authored by Dave Chinner
      Take the i_mmaplock over write page faults. These come through the
      ->page_mkwrite callout, so we need to wrap that callout with the
      i_mmaplock.
      
      This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock
      -> i_lock.
      
      Also, move the page_mkwrite wrapper to the same region of xfs_file.c
      as the read fault wrappers and add a tracepoint.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
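      The wrapper is small; a sketch in the shape of this era's page_mkwrite
      handlers (block_page_mkwrite() and block_page_mkwrite_return() are the
      generic kernel routines; the tracepoint name is illustrative):

	STATIC int
	xfs_filemap_page_mkwrite(
		struct vm_area_struct	*vma,
		struct vm_fault		*vmf)
	{
		struct xfs_inode	*ip = XFS_I(file_inode(vma->vm_file));
		int			ret;

		trace_xfs_filemap_page_mkwrite(ip);

		xfs_ilock(ip, XFS_MMAPLOCK_SHARED);
		ret = block_page_mkwrite(vma, vmf, xfs_get_blocks);
		xfs_iunlock(ip, XFS_MMAPLOCK_SHARED);

		return block_page_mkwrite_return(ret);
	}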
    • xfs: use i_mmaplock on read faults · de0e8c20
      Authored by Dave Chinner
      Take the i_mmaplock over read page faults. These come through the
      ->fault callout, so we need to wrap the generic implementation
      with the i_mmaplock. While there, add tracepoints for the read
      fault as it passes through XFS.
      
      This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock
      -> i_lock.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
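      The read-fault side is the same pattern wrapped around the generic
      handler; a sketch (filemap_fault() is the stock VM fault routine of
      this era; the tracepoint name is illustrative):

	STATIC int
	xfs_filemap_fault(
		struct vm_area_struct	*vma,
		struct vm_fault		*vmf)
	{
		struct xfs_inode	*ip = XFS_I(file_inode(vma->vm_file));
		int			ret;

		trace_xfs_filemap_fault(ip);

		xfs_ilock(ip, XFS_MMAPLOCK_SHARED);
		ret = filemap_fault(vma, vmf);	/* generic read fault path */
		xfs_iunlock(ip, XFS_MMAPLOCK_SHARED);

		return ret;
	}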
  12. 16 Feb 2015, 1 commit
  13. 11 Feb 2015, 1 commit
  14. 02 Feb 2015, 1 commit
  15. 21 Jan 2015, 1 commit
  16. 01 Dec 2014, 1 commit
  17. 28 Nov 2014, 3 commits
  18. 09 Sep 2014, 2 commits
  19. 02 Sep 2014, 3 commits
  20. 04 Aug 2014, 1 commit
    • xfs: kill xfs_vnode.h · b92cc59f
      Authored by Dave Chinner
      Move the IO flag definitions to xfs_inode.h and kill the header file
      as it is now empty.
      
      Removing the xfs_vnode.h file showed up an implicit header include
      path:
      	xfs_linux.h -> xfs_vnode.h -> xfs_fs.h
      
      And so every xfs header file has implicitly been including xfs_fs.h
      whether it is needed or not. Hence the removal of xfs_vnode.h causes
      all sorts of build issues because BBTOB() and friends are no longer
      automatically included in the build. This also gets fixed.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  21. 24 Jul 2014, 1 commit
    • xfs: run an eofblocks scan on ENOSPC/EDQUOT · dc06f398
      Authored by Brian Foster
      From: Brian Foster <bfoster@redhat.com>
      
      Speculative preallocation and the associated throttling metrics
      assume we're working with large files on large filesystems. Users have
      reported inefficiencies in these mechanisms when we happen to be dealing
      with large files on smaller filesystems. This can occur because while
      prealloc throttling is aggressive under low free space conditions, it is
      not active until we reach 5% free space or less.
      
      For example, a 40GB filesystem has enough space for several files large
      enough to have multi-GB preallocations at any given time. If those files
      are slow growing, they might reserve preallocation for long periods of
      time as well as avoid the background scanner due to frequent
      modification. If a new file is written under these conditions, said file
      has no access to this already reserved space and premature ENOSPC is
      imminent.
      
      To handle this scenario, modify the buffered write ENOSPC handling and
      retry sequence to invoke an eofblocks scan. In the smaller filesystem
      scenario, the eofblocks scan resets the usage of preallocation such that
      when the 5% free space threshold is met, throttling effectively takes
      over to provide fair and efficient preallocation until legitimate
      ENOSPC.
      
      The eofblocks scan is selective based on the nature of the failure. For
      example, an EDQUOT failure in a particular quota will use a filtered
      scan for that quota. Because we don't know which quota might have caused
      an allocation failure at any given time, we include each applicable
      quota determined to be under low free space conditions in the scan.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
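      A sketch of the retry logic in the buffered write path, using the
      helper names from this series (xfs_inode_free_quota_eofblocks() and
      xfs_icache_free_eofblocks()); the 'enospc' local is illustrative and
      guards against retrying forever:

	/* buffered write failed; run one selective scan, then retry */
	if (ret == -EDQUOT && !enospc) {
		/* filtered scan: only inodes under the failing quotas */
		enospc = xfs_inode_free_quota_eofblocks(ip);
		if (enospc)
			goto write_retry;
	} else if (ret == -ENOSPC && !enospc) {
		struct xfs_eofblocks eofb = { 0 };

		enospc = 1;
		eofb.eof_flags = XFS_EOF_FLAGS_SYNC;
		/* trim speculative preallocation across the filesystem */
		xfs_icache_free_eofblocks(ip->i_mount, &eofb);
		goto write_retry;
	}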