Commits · fe0f07d08ee35fb13d2cb048970072fe4f71ad14 · openeuler / Kernel

25 Apr, 2015 2 commits

direct-io: only inc/dec inode->i_dio_count for file systems · fe0f07d0

Authored by Jens Axboe on Apr 15, 2015

do_blockdev_direct_IO() increments and decrements the inode
->i_dio_count for each IO operation. It does this to protect against
truncate of a file. Block devices don't need this sort of protection.

For a capable multiqueue setup, this atomic int is the only shared
state between applications accessing the device for O_DIRECT, and it
presents a scaling wall for that. In my testing, as much as 30% of
system time is spent incrementing and decrementing this value. A mixed
read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
better latencies too. Before:

clat percentiles (usec):
 |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
 | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
 | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
 | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
 | 99.99th=[  165]

After:

clat percentiles (usec):
 |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
 | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
 | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
 | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
 | 99.99th=[  438]

In other setups, Robert Elliott reported seeing good performance
improvements:

https://lkml.org/lkml/2015/4/3/557

The more applications accessing the device, the worse it gets.

Add a new direct-io flag, DIO_SKIP_DIO_COUNT, which tells
do_blockdev_direct_IO() that it need not worry about incrementing
or decrementing the inode i_dio_count for this caller.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fe0f07d0
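
To illustrate the idea in fe0f07d0 above, here is a minimal user-space sketch, not the kernel code itself. It assumes a hypothetical `struct model_inode` with a single atomic counter and hypothetical helpers `model_dio_begin()` and `model_dio_end()`; only the flag name DIO_SKIP_DIO_COUNT and the field name i_dio_count come from the commit message, and the flag's numeric value here is chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative value; only the flag's name is taken from the commit message. */
#define DIO_SKIP_DIO_COUNT 0x08

/* Hypothetical user-space stand-in for the kernel's struct inode. */
struct model_inode {
    atomic_int i_dio_count; /* number of in-flight direct I/O operations */
};

/* Start one direct I/O: touch the shared counter only if the caller needs it. */
static void model_dio_begin(struct model_inode *inode, int flags)
{
    assert(inode != NULL);
    if (!(flags & DIO_SKIP_DIO_COUNT))
        atomic_fetch_add(&inode->i_dio_count, 1);
}

/* Finish one direct I/O: mirror the decision made in model_dio_begin(). */
static void model_dio_end(struct model_inode *inode, int flags)
{
    assert(inode != NULL);
    if (!(flags & DIO_SKIP_DIO_COUNT))
        atomic_fetch_sub(&inode->i_dio_count, 1);
}
```

A caller that cannot be truncated (a block device, per the commit message) would pass DIO_SKIP_DIO_COUNT and never write to the shared counter, which is the contended state the commit describes.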

fs/9p: fix readdir() · 8e3c5005

Authored by Johannes Berg on Apr 22, 2015

Al Viro's IOV changes broke 9p readdir() because the new code
didn't abort the read when it returned nothing. The original
code checked if the combined error/length was <= 0 but in the
new code that accidentally got changed to just an error check.

Add back the return from the function when nothing is read.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: e1200fe6 ("9p: switch p9_client_read() to passing struct iov_iter *")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8e3c5005
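
The class of bug described in 8e3c5005 above can be sketched in user space as follows. This is a toy model under stated assumptions, not the 9p code: `model_source`, `model_read()`, `model_readdir()`, and both constants are hypothetical, and the iteration cap merely stands in for whatever the real code did after an empty read, which the commit message does not spell out.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_ITER    16      /* cap so the unfixed variant terminates in this model */
#define MODEL_NO_PROGRESS (-1000) /* sentinel: the loop ran to the cap without stopping */

/* Hypothetical data source holding a fixed number of bytes. */
struct model_source {
    int remaining;
};

/* Read up to `want` bytes; the error is reported separately from the length. */
static int model_read(struct model_source *src, int want, int *err)
{
    assert(src != NULL && err != NULL);
    int n = src->remaining < want ? src->remaining : want;
    src->remaining -= n;
    *err = 0; /* this toy source never fails */
    return n;
}

/*
 * Consume the source in 4-byte reads. With stop_on_empty == 0 only the
 * error is checked, so an empty read does not end the loop. With
 * stop_on_empty != 0 the early return on "nothing read" is restored.
 */
static int model_readdir(struct model_source *src, int stop_on_empty)
{
    int total = 0;
    for (int i = 0; i < MODEL_MAX_ITER; i++) {
        int err;
        int n = model_read(src, 4, &err);
        if (err)
            return err;
        if (stop_on_empty && n <= 0)
            return total;
        total += n;
    }
    return MODEL_NO_PROGRESS;
}
```

With a 10-byte source, the variant that checks for an empty read returns 10, while the error-only variant keeps iterating until the cap.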

16 Apr, 2015 24 commits

VFS: assorted d_backing_inode() annotations · bb668734

Authored by David Howells on Mar 17, 2015

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bb668734

VFS: fs/inode.c helpers: d_inode() annotations · df2b1afd

Authored by David Howells on Mar 17, 2015

these should be used on objects already in top layer
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

df2b1afd

VFS: fs/cachefiles: d_backing_inode() annotations · 466b77bc

Authored by David Howells on Mar 17, 2015

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

466b77bc

VFS: fs library helpers: d_inode() annotations · dea655c2

Authored by David Howells on Mar 17, 2015

library helpers called by filesystem drivers on their own inodes
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dea655c2

VFS: assorted weird filesystems: d_inode() annotations · 75c3cfa8

Authored by David Howells on Mar 17, 2015

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

75c3cfa8

VFS: normal filesystems (and lustre): d_inode() annotations · 2b0143b5

Authored by David Howells on Mar 17, 2015

that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2b0143b5

VFS: security/: d_inode() annotations · ce0b16dd

Authored by David Howells on Feb 19, 2015

... except where that code acts as a filesystem driver, rather than
working with dentries given to it.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ce0b16dd

VFS: security/: d_backing_inode() annotations · c6f493d6

Authored by David Howells on Mar 17, 2015

most of the ->d_inode uses there refer to the same inode IO would
go to, i.e. d_backing_inode()
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c6f493d6

VFS: net/: d_inode() annotations · c5ef6035

Authored by David Howells on Mar 17, 2015

socket inodes and sunrpc filesystems - inodes owned by that code
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c5ef6035

VFS: net/unix: d_backing_inode() annotations · a25b376b

Authored by David Howells on Mar 17, 2015

places where we are dealing with S_ISSOCK file creation/lookups.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a25b376b

VFS: kernel/: d_inode() annotations · 7682c918

Authored by David Howells on Mar 17, 2015

relayfs and tracefs are dealing with inodes of their own;
those two act as filesystem drivers
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7682c918

VFS: audit: d_backing_inode() annotations · 3b362157

Authored by David Howells on Mar 17, 2015

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3b362157

VFS: Fix up some ->d_inode accesses in the chelsio driver · c1d81b1c

Authored by David Howells on Mar 06, 2015

Fix up some ->d_inode accesses in the chelsio driver.

 (1) FILE_DATA() should just be replaced with file_inode().

 (2) set_debugfs_file_size() should be removed and debugfs_create_file_size()
     should be used to create the file.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c1d81b1c

VFS: Cachefiles should perform fs modifications on the top layer only · 5153bc81

Authored by David Howells on Mar 06, 2015

Cachefiles should perform fs modifications (eg. vfs_unlink()) on the top layer
only and should not attempt to alter the lower layer.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5153bc81

VFS: AF_UNIX sockets should call mknod on the top layer only · ee8ac4d6

Authored by David Howells on Mar 06, 2015

AF_UNIX sockets should call mknod on the top layer only and should not attempt
to modify the lower layer in a layered filesystem such as overlayfs.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ee8ac4d6

block: loop: switch to VFS ITER_BVEC · aa4d8616

Authored by Christoph Hellwig on Apr 07, 2015

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

aa4d8616

configfs: Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode · 6683de38

Authored by David Howells on Mar 02, 2015

Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6683de38

VFS: Make pathwalk use d_is_reg() rather than S_ISREG() · 4bbcbd3b

Authored by David Howells on Mar 17, 2015

Make pathwalk use d_is_reg() rather than S_ISREG() to determine whether to
honour O_TRUNC.  Since this occurs after complete_walk(), the dentry type
field cannot change and the inode pointer cannot change as we hold a ref on
the dentry, so this should be safe.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4bbcbd3b

VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR() · 7ceab50c

Authored by David Howells on Mar 05, 2015

Fix up debugfs to use d_is_dir(dentry) in place of
S_ISDIR(dentry->d_inode->i_mode).
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7ceab50c

VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk · 698934df

Authored by David Howells on Mar 17, 2015

Where we have:

    	if (!dentry->d_inode || d_is_negative(dentry)) {

type constructions in pathwalk we should be able to eliminate the check of
d_inode and rely solely on the result of d_is_negative() or d_is_positive().

What we do have to take care to do is to read d_inode after calling a
d_is_xxx() typecheck function to get the barriering right.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

698934df

NFS: Don't use d_inode as a variable name · 88e7fbd4

Authored by David Howells on Mar 04, 2015

Don't use d_inode as a variable name as it now masks a function name.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

88e7fbd4

VFS: Impose ordering on accesses of d_inode and d_flags · 4bf46a27

Authored by David Howells on Mar 05, 2015

Impose ordering on accesses of d_inode and d_flags to avoid the need to do
this:

	if (!dentry->d_inode || d_is_negative(dentry)) {

when this:

	if (d_is_negative(dentry)) {

should suffice.

This check is especially problematic if a dentry can have its type field set
to something other than DENTRY_MISS_TYPE when d_inode is NULL (as in
unionmount).

What we really need to do is stick a write barrier between setting d_inode and
setting d_flags and a read barrier between reading d_flags and reading
d_inode.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4bf46a27
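
The ordering requirement in 4bf46a27 above can be approximated in portable C11. This sketch swaps in release/acquire atomics for the kernel's write and read barriers, which is an assumption of the example rather than what the commit uses; `struct ordered_dentry`, its helpers, and the two type constants are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical type codes; the kernel's real values are not reproduced here. */
#define MODEL_MISS_TYPE    0u
#define MODEL_REGULAR_TYPE 1u

/* Hypothetical stand-in for a dentry: a plain pointer published via d_flags. */
struct ordered_dentry {
    void       *d_inode;
    atomic_uint d_flags;
};

/* Writer: set d_inode first, then publish the type with release semantics. */
static void ordered_set_inode(struct ordered_dentry *d, void *inode, unsigned type)
{
    assert(d != NULL);
    d->d_inode = inode;
    atomic_store_explicit(&d->d_flags, type, memory_order_release);
}

/* Type check: an acquire load of d_flags, paired with the release store above. */
static int ordered_is_negative(struct ordered_dentry *d)
{
    assert(d != NULL);
    return atomic_load_explicit(&d->d_flags, memory_order_acquire) == MODEL_MISS_TYPE;
}

/* Reader: check the flags first, and only then read d_inode. */
static void *ordered_inode_if_positive(struct ordered_dentry *d)
{
    if (ordered_is_negative(d))
        return NULL;
    return d->d_inode;
}
```

Because the reader loads d_flags before d_inode and the writer stores them in the opposite order, a reader that sees a non-miss type also sees the inode pointer, so the single type check is enough.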

VFS: Add owner-filesystem positive/negative dentry checks · 525d27b2

Authored by David Howells on Feb 11, 2015

Supply two functions to test whether a filesystem's own dentries are positive
or negative (d_really_is_positive() and d_really_is_negative()).

The problem is that the DCACHE_ENTRY_TYPE field of dentry->d_flags may be
overridden by the union part of a layered filesystem and isn't thus
necessarily indicative of the type of dentry.

Normally, this would involve a negative dentry (ie. ->d_inode == NULL) having
->d_layer.lower pointed to a lower layer dentry, DCACHE_PINNING_LOWER set and
the DCACHE_ENTRY_TYPE field set to something other than DCACHE_MISS_TYPE - but
it could also involve, say, a DCACHE_SPECIAL_TYPE being overridden to
DCACHE_WHITEOUT_TYPE if a 0,0 chardev is detected in the top layer.

However, inside a filesystem, when that fs is looking at its own dentries, it
probably wants to know if they are really negative or not - and doesn't care
about the fallthrough bits used by the union.

To this end, a filesystem should normally use d_really_is_positive/negative()
when looking at its own dentries rather than d_is_positive/negative() and
should use d_inode() to get at the inode.

Anyone looking at someone else's dentries (this includes pathwalk) should use
d_is_xxx() and d_backing_inode().
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

525d27b2
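
The distinction drawn in 525d27b2 above can be sketched with a small user-space model. The struct `layered_dentry`, the `model_`-prefixed helpers, and the type constants are hypothetical; the sketch assumes, as the commit message implies, that the "really" checks look only at ->d_inode while the generic checks look at the type field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type codes standing in for the DCACHE_*_TYPE values. */
#define MODEL_MISS_TYPE     0u
#define MODEL_WHITEOUT_TYPE 1u

/* Hypothetical dentry: the type field may be overridden by a union layer. */
struct layered_dentry {
    void    *d_inode;
    unsigned d_type;
};

/* What an outside observer (e.g. pathwalk) would ask: trust the type field. */
static int model_d_is_negative(const struct layered_dentry *d)
{
    assert(d != NULL);
    return d->d_type == MODEL_MISS_TYPE;
}

/* What the owning filesystem would ask: is there really no inode? */
static int model_d_really_is_negative(const struct layered_dentry *d)
{
    assert(d != NULL);
    return d->d_inode == NULL;
}
```

For a dentry with no inode whose type has been overridden to something other than the miss type, the two checks disagree, which is the case the commit message calls out.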

nfs: generic_write_checks() shouldn't be done on swapout... · 65a4a1ca

Authored by Al Viro on Apr 09, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

65a4a1ca

12 Apr, 2015 14 commits

ocfs2: use __generic_file_write_iter() · 7da839c4

Authored by Al Viro on Apr 09, 2015

we can do that now - all we need is to clear IOCB_DIRECT from ->ki_flags in
"can't do dio" case.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7da839c4

mirror O_APPEND and O_DIRECT into iocb->ki_flags · 2ba48ce5

Authored by Al Viro on Apr 09, 2015

... avoiding write_iter/fcntl races.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2ba48ce5

switch generic_write_checks() to iocb and iter · 3309dd04

Authored by Al Viro on Apr 09, 2015

... returning -E... upon error and amount of data left in iter after
(possible) truncation upon success.  Note, that normal case gives
a non-zero (positive) return value, so any tests for != 0 _must_ be
updated.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Conflicts:
	fs/ext4/file.c

3309dd04
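
The return convention described in 3309dd04 above (negative error, or the positive amount left after truncation) can be illustrated with a user-space sketch. `model_write_checks()` and `model_write()` are hypothetical, the size limit is a plain parameter, and the specific error codes are assumptions of this example rather than a transcription of generic_write_checks().

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical check: returns a negative errno on error, 0 for an empty
 * write, otherwise the byte count left after truncating to `limit`.
 */
static long model_write_checks(long pos, long count, long limit)
{
    assert(limit >= 0);
    if (pos < 0 || count < 0)
        return -EINVAL;
    if (count == 0)
        return 0;
    if (pos >= limit)
        return -EFBIG;
    if (count > limit - pos)
        count = limit - pos; /* truncate the range to fit under the limit */
    return count;
}

/* Hypothetical caller: success is now a positive value, so test for <= 0. */
static long model_write(long pos, long count, long limit)
{
    long ret = model_write_checks(pos, count, limit);
    if (ret <= 0) /* a bare "if (ret)" would wrongly bail out on success */
        return ret;
    long written = ret; /* pretend every remaining byte was written */
    return written;
}
```

A caller written against the older "zero means success" convention would treat every successful check as a failure here, which is why the commit message insists that tests for != 0 be updated.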

ocfs2: move generic_write_checks() before the alignment checks · 90320251

Authored by Al Viro on Apr 09, 2015

	Alignment checks for dio depend upon the range truncation done by
generic_write_checks().  They can be done as soon as we got ocfs2_rw_lock()
and that actually makes ocfs2_prepare_inode_for_write() simpler.

	The only thing to watch out for is restoring the original count
in "unlock and redo without dio" case.  Position doesn't need to be
restored, since we change it only in O_APPEND case and in that case it
will be reassigned anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

90320251

ocfs2_file_write_iter: stop messing with ppos · 5dc3161c

Authored by Al Viro on Apr 09, 2015

it's &iocb->ki_pos; no need to obfuscate.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5dc3161c

Merge branch 'for-linus' into for-next · dfea9345

Authored by Al Viro on Apr 11, 2015

dfea9345

udf_file_write_iter: reorder and simplify · 165f1a6e

Authored by Al Viro on Apr 07, 2015

it's easier to do generic_write_checks() first
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

165f1a6e

fuse: ->direct_IO() doesn't need generic_write_checks() · 6b775b18

Authored by Al Viro on Apr 07, 2015

already done by caller.  We used to call __fuse_direct_write(), which
called generic_write_checks(); now the former got expanded, bringing
the latter to the surface.  It used to be called all along and calling
it from there had been wrong all along...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6b775b18

ext4_file_write_iter: move generic_write_checks() up · e768d7ff

Authored by Al Viro on Apr 07, 2015

simpler that way...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e768d7ff

xfs_file_aio_write_checks: switch to iocb/iov_iter · 99733fa3

Authored by Al Viro on Apr 07, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

99733fa3

generic_write_checks(): drop isblk argument · 0fa6b005

Authored by Al Viro on Apr 04, 2015

all remaining callers are passing 0; some just obscure that fact.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0fa6b005

blkdev_write_iter: expand generic_file_checks() call in there · 7ec7b94a

Authored by Al Viro on Apr 07, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7ec7b94a

lift generic_write_checks() into callers of __generic_file_write_iter() · 5f380c7f

Authored by Al Viro on Apr 07, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5f380c7f

__generic_file_write_iter: keep ->ki_pos and return value consistent · 0b8def9d

Authored by Al Viro on Apr 07, 2015

A side effect worth noting: in O_APPEND case we set ->ki_pos early,
so if it turns out to be an error or a zero-length write, we'll
end up with ->ki_pos modified.  Safe, since all callers never
look at the ->ki_pos after the call of __generic_file_write_iter()
returning non-positive, all the way to caller of ->write_iter() and
those discard ->ki_pos when getting that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0b8def9d

openeuler / Kernel synced successfully more than 1 year ago